Introduction

Genetic neuromuscular diseases are clinically and genetically heterogeneous disorders that primarily affect the peripheral nerves, muscles, and neuromuscular junctions. Clinical symptoms include muscle weakness, muscle atrophy, joint contractures, cardiomyopathy, exercise intolerance, sensory deficits, fatigue, myalgia, tremors, ataxia, and dysarthria. These symptoms vary according to the subtype and causative gene. The application of high-throughput analysis has revolutionized the diagnosis of genetic neuromuscular diseases, enabling a more accessible and cost-effective approach1. To date, approximately 600 causative genes have been identified2. The incidence and prevalence of genetic neuromuscular diseases have been explored in many studies, but differences in methodology and ethnicity contribute to substantial discrepancies in the reported values3,4,5. To achieve a comprehensive understanding of the global epidemiological landscape of genetic neuromuscular diseases, a large-scale study using uniform analytic methods is urgently required.

Since the completion of the Human Genome Project6, several databases of human exomes and genomes have been created. Notably, the Genome Aggregation Database (gnomAD) is the most representative resource7. This database serves as a comprehensive repository for information from diverse exomes and genomes, offering valuable insights into human genetic variation. These data suggest that a certain percentage of individuals in the general population carry pathogenic or likely pathogenic variants (PLPVs) of a specific gene. The database can therefore provide crucial genomic data for predicting the carrier frequency and genetic prevalence of autosomal recessive Mendelian disorders, including inherited retinal diseases, Pompe disease, congenital hypothyroidism, and Upshaw-Schulman syndrome8,9,10,11. This genetic information provides simple epidemiological information and plays a crucial role in various fields, including genetic counseling and drug development.

Therefore, our study aimed to identify PLPVs using genetic data sourced from the gnomAD database. Additionally, our objectives were to calculate the carrier frequency and predict the genetic prevalence of autosomal recessive neuromuscular diseases (AR-NMDs) across major ethnicities worldwide.

Results

Identification of PLPVs of AR-NMD genes

Figure 1 and Supplementary Table 2 illustrate the analytical scheme used to evaluate the AR-NMD gene variants. We identified a comprehensive set of 326,632 AR-NMD variants derived from the gnomAD database. We excluded two variants that were present in only one homozygous individual and were not found in heterozygotes, suggesting the potential unreliability of the reads. We also excluded two further variants, c.5C>G in SMN1 and c.359-1G>T in TTPA: although they are considered PLPVs, the total allele counts at their genomic positions were < 1000 (86 and 526, respectively). Subsequently, the remaining 326,628 variants were categorized into two main subgroups: 25,917 truncating variants and 300,711 non-truncating variants. Among the truncating variants, most (25,659) exhibited an allele frequency < 0.005. This analysis identified 9658 truncating PLPVs (2623 literature verified variants and 7035 manually verified variants). Among the 300,711 non-truncating variants, 45,047 variants had references in the scientific literature, whereas 255,664 lacked references. This analysis identified 1229 non-truncating PLPVs (1225 literature verified variants and 4 manually verified variants). In summary, a total of 10,887 PLPVs were identified, including 3848 (35.3%) literature verified and 7039 (64.7%) manually verified variants (Supplementary Tables 2 and 3). Among the manually verified variants, 7035 were truncating and classified as PLPVs based on two pieces of evidence: (1) a null variant in a gene where the loss of function is a known mechanism of the disease, and (2) absence or an extremely low frequency in controls in the gnomAD database. The remaining four manually verified variants were missense variants and were classified as PLPVs based on three pieces of evidence: (1) the same amino acid change as a previously established pathogenic variant, (2) absence or an extremely low frequency in controls in the gnomAD database, and (3) multiple lines of computational evidence supporting a deleterious effect on the gene or gene product. There were 22 homozygous PLPVs of AR-NMDs in the gnomAD database (Supplementary Table 4).

Figure 1 Flowchart depicting the analytical scheme for the variants sourced from the gnomAD database.

The allele frequency ratio of the literature verified PLPVs to total PLPVs was 60.4% in the global population. This ratio was the highest in the ASJ population (83.3%), followed by the NFE (65.0%), FIN (64.7%), AMR (59.4%), SAS (52.7%), EAS (51.4%), and AFR (50.3%) populations (Supplementary Table 5).

Carrier frequency and predicted genetic prevalence of AR-NMDs

Figure 2 shows the carrier frequency and predicted genetic prevalence of AR-NMDs. Carriers of AR-NMDs were predicted to comprise 32.9% of the global population. Among subpopulations, the NFE population showed the highest carrier frequency at 36.2%, followed by AFR, EAS, AMR, ASJ, SAS, and FIN at 34.0%, 33.4%, 29.7%, 25.6%, 24.1%, and 22.4%, respectively. The predicted genetic prevalence of AR-NMDs was estimated to be 24.3 cases per 100,000 individuals worldwide. The ASJ population had the highest predicted genetic prevalence of 41.4 cases per 100,000, followed by EAS, FIN, NFE, AFR, SAS, and AMR at 36.9, 35.2, 33.2, 30.3, 29.8, and 26.5 per 100,000 individuals, respectively.

Figure 2 Global carrier frequency and predicted genetic prevalence of autosomal recessive neuromuscular diseases represented per subpopulation worldwide. The gray bars measured along the left vertical axis indicate the carrier frequency. The blue bars measured along the right vertical axis indicate the predicted genetic prevalence.

Common AR-NMD genes according to subpopulation

Table 1 and Supplementary Table 6 show the carrier frequency and predicted genetic prevalence for each AR-NMD gene in the different subpopulations. GAA (1.3%) was the only gene with a carrier frequency exceeding 1% in the global population. However, the number of genes with a carrier frequency exceeding 1% varied among subpopulations. ASJ had the highest count, with four genes exceeding this threshold (FKTN, 1.7%; GBE1, 1.4%; GAA, 1.3%; and PFKM, 1.0%), followed by EAS with three (GAA, 1.6%; SLC22A5, 1.4%; and NEB, 1.1%), FIN with three (GLE1, 2.6%; ANO5, 1.4%; and UBA5, 1.2%), NFE with two (GAA, 1.7% and ANO5, 1.1%), AFR with two (PGAM2, 1.4% and GAA, 1.3%), AMR with one (ANO5, 1.4%), and SAS with one (GNE, 2.7%).

Table 1 Top 30 causative genes linked to autosomal recessive neuromuscular diseases.

Allele frequencies of PLPVs of AR-NMD genes

Table 2 and Supplementary Table 3 show the allele frequency of each PLPV associated with AR-NMDs in the major subpopulations. The variant with the highest allele frequency was c.-32-13T>G in GAA with 0.0033 in the global population. Within the subpopulations, the PLPV with the highest allele frequency was c.2179G>A in GNE, reaching 0.0133 in SAS, followed by c.433-10A>G in GLE1 at 0.0118 in FIN; c.1167dupA in FKTN at 0.0078 in ASJ; c.233G>A in PGAM2 at 0.0068 in AFR; c.-32-13T>G in GAA at 0.0053 in NFE; c.1385-42G>C in AGRN at 0.0028 in EAS; and c.-32-13T>G in GAA at 0.0027 in AMR.

Table 2 Allele frequency of top 30 pathogenic or likely pathogenic variants linked to autosomal recessive neuromuscular disease genes.

gnomAD individuals with homozygous PLPVs in AR-NMD genes

To determine whether gnomAD included individuals with AR-NMDs, we compared the number of gnomAD individuals with homozygous PLPVs with the expected number of individuals with homozygous PLPVs calculated using the predicted genetic prevalence. We identified 28 gnomAD individuals with homozygous PLPVs in the AR-NMD genes (Supplementary Table 4). However, the expected number of individuals with homozygous variants based on the predicted genetic prevalence was 5.8 (Supplementary Table 6).

Discussion

We conducted the first systematic analysis to estimate PLPVs using a global genomic database. One-third of the PLPVs were previously reported in databases and literature, and the remaining two-thirds were manually classified by assessing their pathogenicity according to the 2015 ACMG guidelines. Therefore, we believe that we accurately analyzed the pathogenicity of all variants and the selected PLPVs.

Our study revealed that the allele frequency ratio of the literature verified PLPVs to the total PLPVs was higher in European populations, including ASJ, NFE, and FIN, than in non-European populations, including AMR, SAS, EAS, and AFR. This finding is consistent with previous observations indicating limited knowledge of genetic diversity outside European populations12,13.

Our study revealed that the predicted genetic prevalence of AR-NMDs was 24.3 per 100,000 individuals. This result is lower than the prevalence of genetic neuromuscular diseases reported in previous studies, which ranged from 28.6 to 82.8 per 100,000 individuals3,4. However, considering that these studies included other neuromuscular diseases3,4, our results suggest that the prevalence of AR-NMDs is considerably higher than expected. This conclusion is based on the following evidence. First, the most common causative genes of genetic neuromuscular diseases are not inherited in an autosomal recessive manner. X-linked dystrophinopathy, autosomal dominant myotonic dystrophy, and autosomal dominant facioscapulohumeral muscular dystrophy account for more than 27% of all patients with genetic myopathy5. Additionally, autosomal dominant or X-linked hereditary motor and sensory neuropathy accounts for 96–98% of genetically confirmed cases14,15. Second, our analysis did not include deletions or duplications of one or more exons. For example, homozygous absence of exon 7 in SMN1 is found in approximately 95% of patients with spinal muscular atrophy, the most common genetic motor neuron disease16. Third, the diagnostic rates of exome sequencing and genome sequencing are only 30–40% in patients with Mendelian diseases, including genetic neuromuscular diseases1. In other words, approximately two-thirds of patients with genetic neuromuscular diseases cannot be identified using exome and genome sequences such as those sourced from the gnomAD database.

Our study revealed that carrier frequency is high in the NFE and AFR populations, but the predicted genetic prevalence is high in the ASJ and EAS populations. This is because the number of genes with carrier frequencies exceeding 1% in the ASJ and EAS populations is four and three, respectively, which is higher than that in other populations.

Our results revealed that the most common causative gene of AR-NMD in the global population is GAA. One previous study of 108 autosomal recessive Mendelian diseases showed that SMN1 and GAA are the most common causative genes of AR-NMDs17. However, we could not analyze large exonic deletions, the main alterations in SMN1, in the present study. This is likely why GAA emerged as the most common AR-NMD gene. The previous study showed that the carrier frequency of individuals with PLPVs in GAA was 0.8%, which is slightly lower than our result17. It also indicated that the carrier frequency of PLPVs in GAA was much lower in the EAS population (0.3%) than in the European population (0.9%)17. This contrasts with our results, which showed similar frequencies in both populations. The difference arose because the [c.752C>T; c.761C>T] variant, the most common PLPV in the EAS population, was not found or was classified as a variant of uncertain significance in that study. This variant is currently considered the major PLPV in GAA, especially in the EAS population, but has frequently been classified as a variant of uncertain significance18. Our predicted genetic prevalence was broadly consistent with the results from newborn screening programs, which reported a prevalence ranging from 3.6 to 11.5 per 100,000 individuals11,19,20,21,22. Additionally, one previous study used the same analytical method to assess the carrier frequency and genetic prevalence of individuals with PLPVs in GAA11. That study reported a carrier frequency of 1.3% and a genetic prevalence of 4.3 per 100,000 persons for GAA, which is nearly identical to our results11. The most common PLPVs in GAA vary by ethnicity, with the c.-32-13T>G and [c.752C>T; c.761C>T] variants being the most common in the NFE and EAS populations, respectively. These findings are consistent with a previous study11.

Our findings on the common causative genes in the EAS, NFE, ASJ, FIN, and SAS populations were consistent with previous results. In the NFE population, the most common causative gene is GAA and the most common PLPV is c.-32-13T>G in GAA, which is supported by several studies11,17. However, our predicted genetic prevalence was higher than previous prevalence data (1.7–2.5 per 100,000) for the United States and Dutch populations23,24. In the ASJ population, FKTN is well known to be associated with the common founder variant, c.1167dupA25. The reported carrier frequency of this variant was 0.0160, which is consistent with our result (0.0156)26. In the EAS population, GAA was the most common AR-NMD gene, which is consistent with previous results in the Chinese population27. A carrier frequency of GAA ranging from 0.005 to 0.01 was reported in one study28. Another recent study showed that the carrier frequency of individuals with PLPVs in GAA in the Chinese population was 0.0145, which is similar to our result (0.0158) in the EAS population27. Lethal congenital contracture syndrome and lethal arthrogryposis with anterior horn cell disease associated with GLE1 were first reported in the FIN population29. In particular, c.433-10A>G in GLE1, called FinMajor, is a representative pathogenic variant in GLE1 that is commonly observed in the Finnish population29,30. Additionally, the prevalence of individuals with PLPVs in GAA is relatively low in the FIN population compared with the NFE population31. In the Indian population, a representative group of the SAS population, large-scale genetic analysis showed that the most common causative gene of genetic myopathy was GNE, which is consistent with our result for the SAS population32. Additionally, c.2179G>A in GNE is a founder and major PLPV in the Indian population, which is consistent with results from previous studies32,33.

Our findings on the common causative genes in the AFR and AMR populations differ from those of previous studies. However, there have been few large-scale genetic analyses of these populations compared with those of the European population. Our results showed that PGAM2 is the most common gene in the AFR population. However, no study has investigated the prevalence of individuals with alterations in PGAM2. Muscle phosphoglycerate mutase deficiency caused by alterations in PGAM2 is frequently found in African-Americans34. In particular, c.233G>A in PGAM2 is a founder variant in the African-American population, which is consistent with our result34. One epidemiological study in the Moroccan population showed that the carrier frequency of the c.525del pathogenic variant in SGCG was 4%, which contrasts with our result (1%) in the AFR population35. However, because the Moroccan population is a small subset of the AFR population, the results of the two studies cannot be directly compared. In the AMR population, our study showed that ANO5 was the most common causative gene, and the c.692G>T variant in ANO5 was the second most common PLPV. This variant in ANO5 is a common PLPV in the European population but is different from the c.191dupA and c.2272C>T variants, which are the most common variants in the Northern European and FIN populations, respectively36,37. Several molecular genetic studies have shown that DYSF and CAPN3 are more prevalent as causative genes than ANO5 in patients with limb-girdle muscular weakness in the Latino, Chilean, or Argentine populations38,39,40. Analysis of common causative genes of AR-NMD in the AFR and AMR populations requires additional large-scale studies.

The number of gnomAD individuals with homozygous PLPVs was approximately five times higher than the expected number (28 observed versus 5.8 expected). This finding suggests that the gnomAD database includes patients with AR-NMD and other possible genetic diseases, which is consistent with the results of a previous study8. Additionally, this discrepancy might lead to an overestimation of the allele frequency of PLPVs, despite the small number of homozygous gnomAD individuals (28 of 141,456).

Our study had several limitations. First, we identified carriers and affected individuals by focusing on genes and PLPVs known to cause AR-NMDs. Therefore, our study could not analyze patients with alterations in unidentified causative genes. Furthermore, we posit that many disease-causing variants, particularly non-truncating variants that have not been documented in the literature, are classified as variants of uncertain significance. Second, we could not analyze large deletions or duplications of exons because of the nature of the gnomAD database. Third, we calculated the predicted genetic prevalence based on the Hardy–Weinberg equation. Therefore, the actual prevalence of AR-NMDs may be higher than our estimates in the African and South Asian populations, which have high levels of consanguinity and intracommunity marriages41. Fourth, this study analyzed gnomAD data, which are next-generation sequencing data that have not been verified through Sanger sequencing.

In summary, our study offers a comprehensive analysis of the carrier frequency and predicted genetic prevalence of AR-NMDs in the global population and seven major subpopulations. These results provide crucial insights for the epidemiological analysis, genetic counseling, newborn screening, diagnostic approaches, and therapeutic development of AR-NMDs. We found that the carrier frequency of AR-NMDs was higher than expected, with carriers constituting approximately one-third of the entire human population. Furthermore, our findings highlight the heterogeneity of genetic susceptibility to AR-NMDs based on ethnicity.

Materials and methods

Selection of AR-NMD genes

Based on the gene table of neuromuscular diseases (https://www.musclegenetable.fr), we identified 584 genes associated with neuromuscular diseases, including muscular dystrophies, congenital muscular dystrophies, congenital myopathies, distal myopathies, other myopathies, myotonic syndromes, ion channel muscle diseases, metabolic myopathies, hereditary cardiomyopathies, congenital myasthenic syndromes, motor neuron diseases, hereditary ataxia, hereditary motor and sensory neuropathies, and hereditary paraplegias. We then selected genes linked to AR-NMDs. Among these, LAMA5 and LAMB2 were excluded for the following reasons: LAMA5 is associated with various multisystem syndromes including neuromuscular diseases, epilepsy, and nephropathy; LAMB2 is associated with congenital myasthenic and nephrotic syndromes. In contrast, although CAPN3 is associated with both autosomal dominant and recessive inheritance patterns, it was included because it is a common cause of AR-NMDs. A total of 268 genes linked to AR-NMDs were selected (Supplementary Table 1).

Analysis of pathogenicity of variants from the gnomAD database

We analyzed genetic variants derived from a dataset of 125,748 exome sequences and 15,708 genome sequences, sourced from the gnomAD database (version 2.1.1; https://gnomad.broadinstitute.org). This dataset comprised individuals from diverse populations: 12,487, 17,720, 5185, 9977, 12,562, 64,603, and 15,308 from the African/African-American (AFR), Latino/Admixed American (AMR), Ashkenazi Jewish (ASJ), East Asian (EAS), Finnish (FIN), non-Finnish European (NFE), and South Asian (SAS) populations, respectively, as well as 3614 individuals categorized as “other.”

We organized all the variants associated with AR-NMD genes, sourced from the gnomAD database, into distinct categories (Fig. 1). This classification involved two main subgroups: truncating variants, such as frameshift, splice-site, nonsense, and start-loss variants, and non-truncating variants, including missense, intron, in-frame-deletion/duplication, 5ʹUTR, and 3ʹUTR variants. For further analysis, we initially separated the truncating variants based on their allele frequencies, using a cutoff threshold of 0.005, and divided the non-truncating variants into two subgroups, those with and without references in the scientific literature. Subsequently, we stratified these non-truncating variants using an allele frequency threshold of 0.005.

For literature verified variants, we compiled relevant data from the scientific literature and pertinent databases, such as ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/), HGMD (http://www.hgmd.cf.ac.uk/ac/), and LOVD (https://databases.lovd.nl/shared/genes). Variants lacking representation in the scientific literature underwent a comprehensive manual assessment to ascertain their pathogenicity. The process of identifying PLPVs was performed in accordance with the 2015 guidelines of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology42. Variants for which the total allele count at the genomic position was < 1000 were not classified as PLPVs, because variants with very low allele numbers may have a disproportionately large effect on the overall allele frequency.
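The categorization described above can be sketched as a small triage function. This is only an illustrative sketch, not the authors' pipeline: the `Variant` fields, the consequence labels, and the bucket names are hypothetical, and the order of the checks is an assumption. Only the 0.005 allele frequency cutoff, the allele number threshold of 1000, and the truncating/non-truncating and literature/no-literature splits come from the text.

```python
from dataclasses import dataclass

# Consequence labels are hypothetical stand-ins for the truncating classes
# named in the text (frameshift, splice-site, nonsense, start-loss).
TRUNCATING = {"frameshift", "splice_site", "nonsense", "start_loss"}
AF_CUTOFF = 0.005          # allele frequency threshold from the text
MIN_ALLELE_NUMBER = 1000   # positions below this are not classified as PLPVs


@dataclass
class Variant:
    consequence: str       # hypothetical label, e.g. "missense"
    allele_count: int
    allele_number: int
    in_literature: bool    # True if the variant has a literature reference


def triage(v: Variant) -> str:
    """Return a review bucket for a variant (bucket names are illustrative)."""
    # Assumption: the allele number check is applied before anything else.
    if v.allele_number < MIN_ALLELE_NUMBER:
        return "excluded_low_allele_number"
    allele_frequency = v.allele_count / v.allele_number
    freq = "rare" if allele_frequency < AF_CUTOFF else "common"
    if v.consequence in TRUNCATING:
        return f"truncating/{freq}"
    source = "literature" if v.in_literature else "no_literature"
    return f"non_truncating/{source}/{freq}"
```

A bucket produced this way only determines which review path a variant follows; the pathogenicity call itself still depends on the literature and manual assessment described above.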

Analysis of allele frequency, carrier frequency, and predicted genetic prevalence

The gnomAD database provides specific values for each subpopulation, including allele count, allele number, and homozygote count. Using these values, we calculated the allele and carrier frequencies for a single variant. These calculations were based exclusively on heterozygous PLPVs, as previously described8:

$$\begin{aligned} & \text{Allele frequency for a single variant} = \left( \text{allele count} - 2 \times \text{homozygous count} \right) / \text{allele number} \\ & \text{Carrier frequency for a single variant} = 2 \times \left( \text{allele count} - 2 \times \text{homozygous count} \right) / \text{allele number} = 2 \times \text{allele frequency for a single variant} \end{aligned}$$
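As a minimal sketch of the two single-variant expressions above, the following Python functions take the three per-subpopulation values that gnomAD reports (allele count, homozygote count, and allele number). The function names and the example numbers are hypothetical and are not taken from the study data.

```python
def heterozygous_allele_frequency(allele_count: int,
                                  homozygote_count: int,
                                  allele_number: int) -> float:
    """Allele frequency for a single variant, counting heterozygous alleles only."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    # Each homozygous individual contributes two alleles, which are removed.
    return (allele_count - 2 * homozygote_count) / allele_number


def variant_carrier_frequency(allele_count: int,
                              homozygote_count: int,
                              allele_number: int) -> float:
    """Carrier frequency for a single variant: twice the allele frequency."""
    return 2 * heterozygous_allele_frequency(
        allele_count, homozygote_count, allele_number
    )


# Hypothetical example: 120 alleles, 1 homozygote, 250,000 alleles genotyped.
example_af = heterozygous_allele_frequency(120, 1, 250_000)
example_cf = variant_carrier_frequency(120, 1, 250_000)
```

With these illustrative inputs, the heterozygous allele count is 118, so the allele frequency is 118/250,000 and the carrier frequency is twice that value.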

Subsequently, we calculated the carrier frequency and predicted the genetic prevalence at the gene level, as previously described8:

$$\begin{aligned} & \text{Carrier frequency at the gene level} = 1 - \prod\limits_{i = 1}^{n} \left( 1 - \left( \text{carrier frequency for a single variant} \right)_{i} \right) \\ & \text{Predicted genetic prevalence at the gene level} = \sum\limits_{k = 1}^{n} \left( \text{carrier frequency for a single variant} \right)_{k} \times \left( \text{carrier frequency for a single variant} \right)_{k} \end{aligned}$$
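The gene-level carrier frequency expression above can be illustrated with a short Python sketch. It covers only the first expression (the product over single-variant carrier frequencies); the function name and the example values are hypothetical, and the sketch assumes that the single-variant carrier frequencies have already been computed.

```python
import math
from typing import Iterable


def gene_carrier_frequency(variant_carrier_frequencies: Iterable[float]) -> float:
    """Probability of carrying at least one PLPV in a gene.

    Computed as 1 minus the product of (1 - carrier frequency) over all
    PLPVs of the gene, as in the gene-level expression in the text.
    """
    frequencies = list(variant_carrier_frequencies)
    for f in frequencies:
        if not 0.0 <= f <= 1.0:
            raise ValueError("carrier frequencies must lie in [0, 1]")
    # math.prod of an empty list is 1, so a gene with no PLPVs yields 0.0.
    return 1.0 - math.prod(1.0 - f for f in frequencies)


# Hypothetical example: two PLPVs with carrier frequencies of 1% and 0.2%.
example_gene_cf = gene_carrier_frequency([0.01, 0.002])
```

For these illustrative inputs the result is 1 - 0.99 × 0.998, slightly less than the simple sum of the two frequencies because the product form does not double count individuals who carry both variants.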

Finally, we determined whether gnomAD included individuals who might have been affected by AR-NMDs but were still asymptomatic, or whose condition was not immediately recognized. To do this, we compared the number of gnomAD individuals with homozygous PLPVs and the expected number of individuals with homozygous variants, calculated using a previously described method8.

$$\begin{aligned} & \text{Expected number of patients with homozygous PLPVs} = \left\{ \sum\limits_{k = 1}^{n} \left( \text{carrier frequency for a single variant} \right)_{k}^{2} \right\} \\ & \quad \times \text{total individuals (141,456) in the gnomAD database version 2.1.1} \end{aligned}$$

Ethical considerations

This study was approved by the Institutional Review Board of the Gangnam Severance Hospital, Korea (approval number: 3-2023-0065). The requirement for written informed consent was waived by the board because all data were provided by gnomAD, with all personal information anonymously encrypted according to a strict confidentiality protocol.