Abstract
The lack of tools to identify causative variants from sequencing data greatly limits the promise of precision medicine. Previous studies suggest that one-third of disease-associated alleles alter splicing. We discovered that the alleles causing splicing defects cluster in disease-associated genes (for example, haploinsufficient genes). We analyzed 4,964 published disease-causing exonic mutations using a massively parallel splicing assay (MaPSy), which showed an 81% concordance rate with splicing in patient tissue. Approximately 10% of exonic mutations altered splicing, mostly by disrupting multiple stages of spliceosome assembly. We present a large-scale characterization of exonic splicing mutations using a new technology that facilitates variant classification and keeps pace with variant discovery.
References
Baird, P.A., Anderson, T.W., Newcombe, H.B. & Lowry, R.B. Genetic disorders in children and young adults: a population study. Am. J. Hum. Genet. 42, 677–693 (1988).
Yang, Y. et al. Molecular findings among patients referred for clinical whole-exome sequencing. J. Am. Med. Assoc. 312, 1870–1879 (2014).
Bamshad, M.J. et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat. Rev. Genet. 12, 745–755 (2011).
Tennessen, J.A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012).
Xue, Y. et al. Deleterious- and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing. Am. J. Hum. Genet. 91, 1022–1032 (2012).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Lim, K.H., Ferraris, L., Filloux, M.E., Raphael, B.J. & Fairbrother, W.G. Using positional distribution to identify splicing elements and predict pre-mRNA processing defects in human genes. Proc. Natl. Acad. Sci. USA 108, 11093–11098 (2011).
Stenson, P.D. et al. Human Gene Mutation Database (HGMD): 2003 update. Hum. Mutat. 21, 577–581 (2003).
Taggart, A.J., DeSimone, A.M., Shih, J.S., Filloux, M.E. & Fairbrother, W.G. Large-scale mapping of branchpoints in human pre-mRNA transcripts in vivo. Nat. Struct. Mol. Biol. 19, 719–721 (2012).
Huang, N., Lee, I., Marcotte, E.M. & Hurles, M.E. Characterising and predicting haploinsufficiency in the human genome. PLoS Genet. 6, e1001154 (2010).
Ke, S. et al. Quantitative evaluation of all hexamers as exonic splicing elements. Genome Res. 21, 1360–1374 (2011).
Fairbrother, W.G., Yeh, R.F., Sharp, P.A. & Burge, C.B. Predictive identification of exonic splicing enhancers in human genes. Science 297, 1007–1013 (2002).
Amit, M. et al. Differential GC content between exons and introns establishes distinct strategies of splice-site recognition. Cell Rep. 1, 543–556 (2012).
Mort, M. et al. MutPred Splice: machine learning–based prediction of exonic variants that disrupt splicing. Genome Biol. 15, R19 (2014).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Wang, Z. et al. Systematic identification and analysis of exonic splicing silencers. Cell 119, 831–845 (2004).
Ke, S., Zhang, X.H. & Chasin, L.A. Positive selection acting on splicing motifs reflects compensatory evolution. Genome Res. 18, 533–543 (2008).
Smith, P.J. et al. An increased specificity score matrix for the prediction of SF2/ASF-specific exonic splicing enhancers. Hum. Mol. Genet. 15, 2490–2508 (2006).
Zhang, X.H. & Chasin, L.A. Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev. 18, 1241–1250 (2004).
Ray, D. et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172–177 (2013).
Ray, D. et al. Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins. Nat. Biotechnol. 27, 667–670 (2009).
Long, J.C. & Caceres, J.F. The SR protein family of splicing factors: master regulators of gene expression. Biochem. J. 417, 15–27 (2009).
Rahman, M.A. et al. SRSF1 and hnRNP H antagonistically regulate splicing of COLQ exon 16 in a congenital myasthenic syndrome. Sci. Rep. 5, 13208 (2015).
Shen, H., Kan, J.L., Ghigna, C., Biamonti, G. & Green, M.R. A single polypyrimidine tract binding protein (PTB) binding site mediates splicing inhibition at mouse IgM exons M1 and M2. RNA 10, 787–794 (2004).
Sterne-Weiler, T., Howard, J., Mort, M., Cooper, D.N. & Sanford, J.R. Loss of exon identity is a common mechanism of human inherited disease. Genome Res. 21, 1563–1571 (2011).
Wang, J., Xiao, S.H. & Manley, J.L. Genetic analysis of the SR protein ASF/SF2: interchangeability of RS domains and negative control of splicing. Genes Dev. 12, 2222–2233 (1998).
Lim, K.H. & Fairbrother, W.G. Spliceman—a computational web server that predicts sequence variations in pre-mRNA splicing. Bioinformatics 28, 1031–1032 (2012).
Padgett, R.A., Grabowski, P.J., Konarska, M.M., Seiler, S. & Sharp, P.A. Splicing of messenger RNA precursors. Annu. Rev. Biochem. 55, 1119–1150 (1986).
Konarska, M.M. & Sharp, P.A. Electrophoretic separation of complexes involved in the splicing of precursors to mRNAs. Cell 46, 845–855 (1986).
Das, R. & Reed, R. Resolution of the mammalian E complex and the ATP-dependent spliceosomal complexes on native agarose mini-gels. RNA 5, 1504–1508 (1999).
Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011).
MacArthur, D.G. et al. Guidelines for investigating causality of sequence variants in human disease. Nature 508, 469–476 (2014).
Wang, Y., Ma, M., Xiao, X. & Wang, Z. Intronic splicing enhancers, cognate splicing factors and context-dependent regulation rules. Nat. Struct. Mol. Biol. 19, 1044–1052 (2012).
Rosenberg, A.B., Patwardhan, R.P., Shendure, J. & Seelig, G. Learning the sequence determinants of alternative splicing from millions of random sequences. Cell 163, 698–711 (2015).
Yeo, G. & Burge, C.B. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. 11, 377–394 (2004).
Gozani, O., Patton, J.G. & Reed, R. A novel set of spliceosome-associated proteins and the essential splicing factor PSF bind stably to pre-mRNA prior to catalytic step II of the splicing reaction. EMBO J. 13, 3356–3367 (1994).
Reichert, V. & Moore, M.J. Better conditions for mammalian in vitro splicing provided by acetate and glutamate as potassium counterions. Nucleic Acids Res. 28, 416–423 (2000).
Dobin, A. et al. STAR: ultrafast universal RNA–seq aligner. Bioinformatics 29, 15–21 (2013).
Kursa, M.B., Jankowski, A. & Rudnicki, W.R. Boruta—a system for feature selection. Fundam. Inform. 101, 271–285 (2010).
Fairbrother, W.G. et al. RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res. 32, W187–W190 (2004).
Lin, C.L. et al. RNA structure replaces the need for U2AF2 in splicing. Genome Res. 26, 12–23 (2016).
Wasserman, W.W. & Sandelin, A. Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 5, 276–287 (2004).
Chambers, J.M. & Hastie, T. Statistical Models in S (Wadsworth & Brooks/Cole Advanced Books & Software, 1992).
Fraley, C. & Raftery, A.E. Model-based clustering, discriminant analysis, and density estimation. J. Am. Stat. Assoc. 97, 611–631 (2002).
Pesarin, F. Multivariate Permutation Tests: With Applications in Biostatistics (J. Wiley, 2001).
Acknowledgements
We thank K. Villanueva for generating the list of SNPs used in this study and A. Leblang for compiling the variants to make the oligonucleotide library. We thank M. Jurica and M. Moore for suggestions and protocols for the in vitro spliceosome assembly assay and nuclear extract preparation. We thank A. Janssens for contacting investigators for patient samples. We thank A. Toland (Ohio State University), J. Marini (NIH/NICHD) and A. Goate (Washington University Alzheimer's Disease Research Center) for contributing patient samples for validation. R.S. was supported by a Postdoctoral Fellowship from the Center for Computational Molecular Biology (CCMB), Brown University. C.R. was supported by a Graduate Research Fellowship from the National Science Foundation (NSF). This work was supported by US National Institutes of Health (NIH) grants R01GM095612 (to W.G.F.), R01GM105681 (to W.G.F.) and R21HG007905 (to W.G.F.) and by SFARI award 342705 (to W.G.F.). Part of this research was conducted using computational resources and services at the Center for Computation and Visualization, Brown University and the Genomics Core Facility, Brown University.
Author information
Contributions
W.G.F. and R.S. designed the experiments. R.S. performed MaPSy experiments. R.S., J.W., P.B.-T. and J.M. performed validation experiments. K.J.C. performed alignment, counting and RBP motif analyses. R.S. performed ESM analyses, machine learning and MaPSy SELEX analyses. C.L.R. performed HGMD gene analyses. C.B. and J.Y. developed the visualization web browser. W.G.F. and R.S. wrote the paper with contributions from all authors.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Alternative splicing events in the 5K panel.
The majority of cryptic splicing occurred by creation of an AG or GT (type I), while some other mutations increased the usage of a nearby weaker splice site (type II). Very few mutations were found to abolish alternative splice-site usage (type III).
Supplementary Figure 2 MaPSy performance.
(a–d) Agreement between allelic splicing ratios (log2) of three cell culture replicates of MaPSy in vivo (a–c) and two experimental replicates of MaPSy in vitro (d). (e) Stacked histogram of mutant (red) and wild-type (blue) relative splicing efficiency in MaPSy in vivo (top) and in vitro (bottom). (f) Full gel of output (spliced species) from MaPSy in vivo.
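The allelic splicing ratio (log2) compared across replicates in panels a–d can be illustrated with a short sketch. The caption does not give the formula, so the definition used here (spliced-to-input read ratio of the mutant divided by that of the wild type) and the `pseudocount` parameter are assumptions made for illustration, not the authors' stated method.

```python
import math


def allelic_log2_ratio(mut_spliced, mut_input, wt_spliced, wt_input,
                       pseudocount=1.0):
    """Return log2 of mutant splicing efficiency over wild-type efficiency.

    Efficiency is assumed to be spliced reads / input reads for each allele.
    The pseudocount (an assumption) avoids division by zero for low counts.
    """
    mut_eff = (mut_spliced + pseudocount) / (mut_input + pseudocount)
    wt_eff = (wt_spliced + pseudocount) / (wt_input + pseudocount)
    return math.log2(mut_eff / wt_eff)


# Hypothetical read counts: the mutant splices half as efficiently as wild type.
ratio = allelic_log2_ratio(50, 100, 100, 100, pseudocount=0)
```

Under this definition a negative value means the mutant allele is spliced less efficiently than its wild-type counterpart, and replicate agreement could then be assessed by correlating these values between experiments.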
Supplementary Figure 3 MaPSy validation in patient samples and ENCODE data.
(a–f) ESMs identified by MaPSy among mutations causing inosine triphosphatase deficiency (a), galactosemia (b), haemorrhagic telangiectasia (c), Menkes syndrome (d) and Barth syndrome (e,f) were shown to exhibit splicing aberrations (exon skipping and/or intron retention) in RNAs derived from patient tissue samples. (g) Splicing efficiency in MaPSy corresponds to splicing in ENCODE data.
Supplementary Figure 4 Mode of inheritance in the 5K panel.
(a) Percent ESM in the 5K panel stratified by modes of inheritance in haploinsufficient genes (prediction score = 1), haplosufficient genes (prediction score < 0.7) and moderately haploinsufficient genes (1 > prediction score ≥ 0.7) (ref. 8). Error bars, 95% confidence intervals. (b) Number of mutations in the different modes of inheritance in the 5K panel.
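The stratification in panel a can be sketched as a simple threshold rule on the haploinsufficiency prediction score, followed by a confidence interval on the proportion of ESMs in each group. The score cut-offs come from the caption; the use of a Wilson score interval for the 95% confidence interval is an assumption, since the caption does not say how the intervals were computed, and the function names are hypothetical.

```python
import math


def hi_class(score):
    """Assign a gene to a haploinsufficiency class using the caption's cut-offs."""
    if score >= 1.0:          # caption: prediction score = 1
        return "haploinsufficient"
    if score >= 0.7:          # caption: 1 > prediction score >= 0.7
        return "moderately haploinsufficient"
    return "haplosufficient"  # caption: prediction score < 0.7


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k successes in n trials (assumed CI method)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical example: 10 ESMs among 100 mutations in one class.
low, high = wilson_interval(10, 100)
```

The Wilson interval is used here only because it behaves well for proportions near 0 or 1; any other binomial interval could be substituted.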
Supplementary Figure 5 Genes intolerant to protein-truncating variants (PTVs) in the ExAC population are predisposed to disease-associated splicing mutations.
(a) Mean fraction of ESMs in PTV-intolerant (pLI ≥ 0.9), semitolerant (0.1 < pLI < 0.9) and tolerant (pLI ≤ 0.1) genes in dominant and recessive traits. Error bars, s.e.m. (b) PTV-intolerant genes also have more introns than other genes, similar to disease genes that lose function via splicing mutations.
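As an illustration of the grouping in panel a, the sketch below bins genes by pLI using the thresholds given in the caption and summarizes per-gene ESM fractions as mean and s.e.m. The function names and the example fractions are hypothetical; only the pLI cut-offs are taken from the caption.

```python
import math
import statistics


def pli_class(pli):
    """Classify a gene by its pLI score using the caption's thresholds."""
    if not 0.0 <= pli <= 1.0:
        raise ValueError("pLI must lie in [0, 1]")
    if pli >= 0.9:
        return "intolerant"
    if pli <= 0.1:
        return "tolerant"
    return "semitolerant"


def mean_and_sem(fractions):
    """Return the mean and standard error of the mean for per-gene fractions."""
    if len(fractions) < 2:
        raise ValueError("need at least two values to estimate the s.e.m.")
    mean = statistics.mean(fractions)
    sem = statistics.stdev(fractions) / math.sqrt(len(fractions))
    return mean, sem


# Hypothetical per-gene ESM fractions for one pLI class.
mean, sem = mean_and_sem([0.1, 0.2, 0.3])
```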
Supplementary Figure 6 Features of splicing.
(a) The mean relative splicing efficiency of wild-type species in vivo (n = 2,086) is plotted against the increasing mean of each feature measure in a sliding window (size = 200, step = 1). Shaded regions represent 95% confidence intervals. Intron length is plotted on a log10 scale. The mean PhastCons score for all bases of the exon was used to measure conservation. Genomic features that have previously been associated with splicing display similar trends in MaPSy. P values were obtained from linear regression analyses. (b) The 5K panel is divided into five bins of increasing feature measures, and percent ESM in each bin is plotted. Error bars, 95% confidence intervals. Low differential GC content between exon and intron, fewer ESEs, more ESSs and less agreement with the splice-site consensus sequence, which are all associated with weaker splicing, are shown to sensitize exons to ESM. The Kruskal–Wallis test was used to obtain P values.
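The sliding-window summary in panel a can be sketched as follows. The window size and step come from the caption; sorting species by the feature value before windowing is an assumption about how "increasing mean of feature measures" was obtained, and the function name is hypothetical.

```python
def sliding_window_means(features, efficiencies, size=200, step=1):
    """Sort species by feature value, then average both quantities per window.

    Returns a list of (mean feature, mean splicing efficiency) pairs, one per
    window, in order of increasing feature value.
    """
    pairs = sorted(zip(features, efficiencies))
    if size <= 0 or step <= 0 or size > len(pairs):
        raise ValueError("window size and step must be positive and fit the data")
    means = []
    for start in range(0, len(pairs) - size + 1, step):
        window = pairs[start:start + size]
        mean_feature = sum(f for f, _ in window) / size
        mean_efficiency = sum(e for _, e in window) / size
        means.append((mean_feature, mean_efficiency))
    return means


# Toy example with a window of 2 instead of 200.
toy = sliding_window_means([3, 1, 2, 4], [30, 10, 20, 40], size=2)
```

With n = 2,086 species, a window of 200 and a step of 1, this procedure would yield 1,887 windows per feature.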
Supplementary Figure 7 The role of PTBP1 and SRSF1 in ESM phenotypes.
(a) The splicing phenotype of a mutation in exon 20 of COL1A2 that creates a PTBP1-binding motif was partially rescued when PTBP1 was knocked down. (b) A mutation that weakens an SRSF1-binding motif in exon 8 of MLH1 caused a modest but not significant increase in skipping events in the absence of SRSF1, whereas the wild-type exon, which contains an SRSF1-binding site, had a significant increase in skipping events when SRSF1 was knocked down.
Supplementary Figure 8 Overlap of intronic and exonic splicing regulatory motifs.
(a) The density for each RBP motif was calculated in all wild-type species (n = 2,048). (b) Clustering of intronic data reveals similar trends in vivo and in vitro. (c) Intronic splicing activators and exonic splicing repressors show a high degree of overlap. (d) Intronic splicing repressor motifs and exonic splicing activator motifs display a high degree of overlap. (e) Table of exonic splicing repressors and exonic splicing activators that exhibit the same function in vivo and in vitro.
Supplementary Figure 9 In vitro functional SELEX.
(a) Series of functional SELEX with MaPSy. (b) Mutant/wild-type ratio in the B/C fraction in comparison to spliced species (left) and in the A fraction in comparison to spliced species (right). Enrichment in the B/C complex is positively correlated with splicing, whereas enrichment in the A complex is negatively correlated with splicing. (c) Clustering the effects of exonic mutations on different stages of spliceosomal assembly revealed mechanistic signatures of ESM. Only clusters with ≥8 members are shown.
Supplementary Figure 10 Mutant feature analyses in different clusters revealed distinct ESM mechanistic signatures.
Horizontal dotted lines indicate the mean value of the features in the 5K panel. Box plots of feature values that are significantly different from background (permuted cluster assignment) are colored red. The medians are indicated as horizontal bold lines, and the means as black hollow dots.
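The comparison against a background of permuted cluster assignments can be illustrated with a basic permutation test. This is a minimal sketch under stated assumptions: the test statistic (absolute difference between the cluster mean and the panel mean), the number of permutations and the function name are all hypothetical, since the caption does not specify them.

```python
import random


def permutation_p_value(values, in_cluster, n_perm=2000, seed=0):
    """Estimate how often a random cluster of the same size deviates from the
    overall mean at least as much as the observed cluster does."""
    rng = random.Random(seed)
    k = sum(in_cluster)
    if k == 0 or k == len(values):
        raise ValueError("cluster must be a non-empty proper subset")
    overall = sum(values) / len(values)
    cluster_mean = sum(v for v, m in zip(values, in_cluster) if m) / k
    observed = abs(cluster_mean - overall)
    hits = 0
    for _ in range(n_perm):
        # Permuting cluster labels is equivalent to drawing k members at random.
        sample = rng.sample(values, k)
        if abs(sum(sample) / k - overall) >= observed:
            hits += 1
    # Add-one correction keeps the estimated P value above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical feature values: the cluster holds all five high values.
values = [0] * 20 + [10] * 5
in_cluster = [False] * 20 + [True] * 5
p = permutation_p_value(values, in_cluster)
```

A feature would then be flagged for a cluster when its estimated P value falls below a chosen threshold, ideally after correcting for the number of features and clusters tested.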
Supplementary Figure 11 ESM visualization browser.
A web browser was developed to visualize raw counts and information on individual mutations from original publications. Mutations can be queried by HGMD ID, gene or author.
Supplementary Figure 12 Common sequences of the 5K panel reporters.
(a) In vivo reporter sequence: CMV enhancer and promoter sequence (blue), adenovirus sequence (green; exon in uppercase and intron in lowercase), 200-mer library (red), ACTN1 intron 15 (lowercase, purple) and exon 16 (uppercase, purple), bGH poly(A) (cyan). (b) In vitro reporter sequence: adenovirus sequence (green; including T7 promoter sequence in bold), 200-mer oligo library (red) and additional intronic sequence (purple).
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–12 and Supplementary Tables 2 and 4. (PDF 2725 kb)
Supplementary Table 1
SNPs evaluated with MaPSy. (XLSX 46 kb)
Supplementary Table 3
Genes that are enriched with SSM. (XLSX 31 kb)
About this article
Cite this article
Soemedi, R., Cygan, K., Rhine, C. et al. Pathogenic variants that alter protein code often disrupt splicing. Nat Genet 49, 848–855 (2017). https://doi.org/10.1038/ng.3837
DOI: https://doi.org/10.1038/ng.3837