Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification

Abstract

CRISPR base editing screens enable analysis of disease-associated variants at scale; however, variable efficiency and precision confound the assessment of variant-induced phenotypes. Here, we provide an integrated experimental and computational pipeline that improves estimation of variant effects in base editing screens. We use a reporter construct to measure guide RNA (gRNA) editing outcomes alongside their phenotypic consequences and introduce base editor screen analysis with activity normalization (BEAN), a Bayesian network that uses per-guide editing outcomes provided by the reporter and target site chromatin accessibility to estimate variant impacts. BEAN outperforms existing tools in variant effect quantification. We use BEAN to pinpoint common regulatory variants that alter low-density lipoprotein (LDL) uptake, implicating previously unreported genes. Additionally, through saturation base editing of LDLR, we accurately quantify missense variant pathogenicity that is consistent with measurements in UK Biobank patients and identify underlying structural mechanisms. This work provides a widely applicable approach to improve the power of base editing screens for disease-associated variant characterization.
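
To illustrate why per-guide editing efficiency matters, the sketch below contrasts a naive average of guide phenotype scores with an activity-normalized estimate. It is not BEAN's Bayesian network. It assumes a hypothetical linear dilution model in which each guide's observed score equals its reporter edit rate times the variant effect, and all function names and numbers are illustrative only.

```python
def naive_mean_effect(guide_scores):
    """Average the guide-level scores, ignoring editing efficiency."""
    return sum(guide_scores) / len(guide_scores)


def activity_normalized_effect(guide_scores, edit_rates):
    """Least-squares slope through the origin for score_i ~ edit_rate_i * effect.

    Assumption (not from the paper): the phenotype score of a guide is the
    variant effect diluted linearly by the fraction of edited alleles.
    """
    if len(guide_scores) != len(edit_rates):
        raise ValueError("guide_scores and edit_rates must have equal length")
    denom = sum(e * e for e in edit_rates)
    if denom == 0:
        raise ValueError("no editing observed for this variant")
    return sum(e * y for e, y in zip(edit_rates, guide_scores)) / denom


# Hypothetical, noise-free example: three guides target the same variant.
true_effect = 2.0
edit_rates = [0.9, 0.5, 0.1]
guide_scores = [true_effect * e for e in edit_rates]

naive = naive_mean_effect(guide_scores)
normalized = activity_normalized_effect(guide_scores, edit_rates)
print(f"naive={naive:.2f} normalized={normalized:.2f}")
```

Under this toy model the naive average underestimates the effect because poorly edited guides pull it toward zero, whereas weighting by the measured edit rate recovers the simulated value.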

Fig. 1: Activity-normalized base editing screening pipeline.
Fig. 2: BEAN models variant effects from activity-normalized base editing screens.
Fig. 3: BEAN improves variant impact estimation from the LDL-C GWAS library screen.
Fig. 4: Functional characterization of LDL-C GWAS variants.
Fig. 5: Dissection of LDLR variant effects through BEAN modeling of a saturation tiled base editing screen.
Fig. 6: Deleterious variants in LDLR class B repeats weaken hydrophobic interactions.

Data availability

The processed data used in this study have been deposited at Zenodo (https://doi.org/10.5281/zenodo.10139794), and primary sequencing data are available at the Sequence Read Archive under accession PRJNA1042659. Controlled access, patient-level data from the UKB may be requested at https://ams.ukbiobank.ac.uk/ams/. Source data are provided with this paper.

Code availability

BEAN source code is available at https://github.com/pinellolab/crispr-bean. The scripts used to generate the figures and analyses presented in the study have been deposited at https://github.com/pinellolab/bean_manuscript and Zenodo (ref. 110). The version (0.2.9) of ‘bean’ used for the analyses presented in this paper has been deposited at Zenodo (ref. 74).

References

  1. Tam, V. et al. Benefits and limitations of genome-wide association studies. Nat. Rev. Genet. 20, 467–484 (2019).

  2. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  3. Gasperini, M., Starita, L. & Shendure, J. The power of multiplexed functional analysis of genetic variants. Nat. Protoc. 11, 1782–1787 (2016).

  4. Araya, C. L. & Fowler, D. M. Deep mutational scanning: assessing protein function on a massive scale. Trends Biotechnol. 29, 435–442 (2011).

  5. Myers, R. M., Tilly, K. & Maniatis, T. Fine structure genetic analysis of a β-globin promoter. Science 232, 613–618 (1986).

  6. Inoue, F. & Ahituv, N. Decoding enhancers using massively parallel reporter assays. Genomics 106, 159–164 (2015).

  7. Bock, C. et al. High-content CRISPR screening. Nat. Rev. Methods Prim. 2, 9 (2022).

  8. Shalem, O. et al. Genome-scale CRISPR–Cas9 knockout screening in human cells. Science 343, 84–87 (2014).

  9. Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Genetic screens in human cells using the CRISPR–Cas9 system. Science 343, 80–84 (2014).

  10. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).

  11. Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).

  12. Rees, H. A. & Liu, D. R. Base editing: precision chemistry on the genome and transcriptome of living cells. Nat. Rev. Genet. 19, 770–788 (2018).

  13. Hanna, R. E. et al. Massively parallel assessment of human variants with base editor screens. Cell 184, 1064–1080 (2021).

  14. Morris, J. A. et al. Discovery of target genes and pathways at GWAS loci by pooled single-cell CRISPR screens. Science 380, eadh7699 (2023).

  15. Martin-Rufino, J. D. et al. Massively parallel base editing to map variant effects in human hematopoiesis. Cell 186, 2456–2474 (2023).

  16. Cuella-Martin, R. et al. Functional interrogation of DNA damage response variants with base editing screens. Cell 184, 1081–1097 (2021).

  17. Pablo, J. L. B. et al. Scanning mutagenesis of the voltage-gated sodium channel NaV1.2 using base editing. Cell Rep. 42, 112563 (2023).

  18. Coelho, M. A. et al. Base editing screens map mutations affecting interferon-γ signaling in cancer. Cancer Cell 41, 288–303 (2023).

  19. Cheng, L. et al. Single-nucleotide-level mapping of DNA regulatory elements that control fetal hemoglobin expression. Nat. Genet. 53, 869–880 (2021).

  20. Sánchez-Rivera, F. J. et al. Base editing sensor libraries for high-throughput engineering and functional analysis of cancer-associated single nucleotide variants. Nat. Biotechnol. 40, 862–873 (2022).

  21. Kim, Y. et al. High-throughput functional evaluation of human cancer-associated mutations using base editors. Nat. Biotechnol. 40, 874–884 (2022).

  22. Kweon, J. et al. A CRISPR-based base-editing screen for the functional assessment of BRCA1 variants. Oncogene 39, 30–35 (2020).

  23. Huang, C., Li, G., Wu, J., Liang, J. & Wang, X. Identification of pathogenic variants in cancer genes using base editing screens with editing efficiency correction. Genome Biol. 22, 80 (2021).

  24. Sangree, A. K. et al. Benchmarking of SpCas9 variants enables deeper base editor screens of BRCA1 and BCL2. Nat. Commun. 13, 1318 (2022).

  25. Lue, N. Z. et al. Base editor scanning charts the DNMT3A activity landscape. Nat. Chem. Biol. 19, 176–186 (2023).

  26. Després, P. C., Dubé, A. K., Seki, M., Yachie, N. & Landry, C. R. Perturbing proteomes at single residue resolution using base editing. Nat. Commun. 11, 1871 (2020).

  27. Garcia, E. M. et al. Base editor scanning reveals activating mutations of DNMT3A. ACS Chem. Biol. 18, 2030–2038 (2023).

  28. Lue, N. Z. & Liau, B. B. Base editor screens for in situ mutational scanning at scale. Mol. Cell 83, 2167–2187 (2023).

  29. Arbab, M. et al. Determinants of base editing outcomes from target library analysis and machine learning. Cell 182, 463–480 (2020).

  30. Graham, S. E. et al. The power of genetic diversity in genome-wide association studies of lipids. Nature 600, 675–679 (2021).

  31. Bouhairie, V. E. & Goldberg, A. C. Familial hypercholesterolemia. Cardiol. Clin. 33, 169–179 (2015).

  32. Brown, M. S. & Goldstein, J. L. How LDL receptors influence cholesterol and atherosclerosis. Sci. Am. 251, 58–66 (1984).

  33. Mundal, L. J. et al. Impact of age on excess risk of coronary heart disease in patients with familial hypercholesterolaemia. Heart 104, 1600–1607 (2018).

  34. Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).

  35. Hamilton, M. C. et al. Systematic elucidation of genetic mechanisms underlying cholesterol uptake. Cell Genom. 3, 100304 (2023).

  36. Spady, D. K. Hepatic clearance of plasma low density lipoproteins. Semin. Liver Dis. 12, 373–385 (1992).

  37. Richter, M. F. et al. Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat. Biotechnol. 38, 883–891 (2020).

  38. Walton, R. T., Christie, K. A., Whittaker, M. N. & Kleinstiver, B. P. Unconstrained genome targeting with near-PAMless engineered CRISPR–Cas9 variants. Science 368, 290–296 (2020).

  39. Park, H., Shin, J., Choi, H., Cho, B. & Kim, J. Valproic acid significantly improves CRISPR/Cas9-mediated gene editing. Cells 9, 1447 (2020).

  40. Shin, H. R. et al. Small-molecule inhibitors of histone deacetylase improve CRISPR-based adenine base editing. Nucleic Acids Res. 49, 2390–2399 (2021).

  41. Yang, C. et al. HMGN1 enhances CRISPR-directed dual-function A-to-G and C-to-G base editing. Nat. Commun. 14, 2430 (2023).

  42. Arbab, M. et al. Base editing rescue of spinal muscular atrophy in cells and in mice. Science 380, eadg6518 (2023).

  43. Schep, R. et al. Impact of chromatin context on Cas9-induced DNA double-strand break repair pathway balance. Mol. Cell 81, 2216–2230 (2021).

  44. Ding, X. et al. Improving CRISPR–Cas9 genome editing efficiency by fusion with chromatin-modulating peptides. CRISPR J. 2, 51–63 (2019).

  45. Liu, G., Yin, K., Zhang, Q., Gao, C. & Qiu, J.-L. Modulating chromatin accessibility by transactivation and targeting proximal dsgRNAs enhances Cas9 editing efficiency in vivo. Genome Biol. 20, 145 (2019).

  46. Maeder, M. L. et al. CRISPR RNA-guided activation of endogenous human genes. Nat. Methods 10, 977–979 (2013).

  47. Perez-Pinera, P. et al. RNA-guided gene activation by CRISPR–Cas9-based transcription factors. Nat. Methods 10, 973–976 (2013).

  48. Qi, L. S. et al. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152, 1173–1183 (2013).

  49. Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 15, 554 (2014).

  50. Li, W. et al. Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR. Genome Biol. 16, 281 (2015).

  51. Jeong, H.-H., Kim, S. Y., Rousseaux, M. W. C., Zoghbi, H. Y. & Liu, Z. Beta-binomial modeling of CRISPR pooled screen data identifies target genes with greater sensitivity and fewer false negatives. Genome Res. 29, 999–1008 (2019).

  52. Daley, T. P. et al. CRISPhieRmix: a hierarchical mixture model for CRISPR pooled screens. Genome Biol. 19, 159 (2018).

  53. Zou, Y., Carbonetto, P., Wang, G. & Stephens, M. Fine-mapping from summary data with the ‘Sum of Single Effects’ model. PLoS Genet. 18, e1010299 (2022).

  54. Tehranchi, A. et al. Fine-mapping cis-regulatory variants in diverse human populations. eLife 8, e39595 (2019).

  55. Tehranchi, A. K. et al. Pooled ChIP–seq links variation in transcription factor binding to complex disease risk. Cell 165, 730–741 (2016).

  56. Degner, J. F. et al. DNase I sensitivity QTLs are a major determinant of human expression variation. Nature 482, 390–394 (2012).

  57. Currin, K. W. et al. Genetic effects on liver chromatin accessibility identify disease regulatory variants. Am. J. Hum. Genet. 108, 1169–1189 (2021).

  58. Kanai, M. et al. Insights from complex trait fine-mapping across diverse populations. Preprint at bioRxiv https://doi.org/10.1101/2021.09.03.21262975 (2021).

  59. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).

  60. Biasella, F., Plössl, K., Karl, C., Weber, B. H. F. & Friedrich, U. Altered protein function caused by AMD-associated variant rs704 links vitronectin to disease pathology. Invest. Ophthalmol. Vis. Sci. 61, 2 (2020).

  61. Yao, Q. et al. Motif-Raptor: a cell type-specific and transcription factor centric approach for post-GWAS prioritization of causal regulators. Bioinformatics 37, 2103–2111 (2021).

  62. Jing, Z., Liu, Y., Dong, M., Hu, S. & Huang, S. Identification of the DNA binding element of the human ZNF333 protein. J. Biochem. Mol. Biol. 37, 663–670 (2004).

  63. Witzgall, R., O’Leary, E., Leaf, A., Onaldi, D. & Bonventre, J. V. The Krüppel-associated box-A (KRAB-A) domain of zinc finger proteins mediates transcriptional repression. Proc. Natl Acad. Sci. USA 91, 4514–4518 (1994).

  64. Fass, D., Blacklow, S., Kim, P. S. & Berger, J. M. Molecular basis of familial hypercholesterolaemia from structure of LDL receptor module. Nature 388, 691–693 (1997).

  65. Mistry, J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).

  66. Willer, C. J. et al. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 45, 1274–1283 (2013).

  67. Yu, T., Fife, J. D., Adzhubey, I., Sherwood, R. & Cassa, C. A. Joint estimation and imputation of variant functional effects using high throughput assay data. Preprint at medRxiv https://doi.org/10.1101/2023.01.06.23284280 (2023).

  68. Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (Association for Computing Machinery 2016).

  69. Clarke, S. L. et al. Coronary artery disease risk of familial hypercholesterolemia genetic variants independent of clinically observed longitudinal cholesterol exposure. Circ. Genom. Precis. Med. 15, e003501 (2022).

  70. Zhou, Y., Pan, Q., Pires, D. E. V., Rodrigues, C. H. M. & Ascher, D. B. DDMut: predicting effects of mutations on protein stability using deep learning. Nucleic Acids Res. 51, W122–W128 (2023).

  71. Jubb, H. C. et al. Arpeggio: a web server for calculating and visualising interatomic interactions in protein structures. J. Mol. Biol. 429, 365–371 (2017).

  72. Varadi, M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).

  73. Rose, G. D., Geselowitz, A. R., Lesser, G. J., Lee, R. H. & Zehfus, M. H. Hydrophobicity of amino acid residues in globular proteins. Science 229, 834–838 (1985).

  74. Ryu, J. & Pinello, L. pinellolab/crispr-bean: v0.2.9. Zenodo https://doi.org/10.5281/zenodo.10191493 (2023).

  75. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

  76. Cassa, C. A. et al. Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat. Genet. 49, 806–810 (2017).

  77. Fowler, D. M. & Fields, S. Deep mutational scanning: a new style of protein science. Nat. Methods 11, 801–807 (2014).

  78. Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).

  79. Brnich, S. E. et al. Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework. Genome Med. 12, 3 (2019).

  80. Domanski, M. J. et al. Time course of LDL cholesterol exposure and cardiovascular disease event risk. J. Am. Coll. Cardiol. 76, 1507–1516 (2020).

  81. Duncan, M. S., Vasan, R. S. & Xanthakis, V. Trajectories of blood lipid concentrations over the adult life course and risk of cardiovascular disease and all-cause mortality: observations from the Framingham Study over 35 years. J. Am. Heart Assoc. 8, e011433 (2019).

  82. Mundal, L. & Retterstøl, K. A systematic review of current studies in patients with familial hypercholesterolemia by use of national familial hypercholesterolemia registries. Curr. Opin. Lipidol. 27, 388–397 (2016).

  83. Frazer, J. et al. Disease variant prediction with deep generative models of evolutionary data. Nature 599, 91–95 (2021).

  84. Ioannidis, N. M. et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877–885 (2016).

  85. Gao, H. et al. The landscape of tolerated genetic variation in humans and primates. Science 380, eabn8153 (2023).

  86. Klimentidis, Y. C. et al. Phenotypic and genetic characterization of lower LDL cholesterol and increased type 2 diabetes risk in the UK Biobank. Diabetes 69, 2194–2205 (2020).

  87. Oommen, D., Kizhakkedath, P., Jawabri, A. A., Varghese, D. S. & Ali, B. R. Proteostasis regulation in the endoplasmic reticulum: an emerging theme in the molecular pathology and therapeutic management of familial hypercholesterolemia. Front. Genet. 11, 570355 (2020).

  88. Wheeler, T. J., Clements, J. & Finn, R. D. Skylign: a tool for creating informative, interactive logos representing sequence alignments and profile hidden Markov models. BMC Bioinformatics 15, 7 (2014).

  89. Sanjana, N. E., Shalem, O. & Zhang, F. Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods 11, 783–784 (2014).

  90. Chen, B. et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell 155, 1479–1491 (2013).

  91. Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224–226 (2019).

  92. Untergasser, A. et al. Primer3—new capabilities and interfaces. Nucleic Acids Res. 40, e115 (2012).

  93. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  94. Seabold, S. & Perktold, J. Statsmodels: econometric and statistical modeling with Python. In Proceedings of the 9th Python in Science Conference (eds Van der Walt, S. & Millman, J.) https://doi.org/10.25080/majora-92bf1922-011 (SciPy, 2010).

  95. McWilliam, H. et al. Analysis tool web services from the EMBL-EBI. Nucleic Acids Res. 41, W597–W600 (2013).

  96. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).

  97. Goujon, M. et al. A new bioinformatics analysis tools framework at EMBL-EBI. Nucleic Acids Res. 38, W695–W699 (2010).

  98. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).

  99. Morales, J. et al. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature 604, 310–315 (2022).

  100. Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).

  101. Krusche, P. et al. Best practices for benchmarking germline small-variant calls in human genomes. Nat. Biotechnol. 37, 555–560 (2019).

  102. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).

  103. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).

  104. Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).

  105. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  106. The PyMOL Molecular Graphics System v.1.8 (Schrödinger, 2015).

  107. Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).

  108. Webb, B. & Sali, A. Comparative protein structure modeling using MODELLER. Curr. Protoc. Protein Sci. 86, 2.9.1–2.9.37 (2016).

  109. Waskom, M. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).

  110. Ryu, J. K., Tognon, M. & Li, Z. pinellolab/bean_manuscript: v1.0.2. Zenodo https://doi.org/10.5281/zenodo.10775808 (2024).

Acknowledgements

We thank G. Losyev, A. James, Q. Qin, C. Smith, L. Blaine, K. Clement, Z. Patel, S. Yang and H. Boen for technical assistance. Funding for this work was obtained from UM1HG012010 (R.I.S. and L.P.), 1R01HL164409 (C.A.C., R.I.S. and L.P.), 1R01GM143249 (R.I.S.), R01HG010372 (C.A.C. and T.Y.), the American Cancer Society (R.I.S.), the American Heart Association (R.I.S.), the National Organization for Rare Diseases (R.I.S.), 1R35HG010717-01 (L.P.), the National Health and Medical Research Council of Australia (GNT1174405; D.B.A. and Y.Z.), and the Victorian Government’s Operational Infrastructure Support Program (Y.Z. and D.B.A.). We are indebted to the UKB and its participants (UKB application 41250 and IRB protocol 2020P002093).

Author information

Contributions

R.I.S. conceived the experimental design, and J.R. and L.P. conceptualized BEAN. S.B. collected screen data. J.R. developed BEAN, and M.J., M.I.L. and L.P. advised on design and implementation of BEAN. J.R. and T.Y. processed and analyzed data. T.Y. performed BE-Hive and FUSE analysis. M.F., Q.V.P. and R.I.S. performed downstream characterization of LDL-C GWAS variants. T.Y., L.B., V.B. and C.A.C. obtained and analyzed UKB data. Y.Z. led structural analysis of LDLR variants with J.R. and D.B.A. J.R. and Z.L. benchmarked classification performance. M.T. and L.P. performed analysis of variant impact on transcription factor binding. G.L. advised on library design. J.R. and R.I.S. drafted the manuscript. R.I.S., L.P. and C.A.C. provided guidance and supervised this project. All the authors wrote and approved the final manuscript.

Corresponding authors

Correspondence to Christopher A. Cassa, Richard I. Sherwood or Luca Pinello.

Ethics declarations

Competing interests

L.P. has financial interests in Edilytics and SeQure Dx. L.P.’s interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict-of-interest policies. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Andrew Wood and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Base editor editing preference profile and context specificity.

Deamination motif and PAM-dependent editing preference of AID-BE5-SpRY from 7294 gRNAs and AID-BE5-Cas9NG from 7299 gRNAs with more than 9 read counts across any replicates of bulk samples. a) Context specificity of AID-BE5-SpRY is represented as sequence logos. The height of each base represents the relative editing efficiency with each base. b) Mean editing efficiency of AID-BE5-SpRY by protospacer position and PAM sequence. c) Context specificity of AID-BE5-Cas9NG is represented as sequence logos. The height of each base represents the relative editing efficiency with each base. d) Mean editing efficiency of AID-BE5-Cas9NG by protospacer position and PAM sequence.

Extended Data Fig. 2 Nucleotide-level editing comparison of reporter and endogenous locus.

a) Scatterplots comparing per-nucleotide editing efficiencies between the reporter and endogenous target sites. All edits introduced by each of 49 gRNAs across four loci and three experimental replicates are plotted. Points are colored by the identity of the nucleotide edit and gRNA. b) The same plot colored by gRNA strand. R, Pearson correlation coefficient; n, number of plotted editing rates.

Extended Data Fig. 3 BEAN plate diagrams.

Plate diagrams of a) BEAN, b) BEAN-Reporter and c) BEAN-Uniform. X^b and all parameters with superscript b are not used for benchmark analyses.

Extended Data Fig. 4 LDL-C GWAS library classification task benchmark.

a) AUPRC plot for classifying positive splicing control variants against negative control variants. Metrics for all 5 replicates are shown as markers and metrics of 15 two-replicate subsamples among the 5 replicates are shown as box plots. Box plots were drawn as described in the statistical note of the Methods section. b) Precision-recall curve for classifying all positive control splice sites against negative controls for all replicates with no failing samples. c) Precision-recall curve for classifying splice sites of LDLR/MYLIP against negative controls for all replicates with no failing samples. d) Precision-recall curve for classifying all positive control splice sites against negative controls for 2-replicate subsamples of the data. The mean precision at each recall value across 15 subsample runs is plotted as a solid line. e) Precision-recall curve for classifying splice sites of LDLR/MYLIP against negative controls for 2-replicate subsamples of the data. The mean precision at each recall value across 15 subsample runs is plotted as a solid line.

Extended Data Fig. 5 Comparison of inferred effect sizes of individually transfected LDL-C GWAS library gRNAs.

Scatterplot and Pearson correlation coefficients (R) of effect size estimates and the log fold change (LFC) of fluorescence signal following individual transfection of 22 gRNAs. R; Spearman correlation coefficient.

Extended Data Fig. 6 BEAN accurately estimates variant effect confidence from per-variant evidence in input data.

a-c) Scatterplots of 2,182 LDLR tiling library variants comparing a) n_norm and effective edit rates, b) effective edit rates and BEAN σ_μ, and c) n_norm and BEAN σ_μ. d) Histogram of effective edit rates of 76 LDLR tiling library variants with UKB LDL-C levels. Quartile bin cutoffs used to categorize variants are shown as dotted lines. e) Scatterplots of BEAN z-scores and statin-adjusted UKB LDL-C measurements for variants in each effective edit rate quartile bin. r and rho show the Pearson and Spearman correlation coefficients, respectively.
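
The quartile binning and the two correlation coefficients named in this caption can be sketched in plain Python as follows. This is an illustration only, not the authors' analysis code: the edit rates and the example vectors are made up, and the quartile cutoffs use the default method of `statistics.quantiles`, which may differ from the method used in the paper.

```python
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def quartile_bin(value, cutoffs):
    """Index (0 to 3) of the quartile bin given the three cutoffs."""
    return sum(value > c for c in cutoffs)


# Hypothetical effective edit rates for eight variants.
edit_rates = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8]
cutoffs = statistics.quantiles(edit_rates, n=4)
bins = [quartile_bin(rate, cutoffs) for rate in edit_rates]

# A monotonic but non-linear relationship: rank correlation is perfect,
# linear correlation is not.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
print(bins, round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Within each bin, the same two functions could then be applied to the scores and measurements of the variants that fall into it.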

Extended Data Fig. 7 LDLR tiling library classification task benchmark.

a) AUPRC for classifying Pathogenic/Likely Pathogenic versus Benign/Likely Benign variants, using the 4 replicates without failing samples and the six 2-replicate combinations among them. Bounds and the center of the boxes are the interquartile ranges. Box plots were drawn as described in the statistical note of the Methods section. b-e) Precision-recall curves for classifying b, d) Pathogenic/Likely Pathogenic and c, e) Pathogenic variants versus Benign/Likely Benign variants. Top panels (b, c) show classification using the 4 replicates without failing samples. Bottom panels (d, e) show classification using the six 2-replicate combinations among the 4 replicates without failing samples.
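
As a rough illustration of the benchmark layout described in this caption, the sketch below enumerates the six 2-replicate combinations of 4 replicates and computes an area under the precision-recall curve for each. It makes several assumptions that are not from the paper: AUPRC is estimated as average precision, the two replicates in a pair are combined by averaging their scores, and the labels and scores are invented.

```python
from itertools import combinations


def average_precision(scores, labels):
    """Average precision: mean of the precision at the rank of each positive.

    Ties in score are broken by input order (the sort is stable).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical data: 1 = pathogenic-like, 0 = benign-like; one score list
# per replicate, one score per variant.
labels = [1, 1, 1, 0, 0, 0]
replicate_scores = [
    [0.9, 0.8, 0.3, 0.4, 0.2, 0.1],
    [0.7, 0.9, 0.6, 0.1, 0.3, 0.2],
    [0.8, 0.2, 0.7, 0.3, 0.1, 0.4],
    [0.6, 0.7, 0.8, 0.2, 0.5, 0.1],
]

# All unordered pairs of the 4 replicates: C(4, 2) = 6 combinations.
pairs = list(combinations(range(len(replicate_scores)), 2))
pair_auprc = {}
for i, j in pairs:
    merged = [(a + b) / 2 for a, b in zip(replicate_scores[i], replicate_scores[j])]
    pair_auprc[(i, j)] = average_precision(merged, labels)

for pair, value in pair_auprc.items():
    print(pair, round(value, 3))
```

The spread of the six values is what a box plot of 2-replicate subsamples would summarize.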

Source data

Extended Data Fig. 8 Comparison of functional impact and conservation within conserved LDLR domains.

Repeat domain alignments shown with BEAN z-scores for a) LDLR class A repeat domain, b) LDLR class B repeat domain, and c) EGF-like domains, aligned with the Pfam profile HMM logo by Skylign, where the height of each position shows its information content and letter heights show the total height scaled by the relative frequencies of the letters at that position. For a), the conserved cysteine residue position is highlighted, and for b-c), consensus positions from the Clustal Omega alignment output are highlighted in grey.

Source data

Extended Data Fig. 9 Expanded LDLR missense variant pathogenicity estimates with FUSE.

a) Scatterplot of mean statin-adjusted LDL-C level against imputed BEAN-FUSE score for all considered UKB variants. b) Prediction outcome for unobserved variants with an XGBoost model trained on observed UKB variants and mean statin-adjusted LDL levels. c-d) Correlation coefficients and root mean squared error (RMSE) between predicted and true UKB mean statin-adjusted LDL-C levels for XGBoost models using FUSE score, PhastCons PhyloP conservation score, or both as input. c) Boxplot of metrics for prediction of observed variants with 10-fold cross validation (n = 10). d) Barplot of metrics for prediction of unobserved variants with a model trained on observed variants (n = 1). r, ρ; Pearson, Spearman correlation coefficient. RMSE; root mean squared error.

Source data

Extended Data Fig. 10 Local atomic interaction in wild type and mutated structure for selected variants in LDLR class B repeat domain.

a-k) Residues interacting with the variant position are shown. Variant positions and interacting residues are colored by the reference amino acid and atomic elements (O: red, N: blue, S: yellow). Ref AA; reference amino acid.

Supplementary information

Supplementary Information

Supplementary Figs. 1–20 and Notes 1–8.

Reporting Summary

Peer Review File

Supplementary Tables

Supplementary Tables 1–11.

Supplementary Data 1

Annotated base editor plasmid sequences.

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 7

Statistical source data.

Source Data Extended Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 9

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ryu, J., Barkal, S., Yu, T. et al. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. Nat Genet (2024). https://doi.org/10.1038/s41588-024-01726-6

  • DOI: https://doi.org/10.1038/s41588-024-01726-6