Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification

Ryu, Jayoung; Barkal, Sam; Yu, Tian; Jankowiak, Martin; Zhou, Yunzhuo; Francoeur, Matthew; Phan, Quang Vinh; Li, Zhijian; Tognon, Manuel; Brown, Lara; Love, Michael I.; Bhat, Vineel; Lettre, Guillaume; Ascher, David B.; Cassa, Christopher A.; Sherwood, Richard I.; Pinello, Luca

doi:10.1038/s41588-024-01726-6

Article
Published: 24 April 2024

Nature Genetics (2024)

Abstract

CRISPR base editing screens enable analysis of disease-associated variants at scale; however, variable efficiency and precision confound the assessment of variant-induced phenotypes. Here, we provide an integrated experimental and computational pipeline that improves estimation of variant effects in base editing screens. We use a reporter construct to measure guide RNA (gRNA) editing outcomes alongside their phenotypic consequences and introduce base editor screen analysis with activity normalization (BEAN), a Bayesian network that uses per-guide editing outcomes provided by the reporter and target site chromatin accessibility to estimate variant impacts. BEAN outperforms existing tools in variant effect quantification. We use BEAN to pinpoint common regulatory variants that alter low-density lipoprotein (LDL) uptake, implicating previously unreported genes. Additionally, through saturation base editing of LDLR, we accurately quantify missense variant pathogenicity that is consistent with measurements in UK Biobank patients and identify underlying structural mechanisms. This work provides a widely applicable approach to improve the power of base editing screens for disease-associated variant characterization.
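Why per-guide editing outcomes matter can be illustrated with a toy model. The sketch below is not BEAN, which is a Bayesian network implemented in the crispr-bean package; it is a hypothetical, simplified illustration of activity normalization that swaps in a grid-search maximum-likelihood fit. It assumes that a fraction `edit_rate` of a guide's cells carry the variant with phenotype N(mu, sigma), that the remaining cells are unedited with phenotype N(0, 1), and that cells are sorted into four bins at the quartiles of the unedited distribution. The names `bin_probs` and `fit_mu` are illustrative only.

```python
import math
from statistics import NormalDist

# Hypothetical setup: unedited cells have phenotype ~ N(0, 1), and cells are
# sorted into four bins whose edges are the quartiles of that distribution.
UNEDITED = NormalDist(0.0, 1.0)
BIN_EDGES = [UNEDITED.inv_cdf(q) for q in (0.25, 0.5, 0.75)]


def bin_probs(edit_rate, mu, sigma=1.0, edges=BIN_EDGES):
    """Expected fraction of one guide's cells falling in each sorting bin.

    Assumed mixture: a fraction `edit_rate` of cells carry the variant, with
    phenotype ~ N(mu, sigma); the remaining cells are unedited, ~ N(0, 1).
    """
    edited = NormalDist(mu, sigma)
    bounds = [-math.inf, *edges, math.inf]
    probs = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        p_edited = edited.cdf(hi) - edited.cdf(lo)
        p_unedited = UNEDITED.cdf(hi) - UNEDITED.cdf(lo)
        probs.append(edit_rate * p_edited + (1.0 - edit_rate) * p_unedited)
    return probs


def fit_mu(bin_counts, edit_rate, sigma=1.0):
    """Grid-search maximum-likelihood estimate of the variant effect mu,
    treating the guide's bin counts as multinomial draws from bin_probs()."""
    grid = [i / 100 for i in range(-300, 301)]  # mu from -3.00 to 3.00

    def log_likelihood(mu):
        probs = bin_probs(edit_rate, mu, sigma)
        return sum(c * math.log(p) for c, p in zip(bin_counts, probs))

    return max(grid, key=log_likelihood)


# Noise-free expected counts for a guide that edits 30% of cells, true mu = -1.5.
counts = [10_000 * p for p in bin_probs(edit_rate=0.3, mu=-1.5)]
mu_normalized = fit_mu(counts, edit_rate=0.3)  # uses the measured edit rate
mu_naive = fit_mu(counts, edit_rate=1.0)       # assumes every cell is edited
print(mu_normalized, mu_naive)
```

In this toy setting, the fit that uses the measured editing rate recovers the simulated effect, whereas assuming complete editing shrinks the estimate toward zero, which is the kind of confounding by variable efficiency that the abstract describes.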

**Fig. 1: Activity-normalized base editing screening pipeline.**

**Fig. 2: BEAN models variant effects from activity-normalized base editing screens.**

**Fig. 3: BEAN improves variant impact estimation from the LDL-C GWAS library screen.**

**Fig. 4: Functional characterization of LDL-C GWAS variants.**

**Fig. 5: Dissection of *LDLR* variant effects through BEAN modeling of a saturation tiled base editing screen.**

**Fig. 6: Deleterious variants in LDLR class B repeats weaken hydrophobic interactions.**

Data availability

The processed data used in this study have been deposited at Zenodo (https://doi.org/10.5281/zenodo.10139794), and primary sequencing data are available at the Sequence Read Archive under accession PRJNA1042659. Controlled access, patient-level data from the UKB may be requested at https://ams.ukbiobank.ac.uk/ams/. Source data are provided with this paper.

Code availability

BEAN source code is available at https://github.com/pinellolab/crispr-bean. The scripts used to generate the figures and analyses presented in this study have been deposited at https://github.com/pinellolab/bean_manuscript and Zenodo¹¹⁰. The version of ‘bean’ (0.2.9) used for the analyses presented in this paper has been deposited at Zenodo⁷⁴.

References

1. Tam, V. et al. Benefits and limitations of genome-wide association studies. Nat. Rev. Genet. 20, 467–484 (2019).
2. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
3. Gasperini, M., Starita, L. & Shendure, J. The power of multiplexed functional analysis of genetic variants. Nat. Protoc. 11, 1782–1787 (2016).
4. Araya, C. L. & Fowler, D. M. Deep mutational scanning: assessing protein function on a massive scale. Trends Biotechnol. 29, 435–442 (2011).
5. Myers, R. M., Tilly, K. & Maniatis, T. Fine structure genetic analysis of a β-globin promoter. Science 232, 613–618 (1986).
6. Inoue, F. & Ahituv, N. Decoding enhancers using massively parallel reporter assays. Genomics 106, 159–164 (2015).
7. Bock, C. et al. High-content CRISPR screening. Nat. Rev. Methods Prim. 2, 9 (2022).
8. Shalem, O. et al. Genome-scale CRISPR–Cas9 knockout screening in human cells. Science 343, 84–87 (2014).
9. Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Genetic screens in human cells using the CRISPR–Cas9 system. Science 343, 80–84 (2014).
10. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).
11. Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).
12. Rees, H. A. & Liu, D. R. Base editing: precision chemistry on the genome and transcriptome of living cells. Nat. Rev. Genet. 19, 770–788 (2018).
13. Hanna, R. E. et al. Massively parallel assessment of human variants with base editor screens. Cell 184, 1064–1080 (2021).
14. Morris, J. A. et al. Discovery of target genes and pathways at GWAS loci by pooled single-cell CRISPR screens. Science 380, eadh7699 (2023).
15. Martin-Rufino, J. D. et al. Massively parallel base editing to map variant effects in human hematopoiesis. Cell 186, 2456–2474 (2023).
16. Cuella-Martin, R. et al. Functional interrogation of DNA damage response variants with base editing screens. Cell 184, 1081–1097 (2021).
17. Pablo, J. L. B. et al. Scanning mutagenesis of the voltage-gated sodium channel NaV1.2 using base editing. Cell Rep. 42, 112563 (2023).
18. Coelho, M. A. et al. Base editing screens map mutations affecting interferon-γ signaling in cancer. Cancer Cell 41, 288–303 (2023).
19. Cheng, L. et al. Single-nucleotide-level mapping of DNA regulatory elements that control fetal hemoglobin expression. Nat. Genet. 53, 869–880 (2021).
20. Sánchez-Rivera, F. J. et al. Base editing sensor libraries for high-throughput engineering and functional analysis of cancer-associated single nucleotide variants. Nat. Biotechnol. 40, 862–873 (2022).
21. Kim, Y. et al. High-throughput functional evaluation of human cancer-associated mutations using base editors. Nat. Biotechnol. 40, 874–884 (2022).
22. Kweon, J. et al. A CRISPR-based base-editing screen for the functional assessment of BRCA1 variants. Oncogene 39, 30–35 (2020).
23. Huang, C., Li, G., Wu, J., Liang, J. & Wang, X. Identification of pathogenic variants in cancer genes using base editing screens with editing efficiency correction. Genome Biol. 22, 80 (2021).
24. Sangree, A. K. et al. Benchmarking of SpCas9 variants enables deeper base editor screens of BRCA1 and BCL2. Nat. Commun. 13, 1318 (2022).
25. Lue, N. Z. et al. Base editor scanning charts the DNMT3A activity landscape. Nat. Chem. Biol. 19, 176–186 (2023).
26. Després, P. C., Dubé, A. K., Seki, M., Yachie, N. & Landry, C. R. Perturbing proteomes at single residue resolution using base editing. Nat. Commun. 11, 1871 (2020).
27. Garcia, E. M. et al. Base editor scanning reveals activating mutations of DNMT3A. ACS Chem. Biol. 18, 2030–2038 (2023).
28. Lue, N. Z. & Liau, B. B. Base editor screens for in situ mutational scanning at scale. Mol. Cell 83, 2167–2187 (2023).
29. Arbab, M. et al. Determinants of base editing outcomes from target library analysis and machine learning. Cell 182, 463–480 (2020).
30. Graham, S. E. et al. The power of genetic diversity in genome-wide association studies of lipids. Nature 600, 675–679 (2021).
31. Bouhairie, V. E. & Goldberg, A. C. Familial hypercholesterolemia. Cardiol. Clin. 33, 169–179 (2015).
32. Brown, M. S. & Goldstein, J. L. How LDL receptors influence cholesterol and atherosclerosis. Sci. Am. 251, 58–66 (1984).
33. Mundal, L. J. et al. Impact of age on excess risk of coronary heart disease in patients with familial hypercholesterolaemia. Heart 104, 1600–1607 (2018).
34. Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).
35. Hamilton, M. C. et al. Systematic elucidation of genetic mechanisms underlying cholesterol uptake. Cell Genom. 3, 100304 (2023).
36. Spady, D. K. Hepatic clearance of plasma low density lipoproteins. Semin. Liver Dis. 12, 373–385 (1992).
37. Richter, M. F. et al. Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat. Biotechnol. 38, 883–891 (2020).
38. Walton, R. T., Christie, K. A., Whittaker, M. N. & Kleinstiver, B. P. Unconstrained genome targeting with near-PAMless engineered CRISPR–Cas9 variants. Science 368, 290–296 (2020).
39. Park, H., Shin, J., Choi, H., Cho, B. & Kim, J. Valproic acid significantly improves CRISPR/Cas9-mediated gene editing. Cells 9, 1447 (2020).
40. Shin, H. R. et al. Small-molecule inhibitors of histone deacetylase improve CRISPR-based adenine base editing. Nucleic Acids Res. 49, 2390–2399 (2021).
41. Yang, C. et al. HMGN1 enhances CRISPR-directed dual-function A-to-G and C-to-G base editing. Nat. Commun. 14, 2430 (2023).
42. Arbab, M. et al. Base editing rescue of spinal muscular atrophy in cells and in mice. Science 380, eadg6518 (2023).
43. Schep, R. et al. Impact of chromatin context on Cas9-induced DNA double-strand break repair pathway balance. Mol. Cell 81, 2216–2230 (2021).
44. Ding, X. et al. Improving CRISPR–Cas9 genome editing efficiency by fusion with chromatin-modulating peptides. CRISPR J. 2, 51–63 (2019).
45. Liu, G., Yin, K., Zhang, Q., Gao, C. & Qiu, J.-L. Modulating chromatin accessibility by transactivation and targeting proximal dsgRNAs enhances Cas9 editing efficiency in vivo. Genome Biol. 20, 145 (2019).
46. Maeder, M. L. et al. CRISPR RNA-guided activation of endogenous human genes. Nat. Methods 10, 977–979 (2013).
47. Perez-Pinera, P. et al. RNA-guided gene activation by CRISPR–Cas9-based transcription factors. Nat. Methods 10, 973–976 (2013).
48. Qi, L. S. et al. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152, 1173–1183 (2013).
49. Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 15, 554 (2014).
50. Li, W. et al. Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR. Genome Biol. 16, 281 (2015).
51. Jeong, H.-H., Kim, S. Y., Rousseaux, M. W. C., Zoghbi, H. Y. & Liu, Z. Beta-binomial modeling of CRISPR pooled screen data identifies target genes with greater sensitivity and fewer false negatives. Genome Res. 29, 999–1008 (2019).
52. Daley, T. P. et al. CRISPhieRmix: a hierarchical mixture model for CRISPR pooled screens. Genome Biol. 19, 159 (2018).
53. Zou, Y., Carbonetto, P., Wang, G. & Stephens, M. Fine-mapping from summary data with the ‘Sum of Single Effects’ model. PLoS Genet. 18, e1010299 (2022).
54. Tehranchi, A. et al. Fine-mapping cis-regulatory variants in diverse human populations. eLife 8, e39595 (2019).
55. Tehranchi, A. K. et al. Pooled ChIP–seq links variation in transcription factor binding to complex disease risk. Cell 165, 730–741 (2016).
56. Degner, J. F. et al. DNase I sensitivity QTLs are a major determinant of human expression variation. Nature 482, 390–394 (2012).
57. Currin, K. W. et al. Genetic effects on liver chromatin accessibility identify disease regulatory variants. Am. J. Hum. Genet. 108, 1169–1189 (2021).
58. Kanai, M. et al. Insights from complex trait fine-mapping across diverse populations. Preprint at bioRxiv https://doi.org/10.1101/2021.09.03.21262975 (2021).
59. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
60. Biasella, F., Plössl, K., Karl, C., Weber, B. H. F. & Friedrich, U. Altered protein function caused by AMD-associated variant rs704 links vitronectin to disease pathology. Invest. Ophthalmol. Vis. Sci. 61, 2 (2020).
61. Yao, Q. et al. Motif-Raptor: a cell type-specific and transcription factor centric approach for post-GWAS prioritization of causal regulators. Bioinformatics 37, 2103–2111 (2021).
62. Jing, Z., Liu, Y., Dong, M., Hu, S. & Huang, S. Identification of the DNA binding element of the human ZNF333 protein. J. Biochem. Mol. Biol. 37, 663–670 (2004).
63. Witzgall, R., O’Leary, E., Leaf, A., Onaldi, D. & Bonventre, J. V. The Krüppel-associated box-A (KRAB-A) domain of zinc finger proteins mediates transcriptional repression. Proc. Natl Acad. Sci. USA 91, 4514–4518 (1994).
64. Fass, D., Blacklow, S., Kim, P. S. & Berger, J. M. Molecular basis of familial hypercholesterolaemia from structure of LDL receptor module. Nature 388, 691–693 (1997).
65. Mistry, J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
66. Willer, C. J. et al. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 45, 1274–1283 (2013).
67. Yu, T., Fife, J. D., Adzhubey, I., Sherwood, R. & Cassa, C. A. Joint estimation and imputation of variant functional effects using high throughput assay data. Preprint at medRxiv https://doi.org/10.1101/2023.01.06.23284280 (2023).
68. Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (Association for Computing Machinery, 2016).
69. Clarke, S. L. et al. Coronary artery disease risk of familial hypercholesterolemia genetic variants independent of clinically observed longitudinal cholesterol exposure. Circ. Genom. Precis. Med. 15, e003501 (2022).
70. Zhou, Y., Pan, Q., Pires, D. E. V., Rodrigues, C. H. M. & Ascher, D. B. DDMut: predicting effects of mutations on protein stability using deep learning. Nucleic Acids Res. 51, W122–W128 (2023).
71. Jubb, H. C. et al. Arpeggio: a web server for calculating and visualising interatomic interactions in protein structures. J. Mol. Biol. 429, 365–371 (2017).
72. Varadi, M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).
73. Rose, G. D., Geselowitz, A. R., Lesser, G. J., Lee, R. H. & Zehfus, M. H. Hydrophobicity of amino acid residues in globular proteins. Science 229, 834–838 (1985).
74. Ryu, J. & Pinello, L. pinellolab/crispr-bean: v0.2.9. Zenodo https://doi.org/10.5281/zenodo.10191493 (2023).
75. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
76. Cassa, C. A. et al. Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat. Genet. 49, 806–810 (2017).
77. Fowler, D. M. & Fields, S. Deep mutational scanning: a new style of protein science. Nat. Methods 11, 801–807 (2014).
78. Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
79. Brnich, S. E. et al. Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework. Genome Med. 12, 3 (2019).
80. Domanski, M. J. et al. Time course of LDL cholesterol exposure and cardiovascular disease event risk. J. Am. Coll. Cardiol. 76, 1507–1516 (2020).
81. Duncan, M. S., Vasan, R. S. & Xanthakis, V. Trajectories of blood lipid concentrations over the adult life course and risk of cardiovascular disease and all-cause mortality: observations from the Framingham Study over 35 years. J. Am. Heart Assoc. 8, e011433 (2019).
82. Mundal, L. & Retterstøl, K. A systematic review of current studies in patients with familial hypercholesterolemia by use of national familial hypercholesterolemia registries. Curr. Opin. Lipidol. 27, 388–397 (2016).
83. Frazer, J. et al. Disease variant prediction with deep generative models of evolutionary data. Nature 599, 91–95 (2021).
84. Ioannidis, N. M. et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877–885 (2016).
85. Gao, H. et al. The landscape of tolerated genetic variation in humans and primates. Science 380, eabn8153 (2023).
86. Klimentidis, Y. C. et al. Phenotypic and genetic characterization of lower LDL cholesterol and increased type 2 diabetes risk in the UK Biobank. Diabetes 69, 2194–2205 (2020).
87. Oommen, D., Kizhakkedath, P., Jawabri, A. A., Varghese, D. S. & Ali, B. R. Proteostasis regulation in the endoplasmic reticulum: an emerging theme in the molecular pathology and therapeutic management of familial hypercholesterolemia. Front. Genet. 11, 570355 (2020).
88. Wheeler, T. J., Clements, J. & Finn, R. D. Skylign: a tool for creating informative, interactive logos representing sequence alignments and profile hidden Markov models. BMC Bioinformatics 15, 7 (2014).
89. Sanjana, N. E., Shalem, O. & Zhang, F. Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods 11, 783–784 (2014).
90. Chen, B. et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell 155, 1479–1491 (2013).
91. Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224–226 (2019).
92. Untergasser, A. et al. Primer3—new capabilities and interfaces. Nucleic Acids Res. 40, e115 (2012).
93. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
94. Seabold, S. & Perktold, J. Statsmodels: econometric and statistical modeling with Python. In Proceedings of the 9th Python in Science Conference (eds Van der Walt, S. & Millman, J.) https://doi.org/10.25080/majora-92bf1922-011 (SciPy, 2010).
95. McWilliam, H. et al. Analysis tool web services from the EMBL-EBI. Nucleic Acids Res. 41, W597–W600 (2013).
96. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).
97. Goujon, M. et al. A new bioinformatics analysis tools framework at EMBL-EBI. Nucleic Acids Res. 38, W695–W699 (2010).
98. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
99. Morales, J. et al. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature 604, 310–315 (2022).
100. Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
101. Krusche, P. et al. Best practices for benchmarking germline small-variant calls in human genomes. Nat. Biotechnol. 37, 555–560 (2019).
102. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
103. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).
104. Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).
105. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
106. The PyMOL Molecular Graphics System v.1.8 (Schrödinger, 2015).
107. Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
108. Webb, B. & Sali, A. Comparative protein structure modeling using MODELLER. Curr. Protoc. Protein Sci. 86, 2.9.1–2.9.37 (2016).
109. Waskom, M. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
110. Ryu, J. K., Tognon, M. & Li, Z. pinellolab/bean_manuscript: v1.0.2. Zenodo https://doi.org/10.5281/zenodo.10775808 (2024).

Acknowledgements

We thank G. Losyev, A. James, Q. Qin, C. Smith, L. Blaine, K. Clement, Z. Patel, S. Yang and H. Boen for technical assistance. Funding for this work was obtained from UM1HG012010 (R.I.S. and L.P.), 1R01HL164409 (C.A.C., R.I.S. and L.P.), 1R01GM143249 (R.I.S.), R01HG010372 (C.A.C. and T.Y.), the American Cancer Society (R.I.S.), the American Heart Association (R.I.S.), the National Organization for Rare Disorders (R.I.S.), 1R35HG010717-01 (L.P.), the National Health and Medical Research Council of Australia (GNT1174405; D.B.A. and Y.Z.), and the Victorian Government’s Operational Infrastructure Support Program (Y.Z. and D.B.A.). We are indebted to the UKB and its participants (UKB application 41250 and IRB protocol 2020P002093).

Author information

Authors and Affiliations

Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
Jayoung Ryu, Zhijian Li, Manuel Tognon & Luca Pinello
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Jayoung Ryu
Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
Jayoung Ryu, Martin Jankowiak, Zhijian Li, Manuel Tognon & Luca Pinello
Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
Sam Barkal, Tian Yu, Matthew Francoeur, Quang Vinh Phan, Lara Brown, Vineel Bhat, Christopher A. Cassa & Richard I. Sherwood
School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
Yunzhuo Zhou & David B. Ascher
Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
Yunzhuo Zhou & David B. Ascher
Computer Science Department, University of Verona, Verona, Italy
Manuel Tognon
Department of Genetics, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Michael I. Love
Montreal Heart Institute, Montréal, Quebec, Canada
Guillaume Lettre
Faculté de Médecine, Université de Montréal, Montréal, Quebec, Canada
Guillaume Lettre
Department of Pathology, Harvard Medical School, Boston, MA, USA
Luca Pinello

Contributions

R.I.S. conceived the experimental design, and J.R. and L.P. conceptualized BEAN. S.B. collected screen data. J.R. developed BEAN, and M.J., M.I.L. and L.P. advised on design and implementation of BEAN. J.R. and T.Y. processed and analyzed data. T.Y. performed BE-Hive and FUSE analysis. M.F., Q.V.P. and R.I.S. performed downstream characterization of LDL-C GWAS variants. T.Y., L.B., V.B. and C.A.C. obtained and analyzed UKB data. Y.Z. led structural analysis of LDLR variants with J.R. and D.B.A. J.R. and Z.L. benchmarked classification performance. M.T. and L.P. performed analysis of variant impact on transcription factor binding. G.L. advised on library design. J.R. and R.I.S. drafted the manuscript. R.I.S., L.P. and C.A.C. provided guidance and supervised this project. All the authors wrote and approved the final manuscript.

Corresponding authors

Correspondence to Christopher A. Cassa, Richard I. Sherwood or Luca Pinello.

Ethics declarations

Competing interests

L.P. has financial interests in Edilytics and SeQure Dx. L.P.’s interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict-of-interest policies. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Andrew Wood and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Base editor editing preference profile and context specificity.

Deamination motif and PAM-dependent editing preference of AID-BE5-SpRY (7,294 gRNAs) and AID-BE5-Cas9NG (7,299 gRNAs), using gRNAs with more than 9 read counts in any replicate of the bulk samples. a) Context specificity of AID-BE5-SpRY, represented as sequence logos. The height of each base represents the relative editing efficiency associated with that base. b) Mean editing efficiency of AID-BE5-SpRY by protospacer position and PAM sequence. c) Context specificity of AID-BE5-Cas9NG, represented as sequence logos. The height of each base represents the relative editing efficiency associated with that base. d) Mean editing efficiency of AID-BE5-Cas9NG by protospacer position and PAM sequence.

Source data
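Panels b and d summarize editing as a mean per (protospacer position, PAM) pair after a read-count filter. A minimal sketch of that aggregation follows, assuming a hypothetical per-gRNA record format in which `reads` is the gRNA's maximum read count across replicates (so `min_reads=10` mirrors "more than 9 read counts"); the function name and record layout are illustrative, not taken from the BEAN code base.

```python
from collections import defaultdict


def mean_efficiency_by_position_and_pam(records, min_reads=10):
    """Mean editing rate for each (protospacer position, PAM) pair.

    `records` is a hypothetical per-gRNA format:
        {"pam": str, "reads": int, "edit_rates": {position: rate}}
    where "reads" is taken here as the gRNA's maximum read count across
    replicates, so min_reads=10 mirrors "more than 9 read counts".
    """
    totals = defaultdict(float)
    n_guides = defaultdict(int)
    for rec in records:
        if rec["reads"] < min_reads:
            continue  # too few reads to estimate editing reliably
        for position, rate in rec["edit_rates"].items():
            key = (position, rec["pam"])
            totals[key] += rate
            n_guides[key] += 1
    return {key: totals[key] / n_guides[key] for key in totals}


records = [
    {"pam": "AGG", "reads": 50, "edit_rates": {5: 0.40, 6: 0.20}},
    {"pam": "AGG", "reads": 12, "edit_rates": {5: 0.20}},
    {"pam": "TGA", "reads": 3, "edit_rates": {5: 0.90}},  # filtered out
]
summary = mean_efficiency_by_position_and_pam(records)
print(summary)
```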

Extended Data Fig. 2 Nucleotide-level editing comparison of reporter and endogenous locus.

a) Scatterplots comparing per-nucleotide editing efficiencies between the reporter and endogenous target sites. All edits introduced by each of 49 gRNAs across four loci in three experimental replicates are plotted. Points are colored by the identity of the nucleotide edit and by gRNA. b) The same plot colored by gRNA strand. R, Pearson correlation coefficient; n, number of plotted editing rates.

Source data
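The agreement between reporter and endogenous editing is summarized by the Pearson correlation R over n paired editing rates. The sketch below computes R from scratch; the editing-rate values are hypothetical placeholders, with one entry per (gRNA, edited base, replicate). On Python 3.10+ the standard-library `statistics.correlation` returns the same coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-nucleotide editing rates, one per (gRNA, edited base, replicate).
reporter = [0.10, 0.35, 0.52, 0.80, 0.05]
endogenous = [0.08, 0.30, 0.55, 0.70, 0.02]
r, n = pearson_r(reporter, endogenous), len(reporter)
print(f"R = {r:.3f}, n = {n}")
```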

Extended Data Fig. 3 BEAN plate diagrams.

Plate diagrams of a) BEAN, b) BEAN-Reporter and c) BEAN-Uniform. X^b and all parameters with superscript b are not used in the benchmark analyses.

Extended Data Fig. 4 LDL-C GWAS library classification task benchmark.

a) AUPRC for classifying positive-control splicing variants against negative-control variants. Metrics for all 5 replicates are shown as markers, and metrics for 15 two-replicate subsamples among the 5 replicates are shown as box plots. Box plots are drawn as described in the statistical note of the Methods section. b) Precision-recall curves for classifying all positive-control splice sites against negative controls, using all replicates with no failing samples. c) Precision-recall curves for classifying LDLR/MYLIP splice sites against negative controls, using all replicates with no failing samples. d) Precision-recall curves for classifying all positive-control splice sites against negative controls in two-replicate subsamples of the data; the mean precision at each recall across the 15 subsample runs is plotted as a solid line. e) Precision-recall curves for classifying LDLR/MYLIP splice sites against negative controls in two-replicate subsamples of the data; the mean precision at each recall across the 15 subsample runs is plotted as a solid line.

Source data

Extended Data Fig. 5 Comparison of inferred effect sizes of individually transfected LDL-C GWAS library gRNAs.

Scatterplot and Pearson correlation coefficients (R) of effect size estimates and the log fold change (LFC) of fluorescence signal following individual transfection of 22 gRNAs. R; Spearman correlation coefficient.

Source data

Extended Data Fig. 6 BEAN accurately estimates variant effect confidence from per-variant evidence in input data.

a-c) Scatterplots of 2,182 LDLR tiling library variants comparing a) n_norm and effective edit rates, b) effective edit rates and BEAN σ_μ, and c) n_norm and BEAN σ_μ. d) Histogram of effective edit rates of the 76 LDLR tiling library variants with UKB LDL-C levels. Quartile cutoffs used to bin variants are shown as dotted lines. e) Scatterplots of BEAN z-scores and statin-adjusted UKB LDL-C measurements for variants in each effective edit rate quartile bin. r and ρ show the Pearson and Spearman correlation coefficients, respectively.
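Panels d and e bin variants by quartiles of effective edit rate and report Pearson (r) and Spearman (ρ) correlations within each bin. Below is a standard-library Python sketch. It takes quartile cutoffs from `statistics.quantiles` with its default 'exclusive' method (the paper's quantile definition may differ) and computes ρ as Pearson's r on average-tied ranks. The input names are hypothetical.

```python
import math
from statistics import mean, quantiles

def pearson(x, y):
    """Pearson correlation coefficient r (undefined if x or y is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation rho: Pearson r of the ranks."""
    return pearson(ranks(x), ranks(y))

def quartile_bins(edit_rates):
    """Assign each variant to quartile bin 0-3 by its effective edit rate."""
    cutoffs = quantiles(edit_rates, n=4)  # three cutoffs (the dotted lines)
    return [sum(rate > c for c in cutoffs) for rate in edit_rates], cutoffs

def per_bin_correlations(edit_rates, z_scores, ldl_levels):
    """{bin: (r, rho)} between z-scores and LDL-C within each edit rate bin."""
    bins, _ = quartile_bins(edit_rates)
    out = {}
    for b in range(4):
        idx = [i for i, v in enumerate(bins) if v == b]
        if len(idx) < 3:
            continue  # too few variants for a meaningful correlation
        z = [z_scores[i] for i in idx]
        ldl = [ldl_levels[i] for i in idx]
        out[b] = (pearson(z, ldl), spearman(z, ldl))
    return out
```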

Source data

Extended Data Fig. 7 LDLR tiling library classification task benchmark.

a) AUPRC for classifying Pathogenic/Likely Pathogenic versus Benign/Likely Benign variants, using the 4 replicates without failing samples and the 6 two-replicate combinations of those replicates. Box bounds indicate the interquartile range; boxplots were drawn as described in the statistical note of the Methods section. b-e) Precision-recall curves for classifying b, d) Pathogenic/Likely Pathogenic and c, e) Pathogenic variants against Benign/Likely Benign variants. Top panels (b, c) show classification using the 4 replicates without failing samples. Bottom panels (d, e) show classification using the 6 two-replicate combinations of those 4 replicates.
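Panel a scores every two-replicate combination of the four passing replicates (C(4,2) = 6) and summarizes the results as box plots. Below is a short Python sketch of that enumeration and of the quartile summary behind a box. It assumes a hypothetical `metric` callable that scores a tuple of replicate names (for example, by rerunning the analysis and computing AUPRC). It also assumes conventional quartiles from `statistics.quantiles` with the 'inclusive' method. Whisker rules follow the Methods and are not reproduced here.

```python
from itertools import combinations
from statistics import quantiles

def subsample_metric(replicates, metric, k=2):
    """Apply `metric` to every k-replicate combination of `replicates`."""
    return [metric(subset) for subset in combinations(replicates, k)]

def box_summary(values):
    """First quartile, median, third quartile and IQR of metric values."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


reps = ["rep1", "rep2", "rep3", "rep4"]
pairs = subsample_metric(reps, metric=lambda subset: subset)  # identity metric
print(len(pairs))  # 6
```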

Source data

Extended Data Fig. 8 Comparison of functional impact and conservation within conserved LDLR domains.

Repeat domain alignments shown with BEAN z-scores for a) the LDLR class A repeat domain, b) the LDLR class B repeat domain and c) EGF-like domains, aligned with the Pfam profile HMM logo generated by Skylign. The height of each position shows its information content, and each letter's height is the total height scaled by that letter's relative frequency at the position. In a), conserved cysteine residue positions are highlighted; in b-c), consensus positions from the Clustal Omega alignment output are highlighted in grey.
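The logo encoding described in this legend can be written out for a single alignment column. Below is a minimal Python sketch. It assumes a uniform background over the 20 amino acids, so information content is log2(20) minus the column's Shannon entropy. Each letter's height is then its frequency times that information content. Skylign's calculation for profile HMMs accounts for background residue frequencies and offers other display modes, so its heights can differ.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_logo_heights(freqs, alphabet_size=len(AMINO_ACIDS)):
    """Return (information content in bits, {residue: letter height}).

    `freqs` maps residues to relative frequencies summing to 1. Assumes a
    uniform background, so the maximum information is log2(alphabet_size).
    """
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    info = math.log2(alphabet_size) - entropy
    return info, {aa: p * info for aa, p in freqs.items()}


# A fully conserved cysteine column reaches the maximum, log2(20) (about 4.32 bits).
info, heights = column_logo_heights({"C": 1.0})
```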

Source data

Extended Data Fig. 9 Expanded LDLR missense variant pathogenicity estimates with FUSE.

a) Scatterplot of mean statin-adjusted LDL-C level against imputed BEAN-FUSE score for all considered UKB variants. b) Predictions for unobserved variants from an XGBoost model trained on observed UKB variants and their mean statin-adjusted LDL-C levels. c-d) Correlation coefficients and root mean squared error (RMSE) between predicted and true UKB mean statin-adjusted LDL-C levels for XGBoost models with the FUSE score, the PhastCons PhyloP conservation score, or both as input. c) Boxplot of metrics for prediction of observed variants with 10-fold cross-validation (n = 10). d) Barplot of metrics for prediction of unobserved variants with the model trained on observed variants (n = 1). r, ρ; Pearson, Spearman correlation coefficients. RMSE; root mean squared error.
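Panel c evaluates each model by 10-fold cross-validation. Below is a standard-library Python sketch of the fold split and the per-fold RMSE. A hypothetical `fit` callable stands in for XGBoost, which is not used here. In the paper's setting, the features would be the FUSE score, the conservation score, or both, and per-fold Pearson and Spearman coefficients would be computed on the same held-out predictions.

```python
import math
import random

def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k shuffled, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def cross_validate(X, y, fit, k=10, seed=0):
    """Per-fold RMSE; `fit(X_train, y_train)` must return a predict(x) function."""
    scores = []
    for train, test in kfold_indices(len(y), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(X[i]) for i in test]
        scores.append(rmse([y[i] for i in test], preds))
    return scores

def fit_mean_baseline(X_train, y_train):
    """Hypothetical baseline model: always predicts the training mean."""
    m = sum(y_train) / len(y_train)
    return lambda x: m
```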

Source data

Extended Data Fig. 10 Local atomic interaction in wild type and mutated structure for selected variants in LDLR class B repeat domain.

a–k, Residues interacting with the variant position are shown. Variant positions and interacting residues are colored by the reference amino acid and by atomic element (O: red, N: blue, S: yellow). Ref AA; reference amino acid.

Supplementary information

Supplementary Information

Supplementary Figs. 1–20 and Notes 1–8.

Reporting Summary

Peer Review File

Supplementary Tables

Supplementary Tables 1–11.

Supplementary Data 1

Annotated base editor plasmid sequences.

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 7

Statistical source data.

Source Data Extended Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 9

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ryu, J., Barkal, S., Yu, T. et al. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. Nat Genet (2024). https://doi.org/10.1038/s41588-024-01726-6

Received: 07 September 2023
Accepted: 21 March 2024
Published: 24 April 2024
DOI: https://doi.org/10.1038/s41588-024-01726-6
