Identifying genetic variants that influence the abundance of cell states in single-cell data

Abstract

Disease risk alleles influence the composition of cells present in the body, but modeling genetic effects on the cell states revealed by single-cell profiling is difficult because variant-associated states may reflect diverse combinations of the profiled cell features that are challenging to predefine. We introduce Genotype–Neighborhood Associations (GeNA), a statistical tool to identify cell-state abundance quantitative trait loci (csaQTLs) in high-dimensional single-cell datasets. Instead of testing associations to predefined cell states, GeNA flexibly identifies the cell states whose abundance is most associated with genetic variants. In a genome-wide survey of single-cell RNA sequencing peripheral blood profiling from 969 individuals, GeNA identifies five independent loci associated with shifts in the relative abundance of immune cell states. For example, rs3003-T (P = 1.96 × 10⁻¹¹) associates with increased abundance of natural killer cells expressing tumor necrosis factor response programs. This csaQTL colocalizes with increased risk for psoriasis, an autoimmune disease that responds to anti-tumor necrosis factor treatments. Flexibly characterizing csaQTLs for granular cell states may help illuminate how genetic background alters cellular composition to confer disease risk.

Fig. 1: Method schematic.
Fig. 2: csaQTLs detected in the OneK1K dataset.
Fig. 3: Characterization of the csaQTL at 12p13.2.
Fig. 4: PRSs aggregate the effects of individual loci to highlight disease-relevant cell states, a valuable point of comparison with single-cell case-control analyses.

Data availability

The OneK1K single-cell and genotyping data are available via Gene Expression Omnibus (GEO; GSE196830). The Perez et al. genotyping data are available on dbGaP (phs002812.v1.p1) and the corresponding single-cell data are available via GEO (GSE174188). The Randolph et al. genotyping data are available on the Sequence Read Archive (PRJNA736483) and the corresponding single-cell data are available on GEO (GSE162632). The Jerber et al. genotyping data are available from the European Nucleotide Archive (Project ID PRJEB11750) and the corresponding single-cell data are available on Zenodo (https://zenodo.org/records/4651413; ref. 83). The Oelen et al. single-cell data are available via the European Genome-Phenome Archive (EGAS00001005376) and genotyping data can be accessed through an application process at https://eqtlgen.org/sc/datasets/1m-scbloodnl.html.

Code availability

An open-source repository containing the implementation of GeNA can be found on GitHub (https://github.com/immunogenomics/GeNA/) and Zenodo (https://zenodo.org/doi/10.5281/zenodo.13152792; ref. 78). All code underlying our figures and tables can be found on GitHub (https://github.com/immunogenomics/GeNA-applied/) and Zenodo (https://zenodo.org/doi/10.5281/zenodo.13281284; ref. 79).

References

  1. Visscher, P. M. et al. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 101, 5–22 (2017).

  2. Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP–trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).

  3. Shendure, J., Findlay, G. M. & Snyder, M. W. Genomic medicine–progress, pitfalls, and promise. Cell 177, 45–57 (2019).

  4. Yazar, S. et al. Single-cell eQTL mapping identifies cell type-specific genetic control of autoimmune disease. Science 376, eabf3041 (2022).

  5. Wang, Q. S. et al. Leveraging supervised learning for functionally informed fine-mapping of cis-eQTLs identifies an additional 20,913 putative causal eQTLs. Nat. Commun. 12, 3394 (2021).

  6. Hormozdiari, F. et al. Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nat. Genet. 50, 1041–1047 (2018).

  7. Nathan, A. et al. Single-cell eQTL models reveal dynamic T cell state dependence of disease loci. Nature 606, 120–128 (2022).

  8. Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).

  9. Li, Y. I. et al. RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604 (2016).

  10. Yamaguchi, K. et al. Splicing QTL analysis focusing on coding sequences reveals mechanisms for disease susceptibility loci. Nat. Commun. 13, 4659 (2022).

  11. Sun, B. B. et al. Genomic atlas of the human plasma proteome. Nature 558, 73–79 (2018).

  12. Gudjonsson, A. et al. A genome-wide association study of serum proteins reveals shared loci with common diseases. Nat. Commun. 13, 480 (2022).

  13. Wu, L. et al. Variation and genetic control of protein abundance in humans. Nature 499, 79–82 (2013).

  14. He, B., Shi, J., Wang, X., Jiang, H. & Zhu, H.-J. Genome-wide pQTL analysis of protein expression regulatory networks in the human liver. BMC Biol. 18, 97 (2020).

  15. Orrù, V. et al. Complex genetic signatures in immune cells underlie autoimmunity and inform therapy. Nat. Genet. 52, 1036–1045 (2020).

  16. Gate, R. E. et al. Genetic determinants of co-accessible chromatin regions in activated T cells across humans. Nat. Genet. 50, 1140–1150 (2018).

  17. Kumasaka, N., Knights, A. J. & Gaffney, D. J. Fine-mapping cellular QTLs with RASQUAL and ATAC-seq. Nat. Genet. 48, 206–213 (2016).

  18. Currin, K. W. et al. Genetic effects on liver chromatin accessibility identify disease regulatory variants. Am. J. Hum. Genet. 108, 1169–1189 (2021).

  19. Huan, T. et al. Genome-wide identification of DNA methylation QTLs in whole blood highlights pathways for cardiovascular disease. Nat. Commun. 10, 4267 (2019).

  20. McRae, A. F. et al. Identification of 55,000 replicated DNA methylation QTL. Sci. Rep. 8, 17605 (2018).

  21. Chun, S. et al. Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat. Genet. 49, 600–605 (2017).

  22. Connally, N. J. et al. The missing link between genetic association and regulatory function. eLife 11, e74970 (2022).

  23. Mostafavi, H., Spence, J. P., Naqvi, S. & Pritchard, J. K. Systematic differences in discovery of genetic effects on gene expression and complex traits. Nat. Genet. 55, 1866–1875 (2023).

  24. Astle, W. J. et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell 167, 1415–1429.e19 (2016).

  25. Vuckovic, D. et al. The polygenic and monogenic basis of blood traits and diseases. Cell 182, 1214–1231.e11 (2020).

  26. Chen, M.-H. et al. Trans-ethnic and ancestry-specific blood-cell genetics in 746,667 individuals from 5 global populations. Cell 182, 1198–1213.e14 (2020).

  27. Chu, X. et al. A genome-wide functional genomics approach uncovers genetic determinants of immune phenotypes in type 1 diabetes. eLife 11, e73709 (2022).

  28. Aguirre-Gamboa, R. et al. Differential effects of environmental and genetic factors on T and B cell immune traits. Cell Rep. 17, 2474–2487 (2016).

  29. Kachuri, L. et al. Genetic determinants of blood-cell traits influence susceptibility to childhood acute lymphoblastic leukemia. Am. J. Hum. Genet. 108, 1823–1835 (2021).

  30. Kraal, G., Weissman, I. L. & Butcher, E. C. Genetic control of T-cell subset representation in inbred mice. Immunogenetics 18, 585–592 (1983).

  31. Dendrou, C. A. et al. Cell-specific protein phenotypes for the autoimmune locus IL2RA using a genotype-selectable human bioresource. Nat. Genet. 41, 1011–1015 (2009).

  32. Soule, T. G. et al. A protocol for single nucleus RNA-seq from frozen skeletal muscle. Life Sci. Alliance 6, e202201806 (2023).

  33. Slyper, M. et al. A single-cell and single-nucleus RNA-seq toolbox for fresh and frozen human tumors. Nat. Med. 26, 792–802 (2020).

  34. Piwecka, M., Rajewsky, N. & Rybak-Wolf, A. Single-cell and spatial transcriptomics: deciphering brain complexity in health and disease. Nat. Rev. Neurol. 19, 346–362 (2023).

  35. Nath, A. P. et al. Multivariate genome-wide association analysis of a cytokine network reveals variants with widespread immune, haematological, and cardiometabolic pleiotropy. Am. J. Hum. Genet. 105, 1076–1090 (2019).

  36. Reshef, Y. A. et al. Co-varying neighborhood analysis identifies cell populations associated with phenotypes of interest from single-cell transcriptomics. Nat. Biotechnol. 40, 355–363 (2022).

  37. Patin, E. et al. Natural variation in the parameters of innate immune cells is preferentially driven by genetic factors. Nat. Immunol. 19, 302–314 (2018).

  38. Perez, R. K. et al. Single-cell RNA-seq reveals cell type-specific molecular and genetic associations to lupus. Science 376, eabf1970 (2022).

  39. Randolph, H. E. et al. Genetic ancestry effects on the response to viral infection are pervasive but cell type specific. Science 374, 1127–1133 (2021).

  40. Oelen, R. et al. Single-cell RNA-sequencing of peripheral blood mononuclear cells reveals widespread, context-specific gene expression regulation upon pathogenic exposure. Nat. Commun. 13, 3267 (2022).

  41. Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).

  42. Stuart, P. E. et al. Transethnic analysis of psoriasis susceptibility in South Asians and Europeans enhances fine mapping in the MHC and genome wide. Hum. Genet. Genom. Adv. 3, 100069 (2022).

  43. Tsoi, L. C. et al. Large scale meta-analysis characterizes genetic architecture for common psoriasis associated variants. Nat. Commun. 8, 15382 (2017).

  44. Lowes, M. A., Bowcock, A. M. & Krueger, J. G. Pathogenesis and therapy of psoriasis. Nature 445, 866–873 (2007).

  45. Berekmeri, A., Mahmood, F., Wittmann, M. & Helliwell, P. Tofacitinib for the treatment of psoriasis and psoriatic arthritis. Expert Rev. Clin. Immunol. 14, 719–730 (2018).

  46. Neale, B. M. et al. UK BioBank Round 2 Results. Neale Lab http://www.nealelab.is/uk-biobank/ (2018).

  47. Ng, P. C. & Henikoff, S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 31, 3812–3814 (2003).

  48. Adzhubei, I., Jordan, D. M. & Sunyaev, S. R. Predicting functional effect of human missense mutations using PolyPhen-2. Curr. Protoc. Hum. Genet. 76, 7.20.1–7.20.41 (2013).

  49. Vogler, M. BCL2A1: the underdog in the BCL2 family. Cell Death Differ. 19, 67–74 (2012).

  50. Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).

  51. Ji, S.-G. et al. Genome-wide association study of primary sclerosing cholangitis identifies new risk loci and quantifies the genetic relationship with inflammatory bowel disease. Nat. Genet. 49, 269–273 (2017).

  52. Kunzmann, L. K. et al. Monocytes as potential mediators of pathogen-induced T-helper 17 differentiation in patients with primary sclerosing cholangitis (PSC). Hepatology 72, 1310–1326 (2020).

  53. Schmiedel, B. J. et al. Impact of genetic polymorphisms on human immune cell gene expression. Cell 175, 1701–1715.e16 (2018).

  54. Han, Y. et al. Genome-wide analysis highlights contribution of immune system pathways to the genetic architecture of asthma. Nat. Commun. 11, 1776 (2020).

  55. Chiou, J. et al. Interpreting type 1 diabetes risk with genetics and single-cell epigenomics. Nature 594, 398–402 (2021).

  56. Phelan, C. M. et al. Identification of 12 new susceptibility loci for different histotypes of epithelial ovarian cancer. Nat. Genet. 49, 680–691 (2017).

  57. Chen, L., Morris, D. L. & Vyse, T. J. Genetic advances in systemic lupus erythematosus: an update. Curr. Opin. Rheumatol. 29, 423–433 (2017).

  58. Chambers, S. A., Allen, E., Rahman, A. & Isenberg, D. Damage and mortality in a group of British patients with systemic lupus erythematosus followed up for over 10 years. Rheumatology 48, 673–675 (2009).

  59. Chen, L. et al. Genome-wide assessment of genetic risk for systemic lupus erythematosus and disease severity. Hum. Mol. Genet. 29, 1745–1756 (2020).

  60. Choi, S. W., Mak, T. S.-H. & O’Reilly, P. F. Tutorial: a guide to performing polygenic risk score analyses. Nat. Protoc. 15, 2759–2772 (2020).

  61. Rice, G. I. et al. Gain-of-function mutations in IFIH1 cause a spectrum of human disease phenotypes associated with upregulated type I interferon signaling. Nat. Genet. 46, 503–509 (2014).

  62. Barnes, B. J., Moore, P. A. & Pitha, P. M. Virus-specific activation of a novel interferon regulatory factor, IRF-5, results in the induction of distinct interferon α genes. J. Biol. Chem. 276, 23382–23390 (2001).

  63. Rönnblom, L. & Leonard, D. Interferon pathway in SLE: one key to unlocking the mystery of the disease. Lupus Sci. Med. 6, e000270 (2019).

  64. Fike, A. J., Elcheva, I. & Rahman, Z. S. M. The post-GWAS era: how to validate the contribution of gene variants in lupus. Curr. Rheumatol. Rep. 21, 3 (2019).

  65. Deeks, E. D. Anifrolumab: first approval. Drugs 81, 1795–1802 (2021).

  66. Privé, F. et al. Portability of 245 polygenic scores when derived from the UK Biobank and applied to 9 ancestry groups from the same cohort. Am. J. Hum. Genet. 109, 12–23 (2022).

  67. Jerber, J. et al. Population-scale single-cell RNA-seq profiling across dopaminergic neuron differentiation. Nat. Genet. 53, 304–312 (2021).

  68. Sansom, S. N. et al. The level of the transcription factor Pax6 is essential for controlling the balance between neural stem cell self-renewal and neurogenesis. PLoS Genet. 5, e1000511 (2009).

  69. Thakurela, S. et al. Mapping gene regulatory circuitry of Pax6 during neurogenesis. Cell Discov. 2, 15045 (2016).

  70. Bertacchi, M. et al. NR2F1 regulates regional progenitor dynamics in the mouse neocortex and cortical gyrification in BBSOAS patients. EMBO J. 39, e104163 (2020).

  71. Ypsilanti, A. R. et al. Transcriptional network orchestrating regional patterning of cortical progenitors. Proc. Natl Acad. Sci. USA 118, e2024795118 (2021).

  72. Mangino, M., Roederer, M., Beddall, M. H., Nestle, F. O. & Spector, T. D. Innate and adaptive immune traits are differentially affected by genetic and environmental factors. Nat. Commun. 8, 13850 (2017).

  73. Nathan, A. et al. Multimodally profiling memory T cells from a tuberculosis cohort identifies cell state associations with demographics, environment and disease. Nat. Immunol. 22, 781–793 (2021).

  74. Mathys, H. et al. Single-cell transcriptomic analysis of Alzheimer’s disease. Nature 570, 332–337 (2019).

  75. Ramachandran, P. et al. Resolving the fibrotic niche of human liver cirrhosis at single-cell level. Nature 575, 512–518 (2019).

  76. Arya, S., Mount, D. M., Netanyahu, N. S., Silverman, R. & Wu, A. Y. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM 45, 891–923 (1998).

  77. Schnitzler, G. R. et al. Convergence of coronary artery disease genes onto endothelial cell programs. Nature 626, 799–807 (2024).

  78. Rumker, L. Immunogenomics/GeNA: initial release. Zenodo https://zenodo.org/doi/10.5281/zenodo.13152792 (2024).

  79. Rumker, L. Immunogenomics/GeNA-applied: version 1.0.0. Zenodo https://zenodo.org/doi/10.5281/zenodo.13281284 (2024).

  80. Frankish, A. et al. GENCODE 2021. Nucleic Acids Res. 49, D916–D923 (2021).

  81. Kang, J. B. et al. Efficient and precise single-cell reference atlas mapping with Symphony. Nat. Commun. 12, 5890 (2021).

  82. Tigchelaar, E. F. et al. Cohort profile: LifeLines DEEP, a prospective, general population cohort study in the northern Netherlands: study design and baseline characteristics. BMJ Open 5, e006772 (2015).

  83. Cuomo, A. S. E. Population-scale single-cell RNA-sequencing of iPS cells differentiating towards dopaminergic neurons. Zenodo https://zenodo.org/doi/10.5281/zenodo.4072908 (2024).

Acknowledgements

We thank our fellow members of the Raychaudhuri Lab, as well as Y. Luo and members of the Alkes Price and Shamil Sunyaev Laboratories, for their helpful feedback. L.R. is supported by award F30AI157385 from the National Institute of Allergy and Infectious Diseases. J.B.K. is supported by award F30AI172238 from the National Institute of Allergy and Infectious Diseases. L.R., J.B.K. and K.A.L. are supported by awards T32GM144273 and T32HG002295 from the National Institute of General Medical Sciences. C.V. is supported by award T32HG01046 from the National Human Genome Research Institute. J.A.-H. is supported by EL1 National Health and Medical Research Council (NHMRC) grant no. 2018432. P.-R.L. is supported by a Burroughs Wellcome Fund Career Award at the Scientific Interfaces. J.E.P. is supported by award 1175781 from the NHMRC and a fellowship from the Fok Foundation. S.R. is supported by awards R01AR063759 and UC2AR081023 from the National Institute of Arthritis and Musculoskeletal and Skin Diseases, U01HG012009 and R56HG013083 from the National Human Genome Research Institute, and P01AI148102 from the National Institute of Allergy and Infectious Diseases. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Per the agreement for the Oelen et al. data, we thank the participants and the staff of Lifelines DEEP DAG2+ for their collaboration. Funding for that project was provided by the European Research Council Starting Grant no. 637640. We also acknowledge the funders of the Lifelines Cohort Study, the sample collections from which the Oelen et al. project data have been derived. Finally, we are grateful to all participants in the study cohorts whose data we have analyzed in this paper. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

L.R. and S.R. designed and conceptualized the study. L.R. designed and implemented the algorithm and performed data analysis with input from S.S., Y.R., P.-R.L. and S.R. S.S., C.V. and A.N. contributed to genotype data processing. J.B.K., S.Y. and J.A.-H. contributed to OneK1K dataset processing. S.Y., J.A.-H. and J.E.P. provided dataset-specific expertise. L.R. and S.R. wrote the manuscript with contributions from the remaining authors S.S., Y.R., J.B.K., S.Y., J.A.-H., C.V., K.A.L., A.M.-S., A.N., J.E.P. and P.-R.L.

Corresponding author

Correspondence to Soumya Raychaudhuri.

Ethics declarations

Competing interests

S.R. is a founder for Mestag Therapeutics and a scientific advisor for Janssen, Sonoma and Pfizer. J.E.P. is a founder of CellTellus Laboratory. The other authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Guillaume Lettre and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Extended data

Extended Data Fig. 1 Schematic representation of our approach to project a neighborhood-based phenotype into an independent dataset for testing of association replication.

We use a published reference mapping algorithm, Symphony, to project each cell from the replication dataset (blue labels) into the embedding used for construction of the nearest neighbor graph from the discovery dataset (orange labels). For each replication dataset cell, we store its distance to the 15 nearest discovery dataset cells; these represent the seed weights of this replication dataset cell in the discovery dataset neighborhoods, of which there is one per discovery dataset cell. We use diffusion in the nearest neighbor graph to obtain from these seed weights the fractional membership of each replication dataset cell within all discovery dataset neighborhoods. For each replication dataset sample, the combination of neighborhood memberships across all cells in the sample yields the fractional abundance of that sample across discovery dataset neighborhoods. Row-wise stacking these per-sample vectors into a matrix produces an estimated Neighborhood Abundance Matrix (NAM) containing the distribution of each replication dataset sample across discovery dataset neighborhoods. We can then use the stored products of the discovery dataset NAM SVD to obtain loadings for each replication dataset sample on the discovery dataset NAM-PCs, as shown. Combining the replication dataset sample loadings on the discovery dataset NAM-PCs with the fitted coefficients that define the phenotype in the discovery dataset produces an estimated phenotype value per replication dataset sample, which can be used to test for association to the allele of interest (or case-control status), controlling for relevant covariates.
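The projection steps described in this legend can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the GeNA or Symphony implementation: the function names (`diffuse`, `sample_abundance`, `project_phenotype`), the self-loop averaging rule, the number of diffusion steps and the use of column means for centering are hypothetical simplifications, and the seed weights are taken as given inputs rather than derived from distances.

```python
from collections import defaultdict


def diffuse(seed, graph, n_steps=3):
    """Spread seed weights over a neighbor graph by repeated averaging.

    seed  : {neighborhood index: weight} for one replication cell
    graph : {neighborhood index: [neighbor indices]} from the discovery dataset
    The total weight is conserved at every step (assumed diffusion rule).
    """
    weights = dict(seed)
    for _ in range(n_steps):
        nxt = defaultdict(float)
        for cell, weight in weights.items():
            targets = [cell] + graph[cell]  # self-loop plus graph neighbors
            share = weight / len(targets)
            for t in targets:
                nxt[t] += share
        weights = dict(nxt)
    return weights


def sample_abundance(cells, graph, n_neighborhoods, n_steps=3):
    """One row of the estimated NAM: a sample's fractional abundance
    across discovery neighborhoods, averaged over its cells."""
    row = [0.0] * n_neighborhoods
    for seed in cells:
        total = sum(seed.values())
        normalized = {c: w / total for c, w in seed.items()}
        for nb, w in diffuse(normalized, graph, n_steps).items():
            row[nb] += w
    return [x / len(cells) for x in row]


def project_phenotype(nam_row, col_means, V, svals, betas):
    """Loadings on stored NAM-PCs, then the fitted linear phenotype.

    V[k] is the k-th right singular vector of the discovery NAM and
    svals[k] its singular value, so row . V[k] / svals[k] is the loading.
    """
    centered = [x - m for x, m in zip(nam_row, col_means)]
    loadings = [
        sum(c * v for c, v in zip(centered, V[k])) / svals[k]
        for k in range(len(svals))
    ]
    return sum(b * l for b, l in zip(betas, loadings))
```

The resulting per-sample phenotype estimate would then be tested against genotype or case-control status with covariates, a step this sketch leaves out.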

Extended Data Fig. 2 Power to detect associations between simulated genotype values and real single-cell traits.

For 94 real cell-state abundance traits that vary across individuals in the OneK1K dataset, we defined simulated genotype values per individual to create true associations to these traits. By including random noise in the simulated genotypes we can use these data to quantify the fraction of true associations detected by GeNA (power) across a spectrum of noise levels. At each noise level, we show the mean and standard error of statistical power across traits for a given p-value threshold. A dashed line is shown at y = 0.05.
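As a rough illustration of how such a power summary could be computed, the sketch below mixes a standardized trait with Gaussian noise to produce a simulated genotype-like value, then summarizes detection rates. The mixing rule in `noisy_genotype` and all function names are assumptions made for illustration; the legend does not specify how noise was added, and the association test that yields the p-values is not shown.

```python
import random
import statistics


def noisy_genotype(trait, noise, rng):
    """Simulated genotype-like values correlated with a trait.

    noise in [0, 1] is the assumed fraction of variance that is random;
    noise = 0 returns the standardized trait itself.
    """
    mu = statistics.fmean(trait)
    sd = statistics.pstdev(trait)
    z = [(t - mu) / sd for t in trait]
    return [(1 - noise) ** 0.5 * zi + noise ** 0.5 * rng.gauss(0, 1) for zi in z]


def power(p_values, alpha=0.05):
    """Fraction of true associations detected at the given threshold."""
    return sum(p < alpha for p in p_values) / len(p_values)


def mean_and_se(per_trait_power):
    """Mean and standard error across traits (needs at least two traits)."""
    m = statistics.fmean(per_trait_power)
    se = statistics.stdev(per_trait_power) / len(per_trait_power) ** 0.5
    return m, se
```

In use, `power` would be evaluated once per trait at each noise level, and `mean_and_se` would then summarize those per-trait values for plotting.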

Extended Data Fig. 3 Illustration of 14 real cell-state abundance traits used in our non-null simulated GWAS for T cells.

For each trait in Supplementary Fig. 9, we plot the true cell-level phenotype for which we simulated associated genotypes with varying levels of noise. Each UMAP includes one dot per T cell in the OneK1K dataset. Cells that do not affect the trait are colored grey. For example, for the “CD8 Naive Program 1” trait, we used a gene expression program that varies substantially across naive CD8+ T cells. We quantified the usage of that program across all naive CD8+ T cells and defined the trait value per individual in the dataset as the mean use of that gene expression program across all naive CD8+ T cells in that individual’s sample. Therefore, for the “CD8 Naive Program 1” trait we color all cells that are not naive CD8+ T cells grey because they do not influence the trait and we color each naive CD8+ T cell by its use of the gene expression program. Cells with greater use of the program are colored deeper red and cells with less use of the program are colored deeper blue.
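The per-individual trait definition in this legend amounts to a filtered group mean. A minimal sketch follows, assuming each cell is represented as a hypothetical `(donor, cell_type, program_usage)` tuple and that the label `"CD8 Naive"` marks naive CD8+ T cells; neither the data layout nor the label comes from the paper.

```python
from collections import defaultdict


def per_donor_trait(cells, cell_type="CD8 Naive"):
    """Mean program usage per donor, using only cells of one type.

    cells: iterable of (donor, cell_type, program_usage) tuples.
    Cells of other types are ignored, mirroring the grey cells in the UMAP.
    Donors with no cells of the chosen type are absent from the result.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for donor, ctype, usage in cells:
        if ctype == cell_type:
            sums[donor] += usage
            counts[donor] += 1
    return {donor: sums[donor] / counts[donor] for donor in sums}
```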

Extended Data Fig. 4 Characterization of the csaQTL at 15q25.1.

(a) Boxplot of sample-level phenotype values for each individual, organized by genotype at the lead SNP. (N: C/C 297, C/T 194, T/T 32) (Bold line: Q2. Box: Q1-Q3. Whisker: furthest observation within 1.5xIQR of the box.) We also show the GeNA p-value. (b) UMAP of myeloid cells colored by neighborhood-level phenotype value (that is, correlation between cell abundance and dose of alternative allele per neighborhood). (c) Violin plot of neighborhood-level phenotype value distribution within CD14+ monocytes, CD16+ monocytes and dendritic cells. (d) Heatmap of expression across neighborhoods for genes with strong correlations in expression to the csaQTL neighborhood-level phenotype. Neighborhoods are arrayed along the x-axis by phenotype value. (e) UMAP of myeloid cells colored by cell type assignment to the CD16+ monocyte cluster. We also show the Pearson’s r value between neighborhood-level phenotype values and a binary encoding of CD16+ monocyte cluster membership per cell. (f) Boxplot of cluster-based CD16+ monocyte % myeloid cells trait value per OneK1K donor by genotype. (Box and whiskers defined as in subplot a). The csaQTL lead SNP explains 6.5% of variance in this phenotype. (g) Locus zoom plot with one marker per tested SNP, genomic position along the x-axis, and GeNA p-value on the y-axis. Each SNP marker is colored by LD value relative to the lead SNP. The lead SNP is labeled with a green diamond. The BCL2A1 eQTL lead SNP and primary sclerosing cholangitis risk lead SNP are labeled with purple triangles. (h) Diagram of genotypes for the csaQTL lead SNP and colocalizing associations to molecular, tissue and organism-level traits at this locus.
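The boxplot conventions in panels a and f, and the variance-explained figure in panel f, can be made concrete with a short sketch. It assumes an inclusive quartile convention (the legend does not say which was used) and computes variance explained as the squared Pearson correlation between allele dose and trait, which ignores the covariates a full model would include. The function names are illustrative.

```python
import statistics


def box_stats(values):
    """Q1, Q2, Q3 and whiskers at the furthest observations
    within 1.5 x IQR of the box."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low = min(v for v in values if v >= q1 - 1.5 * iqr)
    high = max(v for v in values if v <= q3 + 1.5 * iqr)
    return {"q1": q1, "q2": q2, "q3": q3, "whisker_low": low, "whisker_high": high}


def variance_explained(dose, trait):
    """Squared Pearson correlation between allele dose (0, 1, 2) and trait."""
    mx, my = statistics.fmean(dose), statistics.fmean(trait)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dose, trait))
    sxx = sum((x - mx) ** 2 for x in dose)
    syy = sum((y - my) ** 2 for y in trait)
    return sxy ** 2 / (sxx * syy)
```

Observations beyond the whiskers, such as a single extreme donor, do not move the whisker ends under this rule.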

Extended Data Fig. 5 Comparison of effects captured by NAM-PCs to published csaQTL associations, using approximation of flow cytometry phenotypes in scRNA-seq data.

Z-scores for published SNP associations to specific cell-state abundance phenotypes quantified using flow cytometry by Orrù et al. are shown on the x-axis. For each SNP-trait pair, a corresponding Z-score is shown on the y-axis reflecting an association test in the OneK1K dataset between genotype and the best approximation that can be captured by NAM-PCs of a cluster-based estimate of that phenotype in the OneK1K dataset (Supplementary Methods).

Extended Data Fig. 6 GeNA’s statistical power increases linearly with the number of samples included in the single-cell dataset.

We downsampled the OneK1K cohort individuals at random to 80%, 60%, 40% or 20% of the total donor count and repeated our power analysis simulation for each downsampled dataset. Here we plot statistical power by dataset size for simulated genotypes that explain 6% or 12% of variance in the associated cell-state abundance shift trait. At each cohort size, we show the mean and standard error of GeNA’s statistical power for simulated SNPs that explain 6% or 12% of phenotypic variance.

Supplementary information

Supplementary Information

Supplementary Tables 1–16, Figs. 1–35 and Notes 1 and 2.

Reporting Summary

Peer Review File

About this article

Cite this article

Rumker, L., Sakaue, S., Reshef, Y. et al. Identifying genetic variants that influence the abundance of cell states in single-cell data. Nat Genet (2024). https://doi.org/10.1038/s41588-024-01909-1

  • DOI: https://doi.org/10.1038/s41588-024-01909-1
