Abstract
Non-coding genetic variants may cause disease by modulating gene expression. However, identifying these expression quantitative trait loci (eQTLs) is complicated by differences in gene regulation across fluid functional cell states within cell types. These states (for example, neurotransmitter-driven programs in astrocytes or perivascular fibroblast differentiation) are obscured in eQTL studies that aggregate cells (refs. 1, 2). Here we modelled eQTLs at single-cell resolution in one complex cell type: memory T cells. Using more than 500,000 unstimulated memory T cells from 259 Peruvian individuals, we show that around one-third of 6,511 cis-eQTLs had effects that were mediated by continuous multimodally defined cell states, such as cytotoxicity and regulatory capacity. In some loci, independent eQTL variants had opposing cell-state relationships. Autoimmune variants were enriched in cell-state-dependent eQTLs, including risk variants for rheumatoid arthritis near ORMDL3 and CTLA4; this indicates that cell-state context is crucial to understanding potential eQTL pathogenicity. Moreover, continuous cell states explained more variation in eQTLs than did conventional discrete categories, such as CD4+ versus CD8+, suggesting that modelling eQTLs and cell states at single-cell resolution can expand insight into gene regulation in functionally heterogeneous cell types.
Data availability
The single-cell mRNA and surface protein data that support the findings of this study were published previously and are available in the Gene Expression Omnibus (GSE158769) and in the Database of Genotypes and Phenotypes (dbGaP; phs002467). Genotype data were also previously published and are available in dbGaP (phs002025, for authorized general research use). We also used published datasets for validation and additional analyses: DICE (https://dice-database.org/downloads), BLUEPRINT (http://dcc.blueprint-epigenome.eu/), ref. 48 (GSE118189), ref. 24 (https://doi.org/10.5281/zenodo.4273999) and ref. 49 (Single Cell Portal accession SCP259).
Code availability
Scripts to reproduce analyses are available on GitHub (https://github.com/immunogenomics/sceQTL) and Zenodo (https://doi.org/10.5281/zenodo.6216850).
References
Wei, K. et al. Notch signalling drives synovial fibroblast identity and arthritis pathology. Nature 582, 259–264 (2020).
Cembrowski, M. S. & Menon, V. Continuous variation within cell types of the nervous system. Trends Neurosci. 41, 337–348 (2018).
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
Chun, S. et al. Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat. Genet. 49, 600–605 (2017).
Umans, B. D., Battle, A. & Gilad, Y. Where are the disease-associated eQTLs? Trends Genet. 37, 109–124 (2021).
Gutierrez-Arcelus, M. et al. Allele-specific expression changes dynamically during T cell activation in HLA and other autoimmune loci. Nat. Genet. 52, 247–253 (2020).
Davenport, E. E. et al. Discovering in vivo cytokine–eQTL interactions from a lupus clinical trial. Genome Biol. 19, 168 (2018).
Strober, B. J. et al. Dynamic genetic regulation of gene expression during cellular differentiation. Science 364, 1287–1290 (2019).
Cuomo, A. S. E. et al. Single-cell RNA-sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression. Nat. Commun. 11, 810 (2020).
Zhernakova, D. V. et al. Identification of context-dependent expression quantitative trait loci in whole blood. Nat. Genet. 49, 139–145 (2017).
Moore, R. et al. A linear mixed-model approach to study multivariate gene-environment interactions. Nat. Genet. 51, 180–186 (2019).
Kim-Hellmuth, S. et al. Cell type-specific genetic regulation of gene expression across human tissues. Science 369, eaaz8528 (2020).
Raj, T. et al. Polarization of the effects of autoimmune and neurodegenerative risk alleles in leukocytes. Science 344, 519–523 (2014).
Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 45, 124–130 (2013).
Farh, K. H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).
Wambre, E. et al. A phenotypically and functionally distinct human TH2 cell subpopulation is associated with allergic disorders. Sci. Transl. Med. 9, eaam9171 (2017).
Arlehamn, C. L. et al. Transcriptional profile of tuberculosis antigen-specific T cells reveals novel multifunctional features. J. Immunol. 193, 2931–2940 (2014).
Eizenberg-Magar, I. et al. Diverse continuum of CD4+ T-cell states is determined by hierarchical additive integration of cytokine signals. Proc. Natl Acad. Sci. USA 114, E6447–E6456 (2017).
Annunziato, F., Cosmi, L., Liotta, F., Maggi, E. & Romagnani, S. Defining the human T helper 17 cell phenotype. Trends Immunol. 33, 505–512 (2012).
Kiner, E. et al. Gut CD4+ T cell phenotypes are a continuum molded by microbes, not by TH archetypes. Nat. Immunol. 22, 216–228 (2021).
van der Wijst, M. G. P. et al. Single-cell RNA sequencing identifies celltype-specific cis-eQTLs and co-expression QTLs. Nat. Genet. 50, 493–497 (2018).
Jerber, J. et al. Population-scale single-cell RNA-seq profiling across dopaminergic neuron differentiation. Nat. Genet. 53, 304–312 (2021).
Neavin, D. et al. Single cell eQTL analysis identifies cell type-specific genetic control of gene expression in fibroblasts and reprogrammed induced pluripotent stem cells. Genome Biol. 22, 76 (2021).
Randolph, H. E. et al. Genetic ancestry effects on the response to viral infection are pervasive but cell type specific. Science 374, 1127–1133 (2021).
Nathan, A. et al. Multimodally profiling memory T cells from a tuberculosis cohort identifies cell state associations with demographics, environment and disease. Nat. Immunol. 22, 781–793 (2021).
Luo, Y. et al. Early progression to active tuberculosis is a highly heritable trait driven by 3q23 in Peruvians. Nat. Commun. 10, 3765 (2019).
Chen, L. et al. Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell 167, 1398–1414 (2016).
Kasela, S. et al. Pathogenic implications for autoimmune mechanisms derived by comparative eQTL analysis of CD4+ versus CD8+ T cells. PLoS Genet. 13, e1006643 (2017).
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Schmiedel, B. J. et al. Impact of genetic polymorphisms on human immune cell gene expression. Cell 175, 1701–1715 (2018).
Rothenberg, E. V. & Taghon, T. Molecular genetics of T cell development. Annu. Rev. Immunol. 23, 601–649 (2005).
Townes, F. W., Hicks, S. C., Aryee, M. J. & Irizarry, R. A. Feature selection and dimension reduction for single-cell RNA-seq based on a multinomial model. Genome Biol. 20, 295 (2019).
Sarkar, A. & Stephens, M. Separating measurement and expression models clarifies confusion in single-cell RNA sequencing analysis. Nat. Genet. 53, 770–777 (2021).
Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 20, 296 (2019).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Dobbyn, A. et al. Landscape of conditional eQTL in dorsolateral prefrontal cortex and co-localization with schizophrenia GWAS. Am. J. Hum. Genet. 102, 1169–1184 (2018).
Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
Laufer, V. A. et al. Genetic influences on susceptibility to rheumatoid arthritis in African-Americans. Hum. Mol. Genet. 28, 858–874 (2019).
Liu, J. Z. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986 (2015).
Stahl, E. A. et al. Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nat. Genet. 42, 508–514 (2010).
Heintzman, N. D. et al. Histone modifications at human enhancers reflect global cell type-specific gene expression. Nature 459, 108–112 (2009).
Hormozdiari, F., Kostem, E., Kang, E. Y., Pasaniuc, B. & Eskin, E. Identifying causal variants at loci with multiple signals of association. Genetics 198, 497–508 (2014).
Amariuta, T. et al. IMPACT: genomic annotation of cell-state-specific regulatory elements inferred from the epigenome of bound transcription factors. Am. J. Hum. Genet. 104, 879–895 (2019).
Zaitlen, N., Pasaniuc, B., Gur, T., Ziv, E. & Halperin, E. Leveraging genetic variability across populations for the identification of causal variants. Am. J. Hum. Genet. 86, 23–33 (2010).
Calderon, D. et al. Landscape of stimulation-responsive chromatin across diverse human immune cells. Nat. Genet. 51, 1494–1505 (2019).
Smillie, C. S. et al. Intra- and inter-cellular rewiring of the human colon during ulcerative colitis. Cell 178, 714–730 (2019).
Reshef, Y. A. et al. Co-varying neighborhood analysis identifies cell populations associated with phenotypes of interest from single-cell transcriptomics. Nat. Biotechnol. 40, 355–363 (2022).
Burkhardt, D. B. et al. Quantifying the effect of experimental perturbations at single-cell resolution. Nat. Biotechnol. 39, 619–629 (2021).
Ben-David, E. et al. Whole-organism eQTL mapping at cellular resolution with single-cell sequencing. eLife 10, e65857 (2021).
Cuomo, A. S. E. et al. Optimizing expression quantitative trait locus mapping workflows for single-cell studies. Genome Biol. 22, 188 (2021).
van der Wijst, M. et al. The single-cell eQTLGen consortium. eLife 9, e52155 (2020).
Oelen, R. et al. Single-cell RNA-sequencing reveals widespread personalized, context-specific gene expression regulation in immune cells. Preprint at bioRxiv https://doi.org/10.1101/2021.06.04.447088 (2021).
Kotliar, D. et al. Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-seq. eLife 8, e43803 (2019).
Acknowledgements
We thank H. E. Randolph and L. B. Barreiro for sharing insights and access to the influenza PBMC dataset. This work is supported in part by funding from the National Institutes of Health (U19AI111224, UH2AR067677, T32HG002295, T32AR007530, U01HG009379, R01AI049313, R01AR063759 and U01HG012009).
Author information
Contributions
A.N. and S.R. conceptualized the study. A.N. and S.R. designed the statistical and computational strategy and analysed the data, with input from A.L.P. S.A., K.I., C.V., T.A. and Y.L. conducted additional statistical analyses. J.I.B., Y.B., S.S. and D.B.M. designed and conducted immunoprofiling experiments. L.L. and M.B.M. recruited, phenotyped and collected samples from individuals. A.N. and S.R. wrote the initial manuscript. All authors contributed to writing and editing the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature thanks Benjamin Fairfax and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Extended data figures and tables
Extended Data Fig. 1 Memory T cell eQTLs.
a, (left) Box plot and (right) locus plot of rs3087243 eQTL for CTLA4 and b, rs716848 eQTL for ERAP2. Except where indicated, each point in box plots (panels a, b, d, f) represents the average log2(UMI counts + 1) across all cells in a donor (n = 259), grouped by genotype. Box plots show median (horizontal bar), 25th and 75th percentiles (lower and upper bounds of the box, respectively) and 1.5 times the IQR (or minimum/maximum values if they fall within that range; end of whiskers). Each locus plot shows the variants in a ±250-kb window around the TSS plotted based on their nominal pseudobulk eQTL p value and genomic coordinate. The purple diamond is the lead variant and other variants are coloured based on their r² with the lead variant in 1000 Genomes AMR (American ancestry, including Puerto Rican in Puerto Rico, Colombian in Medellín, Peruvian in Lima, and Mexican ancestry in Los Angeles). c, Pie charts of the allele frequencies at rs9927852 in 1000 Genomes EUR (European) and PEL populations. d, Box plot and locus plot of rs9927852 eQTL for MAF (n = 259). e, Number of eGenes with 1, 2, or 3+ independent eQTLs. f, Box plots for lead (rs9349050, left), secondary (rs6901281, middle), and secondary conditioned on lead (right) eQTL variants for MDGA1. In the box plot for rs6901281 conditioned on rs9349050, each point represents the average residual of log2(UMI counts + 1) after regressing out genotype at rs9349050 across all cells in a donor (n = 259). In the locus plot, the pink diamond is the secondary variant.
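The per-donor summary and the box plot elements described in this legend can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption, because the legend does not state which one was used.

```python
import math
import statistics


def donor_mean_log2(umi_counts):
    """Average log2(UMI count + 1) across all cells of one donor."""
    return statistics.fmean(math.log2(c + 1) for c in umi_counts)


def box_stats(values):
    """Median, quartiles and whisker ends for one genotype group.

    Whiskers end at the most extreme points that lie within
    1.5 x IQR of the box, as described in the legend.
    The quartile method is an assumption, not taken from the paper.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    whisker_low = min(v for v in values if v >= q1 - 1.5 * iqr)
    whisker_high = max(v for v in values if v <= q3 + 1.5 * iqr)
    return {
        "median": med,
        "q1": q1,
        "q3": q3,
        "whisker_low": whisker_low,
        "whisker_high": whisker_high,
    }
```

Each donor contributes one value from `donor_mean_log2`, and `box_stats` is then applied to the donors that share a genotype.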
Extended Data Fig. 2 Concordance of published bulk T cell eQTLs and Peruvian (pseudo)bulk memory T cell eQTLs.
Z scores of β values from pseudobulk analysis of Peruvian dataset compared to z scores from a, inverse-variance-weighted meta-analysis of memory T cell subsets in DICE, b, BLUEPRINT bulk eQTL analysis of naïve CD4+ T cells, c, BLUEPRINT bulk eQTL analysis of monocytes, and d, BLUEPRINT bulk eQTL analysis of neutrophils. Each point represents an eGene/Peruvian lead variant pair significant in both datasets (q < 0.05).
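For reference, an inverse-variance-weighted combination of per-subset effect estimates, as named in panel a, can be sketched as below. This is a generic fixed-effect formulation and an assumption about the exact procedure; `ivw_meta` and its arguments are hypothetical names.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    betas: effect estimates for one gene-variant pair, one per subset.
    ses:   the matching standard errors.
    Returns the pooled beta, its standard error and the z score.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se, beta / se
```

Subsets with smaller standard errors receive larger weights, so the pooled z score is driven mainly by the most precisely estimated subsets.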
Extended Data Fig. 3 Assessing the robustness of the single-cell Poisson model.
a, b, Dot plot of each memory-T-cell eGene based on Wald z score for total β (βG+βG×CD4) in PME interaction model of all cells and LRT-based statistic from a, pseudobulk model or b, Wald z score for βG from a PME model of only CD4+ cells. c, Proportion of eGenes with significant βG×CD4 under genotype permutation. Each dot represents the proportion significant at the given alpha threshold. d, e, Dot plot of memory-T-cell eGenes (n = 6,511) based on z score for cell state β (βCD4 or βCD8) and z score for cell state interaction β (βG×CD4 or βG×CD8) in d, LME and e, PME models. f, Box plot of sparsity of eGenes with significant CD4 interactions (q < 0.05) in GLMM or LMM. P value is from a two-sided Wilcoxon rank-sum test (nGLMM = 612 genes, nLMM = 1,214 genes). Each point represents a gene and box plots show median (horizontal bar), 25th and 75th percentiles (lower and upper bounds of the box, respectively) and 1.5 times the IQR (or minimum/maximum values if they fall within that range; end of whiskers). g, rs2214911 eQTL for THAP5 and h, rs4135092 eQTL for TDG in CD4+ (orange) and CD8+ (beige) cells. Box plots show the eQTL effects as per-cell gene expression decreased from 100% to 50%, 20% and 10% in CD4+ cells (left to right). Each point represents the average log2(UMI counts + 1) across all cells in the indicated subset of cells in a donor (n = 259), grouped by genotype. Box plots show median (horizontal bar), 25th and 75th percentiles (lower and upper bounds of the box, respectively) and 1.5 times the IQR (or minimum/maximum values if they fall within that range; end of whiskers).
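The "total β" in panels a and b combines the genotype main effect with its interaction term. The sketch below illustrates how such a combined Wald z score could be computed. It assumes the state is coded as 1 for CD4+ cells, and that the coefficient variances and their covariance are read from an already fitted model. All names are hypothetical and no model fitting is shown.

```python
import math


def genotype_effect(beta_g, beta_gx, state):
    """Per-allele effect on the log scale at a given state value.

    state = 1 for a CD4+ cell and 0 otherwise (assumed coding);
    a continuous score could be passed in the same way.
    """
    return beta_g + beta_gx * state


def wald_z_total(beta_g, beta_gx, var_g, var_gx, cov_g_gx):
    """Wald z score for the total effect beta_G + beta_GxCD4.

    The variance of a sum of two estimates includes twice their
    covariance, which must come from the fitted model.
    """
    total = beta_g + beta_gx
    var_total = var_g + var_gx + 2.0 * cov_g_gx
    return total / math.sqrt(var_total)
```

Under this coding the total effect is the genotype effect within CD4+ cells, which is why panel b compares it with βG from a model fitted to CD4+ cells only.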
Extended Data Fig. 4 Concordance between eQTL interactions with continuous and discrete states.
a, Distribution of CV1 scores for cells in CD4+ and CD8+ gates. Dashed line represents CV1 = 0. b–e, Dot plots of eGenes’ Wald z scores of genotype interactions with b, CV1 and CD4+, c, CV2 and Treg, d, CV1 and Treg, or e, CV2 and CD4+. Dashed line represents the identity line. Only eGenes with significant interaction (LRT q < 0.05) are plotted in b and c. Black dots represent eGenes significantly interacting with both continuous and discrete states, red dots are only significantly interacting with continuous state, and blue dots are only significantly interacting with discrete state. r is calculated as the Pearson correlation coefficient.
Extended Data Fig. 5 Cell-state-dependent eQTL interactions with continuous CVs.
a, Number of significant eGenes (LRT q < 0.05) detected by PME interaction models with increasing numbers of CVs. b, Number of eGenes with significant interaction with each CV in a multivariate PME model with 7 CVs. c, Heat map of the number of eGenes with significant interactions with pairs of CVs in the multivariate model. Boxes along the diagonal reflect the total number of eGenes interacting with the corresponding CV. d, Proportion of eGenes in c with the same direction of effect. e, eGenes with significant interactions with either CV1 or CV2 plotted based on Wald z scores with CV2 and CV1. f, eGenes with significant interactions with either CV1 or CV3 plotted based on Wald z scores with CV3 and CV1.
Extended Data Fig. 6 Opposite interaction directions for independent variants in a locus.
a, Bar plot of the number of eGenes for which the lead and secondary variants have opposite directions of interaction effect for each of the seven CVs. b, Comparison of interaction effect direction for lead and secondary variants for each of 436 eGenes with 2+ independent eQTLs. Each plot corresponds to one CV, from CV1 to CV7. Each point represents an eGene. For eGenes in grey, neither lead nor secondary variant was significantly dependent on the given CV state. For eGenes in black, either only one of the two eQTLs was significantly dependent on the CV, or both were significantly state-dependent with the same direction of effect. For eGenes in red, both lead and secondary variants were significantly state-dependent but with different directions of interaction with the given CV.
Extended Data Fig. 7 Enrichment of eQTLs in disease-associated variants.
a, Dot plot of traits from the GWAS catalogue plotted based on the −log(Fisher p value) and odds ratio of the enrichment test comparing the proportion of GWAS variants colocalizing with memory-T cell eQTLs for one trait compared to all other traits. Labelled traits have p < 10^−5. b, Dot plot of traits from the GWAS catalogue plotted based on the number of GWAS variants overlapping state-dependent eQTLs compared to the total number of GWAS variants overlapping eQTLs. Overlap was defined by r² > 0.5. The dashed line represents the overall proportion of state-dependent eQTLs (2,117/6,511 = 0.33) and labelled traits have Fisher p < 0.005. c, Dot plot of traits plotted based on the number of GWAS variants colocalizing with state-dependent eQTLs compared to the total number of GWAS variants colocalizing with eQTLs under the Bayesian coloc model. The dashed line represents the overall proportion of state-dependent eQTLs (2,117/6,511 = 0.33) and traits outlined in red are significantly enriched at Fisher p < 0.01. AD=Atopic dermatitis, AS=Ankylosing spondylitis, SLE=Systemic lupus erythematosus, T(1/2)D=Type 1/2 diabetes, RA=Rheumatoid arthritis, CAD=Coronary artery disease, MS=Multiple sclerosis, IBD=Inflammatory bowel disease, CD=Crohn’s disease.
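The baseline proportion and a Fisher-type enrichment test of the kind referred to in this legend can be sketched as follows. The legend does not say whether the test was one- or two-sided or how the 2 × 2 table was laid out, so the one-sided form and the table layout here are assumptions, and the function name is hypothetical.

```python
from math import comb

# Overall proportion of state-dependent eQTLs (dashed line in b and c).
STATE_DEPENDENT_FRACTION = 2117 / 6511  # about 0.33


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher exact p value for the table [[a, b], [c, d]].

    Assumed layout (not stated in the legend):
      a = trait variants overlapping state-dependent eQTLs
      b = trait variants overlapping other eQTLs
      c, d = the same two counts for all remaining traits
    Returns P(X >= a) under the hypergeometric null.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return tail / comb(n, row1)
```

A trait sits above the dashed line when its own state-dependent fraction, a / (a + b), exceeds `STATE_DEPENDENT_FRACTION`.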
Extended Data Fig. 8 Additional regulatory region enrichment of eQTL effects.
a, We calculated the enrichment of lead effects or independent secondary (conditional) effects in promoter or T-cell-specific regulatory regions. Analysis was limited to loci that were also significant eGenes in Peruvian analysis and where at least one variant had PIP ≥ 0.5. b, We calculated the enrichment of lead effects and c, secondary effects in ATAC-seq peaks from Calderon et al. (2019). Peaks were binarized as present or not in each sample at a threshold of > 5 CPM. In all plots, the height of the grey bar corresponds to the average enrichment calculated across all loci containing a variant with PIP ≥ 0.5, red bar corresponds to the subset with significant cell-state interaction (LRT q < 0.05 in multivariate model with 7 CVs), and the blue bar corresponds to the subset without significant cell-state interaction. Bars with p ≥ 0.001 (limit of 1,000 permutations) are labelled with their one-sided p value and with corresponding asterisks. Each pair of interacting/non-interacting bars is labelled with a one-sided permutation p value for the difference (interacting minus non-interacting; only labelled if p < 0.1 for the ATAC-seq analysis). The grey dotted line indicates enrichment statistic = 1.
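The "limit of 1,000 permutations" means that an empirical p value cannot be resolved below 1/1,000. A minimal sketch of a one-sided permutation p value with that floor is given below. The floor convention and the function name are assumptions, not the authors' implementation.

```python
def permutation_p(observed, null_stats):
    """One-sided empirical p value from permuted statistics.

    Counts permuted statistics at least as large as the observed one.
    The count is floored at 1 (an assumed convention), so the smallest
    reportable value is 1 / len(null_stats), i.e. 0.001 for 1,000
    permutations.
    """
    n = len(null_stats)
    hits = sum(1 for s in null_stats if s >= observed)
    return max(hits, 1) / n
```

For the difference between interacting and non-interacting loci, `observed` would be the difference in enrichment and `null_stats` the same difference recomputed after each permutation.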
Supplementary information
Supplementary Information
This file contains a Supplementary Note, Supplementary Methods, Supplementary References, Supplementary Figures 1–17 and full descriptions for Supplementary Tables 1–48 (tables supplied separately).
Supplementary Tables
Supplementary Tables 1–48 – see Supplementary Information document for full descriptions.
About this article
Cite this article
Nathan, A., Asgari, S., Ishigaki, K. et al. Single-cell eQTL models reveal dynamic T cell state dependence of disease loci. Nature 606, 120–128 (2022). https://doi.org/10.1038/s41586-022-04713-1
DOI: https://doi.org/10.1038/s41586-022-04713-1