Abstract
Transcriptome-wide association studies (TWASs) integrate genome-wide association studies with expression-mapping studies to identify genes whose genetically predicted expression (GReX) is associated with a complex trait. In the present report, we develop a method, GIFT (gene-based integrative fine-mapping through conditional TWAS), that fine-maps putatively causal genes by performing conditional TWAS analysis, explicitly controlling for the GReX of all other genes residing in a local region. GIFT is frequentist in nature, explicitly models both expression correlation and cis-single nucleotide polymorphism linkage disequilibrium across multiple genes, and uses a likelihood framework to account for expression prediction uncertainty. As a result, GIFT produces calibrated P values and is effective for fine-mapping. We apply GIFT to analyze six traits in the UK Biobank, where GIFT reduces the size of the set of putatively causal genes by 32.16–91.32% compared with existing TWAS fine-mapping approaches. The genes identified by GIFT highlight the importance of vessel regulation in determining blood pressure and of lipid metabolism in regulating lipid levels.
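To make the idea of conditioning on neighboring genes concrete, the following is a minimal Python sketch, not the GIFT implementation. It assumes that GReX values are known without error (GIFT instead accounts for prediction uncertainty through a likelihood framework) and uses a hypothetical two-gene region in which only gene 1 affects the trait. All function names, effect sizes and sample sizes are illustrative assumptions.

```python
import random


def simulate_region(n=5000, rho=0.8, effect=0.5, seed=1):
    """Simulate GReX for two correlated genes; only gene 1 affects the trait."""
    rng = random.Random(seed)
    g1 = [rng.gauss(0, 1) for _ in range(n)]
    # Gene 2 is correlated with gene 1 (correlation rho) but has no trait effect.
    scale = (1 - rho ** 2) ** 0.5
    g2 = [rho * a + scale * rng.gauss(0, 1) for a in g1]
    y = [effect * a + rng.gauss(0, 1) for a in g1]
    return g1, g2, y


def _center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def marginal_slope(x, y):
    """Slope from regressing y on a single gene (marginal TWAS-style test)."""
    cx, cy = _center(x), _center(y)
    return sum(a * b for a, b in zip(cx, cy)) / sum(a * a for a in cx)


def conditional_slopes(x1, x2, y):
    """Slopes from regressing y on both genes jointly (2x2 normal equations)."""
    c1, c2, cy = _center(x1), _center(x2), _center(y)
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


if __name__ == "__main__":
    g1, g2, y = simulate_region()
    print("marginal slope, gene 2:   ", round(marginal_slope(g2, y), 3))
    print("conditional slope, gene 2:", round(conditional_slopes(g1, g2, y)[1], 3))
```

In this toy setting the non-causal gene 2 shows a clear marginal association because its GReX is correlated with that of gene 1, whereas its conditional slope is close to zero once gene 1 is included in the model.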
Data availability
No data were generated in the present study. The GEUVADIS gene expression data are publicly available at https://www.internationalgenome.org/data-portal/data-collection/geuvadis. The UK Biobank data were obtained from the UK Biobank resource (http://www.ukbiobank.ac.uk). The 1000 Genomes project data (phase 3) are available at https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502. GENCODE (release 12) is available at https://www.gencodegenes.org/human/release_12.html.
Code availability
GIFT is implemented in the R package GIFT, freely available at https://yuanzhongshang.github.io/GIFT and https://zenodo.org/records/10070491. The analysis code used to reproduce all results is available at https://zenodo.org/records/10070491. We also performed analysis using FOCUS (v.0.802, https://github.com/mancusolab/ma-focus), FOGS (v.2.0, https://github.com/ChongWu-Biostat/FOGS), MV-IWAS (v.0.0.0.9000, https://github.com/kathalexknuts/MVIWAS), clusterProfiler (v.3.14.3, https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) and PLINK (v.1.90b6.13, https://www.cog-genomics.org/plink/1.9).
Acknowledgements
The present study was supported by the National Natural Science Foundation of China (grant nos. 82373686 and 82173624), the Natural Science Foundation of Shandong Province (grant no. ZR2019ZD02), the Taishan Scholar Project of Shandong Province (grant no. tsqn202211025) and the Cheeloo Young Talent Program of Shandong University, all awarded to Z.Y. X.Z. is supported by the University of Michigan, Ann Arbor, MI, USA. The present study has been conducted using the UK Biobank resource under application no. 30686. The UK Biobank was established by the Wellcome Trust medical charity, Medical Research Council, Department of Health, Scottish Government and the Northwest Regional Development Agency. It has also received funding from the Welsh Assembly Government, British Heart Foundation and Diabetes UK.
Author information
Contributions
X.Z. and Z.Y. conceived the idea. L.L. developed the methods. L.L. developed the software tool with assistance from R.Y. L.L. performed simulations and real-data analysis with assistance from P.G., J.J., W.G., F.X. and X.Z. X.Z., Z.Y. and L.L. wrote the manuscript with input from all the other authors. All authors reviewed and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Arjun Bhattacharya and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Simulation setups in the present study.
The information shown includes the simulation workflow, the different simulation scenarios and the corresponding parameter settings.
Extended Data Fig. 2 Quantile-quantile plots of -log10 p-values for testing the non-causal genes in the complete null-simulation settings.
Compared methods include GIFT (orange), FOGS (purple), and MV-IWAS (blue). (a) Sparse simulation settings, where either one SNP (circle), three SNPs (square), or five SNPs (triangle) have non-zero effects on gene expression. (b) Polygenic simulation settings, where either 1% of SNPs (circle) or 10% of SNPs (square) have non-zero effects on gene expression. (c) Homogeneous expression heritability settings, where the expression heritability PVEzx is either 1% (circle), 5% (square), 10% (triangle) or 20% (diamond). (d) Heterogeneous expression heritability simulation settings, where the expression heritability of each gene in the region is randomly set to be 1%, 5%, 10% or 20%. (e) Simulation settings with varying sample sizes: circles represent settings where the gene expression study has 250 samples and the GWAS has 5,000; squares represent settings where the gene expression study has 465 samples and the GWAS has 50,000. (f) Simulation settings where the expression correlation of multiple genes ρ is set to be either 0 (circle), 0.3 (square), 0.6 (triangle), or 0.9 (diamond). Two-sided p-values are calculated for all methods.
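As a rough illustration of how the coordinates of such a quantile-quantile plot can be computed, the sketch below pairs each sorted observed p-value with its expected quantile under a uniform null. The function name `qq_points` and the choice of i/(n + 1) as the expected quantile are assumptions made here for illustration; the plotting conventions used for the figure are not specified in this caption.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10 p-value pairs, smallest p-value first.

    Under a well-calibrated null the p-values are uniform on (0, 1), so the
    i-th smallest of n p-values is expected to be near i / (n + 1).
    """
    ordered = sorted(pvalues)
    n = len(ordered)
    points = []
    for rank, p in enumerate(ordered, start=1):
        expected = -math.log10(rank / (n + 1))
        observed = -math.log10(p)
        points.append((expected, observed))
    return points


if __name__ == "__main__":
    for expected, observed in qq_points([0.5, 0.01, 0.1]):
        print(f"expected={expected:.3f} observed={observed:.3f}")
```

Points lying along the diagonal indicate calibrated p-values; points above the diagonal at the extreme end indicate inflation.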
Extended Data Fig. 3 Quantile-quantile plots of -log10 p-values for testing the non-causal genes in the more challenging null-simulation settings.
Quantile-quantile plots of -log10 p-values from different methods for the non-causal genes in the simulations, where the gene expression heritability PVEzx is set to be either 1% (magenta), 5% (green), 10% (orange), 20% (blue) or randomly set to be these four values (purple). Compared methods include GIFT (a,b), FOGS (c,d), and MV-IWAS (e,f). Simulations are performed in settings where the number of causal genes in the region is set to be either one (a,c,e) or two (b,d,f). Two-sided p-values are calculated for all methods.
Extended Data Fig. 4 Power comparisons for different methods based on a true FDR of 0.05 under different simulation settings.
The number of causal genes in the region is set to be either one (a,c,e,g) or two (b,d,f,h). Compared methods include GIFT (orange), FOCUS (green), FOGS (purple), and MV-IWAS (blue). Simulations are performed with different gene expression heritability (a,b), different gene expression correlations (c,d), the sparse simulation settings (e,f) or the polygenic simulation settings (g,h).
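Because the causal genes are known in simulations, power at a fixed true FDR can be computed directly from the ranked p-values. The sketch below shows one common way to do this, given as an assumption rather than as the procedure used in the study: it finds the largest p-value-ranked set of genes whose true false discovery proportion does not exceed the target, and reports the fraction of causal genes in that set. The function name and the handling of ties (ignored here) are illustrative choices.

```python
def power_at_true_fdr(pvalues, is_causal, fdr=0.05):
    """Power at the most lenient p-value cutoff whose true FDP is <= fdr.

    pvalues:   one p-value per gene
    is_causal: matching booleans giving the simulated truth
    """
    total_causal = sum(1 for flag in is_causal if flag)
    if total_causal == 0:
        raise ValueError("power is undefined without any causal gene")

    # Rank genes from most to least significant.
    ranked = sorted(zip(pvalues, is_causal), key=lambda pair: pair[0])

    true_pos = 0
    false_pos = 0
    best_true_pos = 0
    for called, (_, causal) in enumerate(ranked, start=1):
        if causal:
            true_pos += 1
        else:
            false_pos += 1
        # Keep the largest call set that still satisfies the FDR target.
        if false_pos / called <= fdr:
            best_true_pos = true_pos
    return best_true_pos / total_causal


if __name__ == "__main__":
    p = [0.001, 0.002, 0.5, 0.01, 0.9]
    truth = [True, True, False, True, False]
    print(power_at_true_fdr(p, truth))
```

Comparing methods at a matched true FDR, rather than at a nominal threshold, keeps the comparison fair when some methods are not well calibrated.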
Extended Data Fig. 5 TWAS fine-mapping results from different methods across six traits in UK Biobank.
(a) Ring plot shows the proportion of analyzed regions across six traits that harbor either 1 (blue), 2–5 (red), 6–10 (peachpuff) or >10 (yellow) genes that have significant marginal TWAS evidence. (b) The 90%-credible sets from FOCUS across six traits are divided into two categories: those that include only genes (blue) and those that include both genes and the null model (red). (c) Summary plot shows the proportion of regions that harbor 1, 2, 3, or more genes detected by different methods across six traits. Four-way Venn diagram of genes identified by GIFT, FOCUS, FOGS, and MV-IWAS for SBP (d), DBP (e), TC (f), HDL (g), LDL (h) and TG (i).
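For readers unfamiliar with credible sets, the sketch below shows a generic construction: candidates are ranked by posterior probability and added until the cumulative normalized probability reaches the target level. It is a simplified illustration under the assumption that the null model is treated as one more candidate, which is how a set can end up containing "both genes and the null model"; it is not the FOCUS code, and the name `credible_set` and the example probabilities are hypothetical.

```python
def credible_set(posteriors, level=0.9):
    """Return the smallest top-ranked set whose normalized mass reaches `level`.

    posteriors: dict mapping a candidate name (a gene, or "NULL" for the
                null model) to its posterior probability.
    """
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")

    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0.0
    for name, prob in ranked:
        selected.append(name)
        cumulative += prob / total
        if cumulative >= level:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical region: two genes plus the null model.
    print(credible_set({"GENE_A": 0.70, "GENE_B": 0.25, "NULL": 0.05}))
    print(credible_set({"GENE_A": 0.50, "NULL": 0.35, "GENE_B": 0.15}))
```

In the second hypothetical region the null model enters the set, which corresponds to the red category in panel (b).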
Extended Data Fig. 6 Manhattan plots of the fine-mapping results from GIFT for three blood lipid traits.
Different colors represent different levels of evidence: a gene is ‘Known’ (red) if its association with the trait has been previously reported and well documented; a gene is a significant ‘TWAS’ gene (blue) if its marginal TWAS p-value is below the Bonferroni-corrected transcriptome-wide threshold; a gene is a significant ‘GWAS’ gene (purple) if its marginal GWAS p-value is below the usual genome-wide threshold of 5 × 10⁻⁸ or such an association was previously reported; otherwise, a gene is denoted as ‘NA’ (brown). Two-sided p-values are calculated for all methods. Manhattan plots are displayed for TC (a), LDL (b) and TG (c).
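The labeling rule described in this caption can be sketched as a small decision function. Two points are assumptions made here and are not stated in the caption: the categories are checked in the order in which they are listed (Known, then TWAS, then GWAS), and the Bonferroni correction uses a family-wise level of 0.05 divided by the number of genes tested. The function and parameter names are hypothetical.

```python
GWAS_THRESHOLD = 5e-8  # usual genome-wide significance threshold


def evidence_label(known, twas_p, n_genes_tested, gwas_p,
                   gwas_reported=False, alpha=0.05):
    """Assign an evidence label to a gene.

    known:          association previously reported and well documented
    twas_p:         marginal TWAS p-value of the gene
    n_genes_tested: number of genes in the transcriptome-wide scan
    gwas_p:         marginal GWAS p-value assigned to the gene
    gwas_reported:  a GWAS association was previously reported
    alpha:          family-wise error level (0.05 is an assumption)
    """
    twas_threshold = alpha / n_genes_tested  # Bonferroni correction
    if known:
        return "Known"
    if twas_p < twas_threshold:
        return "TWAS"
    if gwas_p < GWAS_THRESHOLD or gwas_reported:
        return "GWAS"
    return "NA"


if __name__ == "__main__":
    # With 10,000 genes tested, the TWAS threshold is 0.05 / 10,000 = 5e-6.
    print(evidence_label(False, 1e-7, 10_000, 0.5))
    print(evidence_label(False, 1e-3, 10_000, 1e-9))
```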
Extended Data Fig. 7 TWAS fine-mapping results from different methods for two binary traits in the UK Biobank.
(a) Quantile–quantile plot of -log10 p-values from the three frequentist methods, including GIFT (orange), FOGS (purple), and MV-IWAS (blue), for testing gene associations with cardiovascular disease (CVD; circle) and obesity (diamond). (b) Ring plot shows the proportion of analyzed regions that harbor either 1 (blue), 2–5 (red), 6–10 (peachpuff) or >10 (yellow) genes that have significant marginal TWAS association evidence. (c) The 90%-credible sets from FOCUS are divided into two categories: those that include only genes (blue) and those that include both genes and the null model (red). (d) Summary plot shows the proportion of regions that harbor 1, 2, 3, or more associated genes detected by different methods. (e) Manhattan plots show the fine-mapping results from GIFT for CVD. (f) Manhattan plots show the fine-mapping results from GIFT for obesity. For (e) and (f), different colors represent different levels of evidence: a gene is ‘Known’ (red) if its association with the trait has been previously reported and well documented; a gene is a significant ‘TWAS’ gene (blue) if its marginal TWAS p-value is below the Bonferroni-corrected transcriptome-wide threshold; a gene is a significant ‘GWAS’ gene (purple) if its marginal GWAS p-value is below the usual genome-wide threshold of 5 × 10⁻⁸ or such an association was previously reported; otherwise, a gene is denoted as ‘NA’ (brown). Two-sided p-values are calculated for all frequentist methods.
Supplementary information
Supplementary Information
Supplementary Notes 1–14, Figs. 1–52, Tables 1–9 and References.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, L., Yan, R., Guo, P. et al. Conditional transcriptome-wide association study for fine-mapping candidate causal genes. Nat Genet 56, 348–356 (2024). https://doi.org/10.1038/s41588-023-01645-y
DOI: https://doi.org/10.1038/s41588-023-01645-y