Conditional transcriptome-wide association study for fine-mapping candidate causal genes

Liu, Lu; Yan, Ran; Guo, Ping; Ji, Jiadong; Gong, Weiming; Xue, Fuzhong; Yuan, Zhongshang; Zhou, Xiang

doi:10.1038/s41588-023-01645-y

Technical Report
Published: 26 January 2024

Lu Liu^1,2,
Ran Yan^1,2,
Ping Guo^1,2,
Jiadong Ji^3,
Weiming Gong^1,2,
Fuzhong Xue ORCID: orcid.org/0000-0003-0378-7956^1,2,
Zhongshang Yuan ORCID: orcid.org/0000-0002-3527-4488^1,2 &
…
Xiang Zhou ORCID: orcid.org/0000-0002-4331-7599^4,5

Nature Genetics volume 56, pages 348–356 (2024)

Abstract

Transcriptome-wide association studies (TWASs) aim to integrate genome-wide association studies with expression-mapping studies to identify genes with genetically predicted expression (GReX) associated with a complex trait. In the present report, we develop a method, GIFT (gene-based integrative fine-mapping through conditional TWAS), that performs conditional TWAS analysis by explicitly controlling for GReX of all other genes residing in a local region to fine-map putatively causal genes. GIFT is frequentist in nature, explicitly models both expression correlation and cis-single nucleotide polymorphism linkage disequilibrium across multiple genes and uses a likelihood framework to account for expression prediction uncertainty. As a result, GIFT produces calibrated P values and is effective for fine-mapping. We apply GIFT to analyze six traits in the UK Biobank, where GIFT narrows down the set size of putatively causal genes by 32.16–91.32% compared with existing TWAS fine-mapping approaches. The genes identified by GIFT highlight the importance of vessel regulation in determining blood pressures and lipid metabolism for regulating lipid levels.
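To make the contrast between marginal and conditional TWAS concrete, the following Python sketch simulates two genes with overlapping cis-SNPs in LD, only one of which affects the trait. It is a simplified two-stage regression illustration and not GIFT's likelihood framework: the Gaussian stand-ins for genotypes, the AR(1) LD pattern, the effect sizes and all function names are assumptions made for illustration, and plugging in predicted expression ignores the prediction uncertainty that GIFT is described as modeling.

```python
import math
import numpy as np


def simulate_genotypes(rng, n, p, rho):
    """Gaussian stand-ins for genotypes with AR(1) correlation (LD) between neighbors."""
    z = rng.standard_normal((n, p))
    x = np.empty((n, p))
    x[:, 0] = z[:, 0]
    for j in range(1, p):
        x[:, j] = rho * x[:, j - 1] + math.sqrt(1.0 - rho**2) * z[:, j]
    return x


def ols_z(design, y):
    """OLS z-scores for each column of `design` (an intercept is added internally)."""
    n = len(y)
    d = np.column_stack([np.ones(n), design])
    beta, *_ = np.linalg.lstsq(d, y, rcond=None)
    resid = y - d @ beta
    sigma2 = resid @ resid / (n - d.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(d.T @ d)))
    return (beta / se)[1:]


def two_sided_p(z):
    """Two-sided P values from z-scores using a normal approximation."""
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in np.atleast_1d(z)])


rng = np.random.default_rng(0)
p = 30

# Hypothetical cis-SNP weights: gene A uses SNPs 0-14, gene B uses SNPs 10-24.
w = np.zeros((p, 2))
w[0:15, 0] = 1.0
w[10:25, 1] = 1.0

# Stage 1: estimate expression weights in a small expression panel.
x_eqtl = simulate_genotypes(rng, 500, p, 0.7)
grex_eqtl = x_eqtl @ w
expr = grex_eqtl + rng.standard_normal((500, 2)) * grex_eqtl.std(axis=0)
w_hat, *_ = np.linalg.lstsq(x_eqtl, expr, rcond=None)

# Stage 2: predict GReX in the GWAS sample; only gene A affects the trait.
x_gwas = simulate_genotypes(rng, 5000, p, 0.7)
true_a = x_gwas @ w[:, 0]
y = 0.33 * true_a / true_a.std() + rng.standard_normal(5000)
grex_hat = x_gwas @ w_hat

# Marginal TWAS tests each gene alone; conditional TWAS fits both genes jointly.
z_marginal = np.array([ols_z(grex_hat[:, g], y)[0] for g in range(2)])
z_conditional = ols_z(grex_hat, y)

print("marginal z (A, B):   ", z_marginal, two_sided_p(z_marginal))
print("conditional z (A, B):", z_conditional, two_sided_p(z_conditional))
```

In this toy setting the non-causal gene B picks up a strong marginal signal through its correlation with gene A, and that signal shrinks once gene A's predicted expression is included in the same model.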

**Fig. 1: Schematic overview of GIFT for conditional TWAS analysis.**

**Fig. 2: Comparison of GIFT with other TWAS fine-mapping methods in simulations.**

**Fig. 3: TWAS fine-mapping results from different methods for the two BP traits in the UK Biobank.**

**Fig. 4: TWAS fine-mapping results from different methods for the four blood lipid traits in the UK Biobank.**

Data availability

No data were generated in the present study. The GEUVADIS gene expression data are publicly available at https://www.internationalgenome.org/data-portal/data-collection/geuvadis. The UK Biobank data were obtained from the UK Biobank resource (http://www.ukbiobank.ac.uk). The 1000 Genomes project data (phase 3) are available at https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502. GENCODE (release 12) is available at https://www.gencodegenes.org/human/release_12.html.

Code availability

GIFT is implemented in the R package GIFT, freely available at https://yuanzhongshang.github.io/GIFT and https://zenodo.org/records/10070491. The analysis code used to reproduce all results is available at https://zenodo.org/records/10070491. We also performed analysis using FOCUS (v.0.802, https://github.com/mancusolab/ma-focus), FOGS (v.2.0, https://github.com/ChongWu-Biostat/FOGS), MV-IWAS (v.0.0.0.9000, https://github.com/kathalexknuts/MVIWAS), clusterProfiler (v.3.14.3, https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) and PLINK (v.1.90b6.13, https://www.cog-genomics.org/plink/1.9).

Acknowledgements

The present study was supported by the National Natural Science Foundation of China (grant nos. 82373686 and 82173624), the Natural Science Foundation of Shandong Province (grant no. ZR2019ZD02), the Taishan Scholar Project of Shandong Province (grant no. tsqn202211025) and the Cheeloo Young Talent Program of Shandong University, all awarded to Z.Y. X.Z. is supported by the University of Michigan, Ann Arbor, MI, USA. The present study has been conducted using the UK Biobank resource under application no. 30686. The UK Biobank was established by the Wellcome Trust medical charity, Medical Research Council, Department of Health, Scottish Government and the Northwest Regional Development Agency. It has also received funding from the Welsh Assembly Government, British Heart Foundation and Diabetes UK.

Author information

Authors and Affiliations

Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
Lu Liu, Ran Yan, Ping Guo, Weiming Gong, Fuzhong Xue & Zhongshang Yuan
Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
Lu Liu, Ran Yan, Ping Guo, Weiming Gong, Fuzhong Xue & Zhongshang Yuan
Institute for Financial Studies, Shandong University, Jinan, China
Jiadong Ji
Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
Xiang Zhou
Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
Xiang Zhou

Contributions

X.Z. and Z.Y. conceived the idea. L.L. developed the methods. L.L. developed the software tool with assistance from R.Y. L.L. performed simulations and real-data analysis with assistance from P.G., J.J., W.G., F.X. and X.Z. X.Z., Z.Y. and L.L. wrote the manuscript with input from all the other authors. All authors reviewed and approved the final manuscript.

Corresponding authors

Correspondence to Zhongshang Yuan or Xiang Zhou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Arjun Bhattacharya and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Simulation setups in the present study.

The figure shows the simulation workflow, the different simulation scenarios and the corresponding parameter settings.

Extended Data Fig. 2 Quantile-quantile plots of -log10 p-values for testing the non-causal genes in the complete null-simulation settings.

Compared methods include GIFT (orange), FOGS (purple), and MV-IWAS (blue). (a) Sparse simulation settings, where either one SNP (circle), three SNPs (square), or five SNPs (triangle) have non-zero effects on gene expression. (b) Polygenic simulation settings, where either 1% of SNPs (circle), or 10% of SNPs (square) have non-zero effects on gene expression. (c) Homogeneous expression heritability settings, where the expression heritability PVE_zx is either 1% (circle), 5% (square), 10% (triangle) or 20% (diamond). (d) Heterogeneous expression heritability simulation settings, where the expression heritability of each gene in the region is randomly set to be 1%, 5%, 10% or 20%. (e) Simulation settings with varying sample sizes: circle represents the settings where the sample size of the gene expression study is 250 and the sample size of GWAS is 5,000; while square represents the settings where the sample size of the gene expression study is 465 and the sample size of GWAS is 50,000. (f) Simulation settings where the expression correlation of multiple genes ρ is set to be either 0 (circle), 0.3 (square), 0.6 (triangle), or 0.9 (diamond). Two-sided p-values are calculated for all methods.
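For readers unfamiliar with how such quantile-quantile plots are built, the short Python sketch below computes the plotted coordinates from a vector of P values. The function name `qq_points` and the plotting positions (i − 0.5)/m are assumptions made for illustration; the paper's own plotting code may use a different convention.

```python
import numpy as np


def qq_points(p_values):
    """Return (expected, observed) -log10 P values for a quantile-quantile plot."""
    p = np.sort(np.asarray(p_values, dtype=float))
    m = p.size
    # Expected quantiles of a uniform(0, 1) sample of size m.
    expected = -np.log10((np.arange(1, m + 1) - 0.5) / m)
    observed = -np.log10(p)
    return expected, observed


# Perfectly calibrated P values (an evenly spaced grid, shuffled) fall on the diagonal.
m = 1000
grid = (np.arange(1, m + 1) - 0.5) / m
rng = np.random.default_rng(1)
expected, observed = qq_points(rng.permutation(grid))
max_gap = float(np.max(np.abs(observed - expected)))
print("largest distance from the diagonal:", max_gap)
```

Points lying above the diagonal at the upper right would indicate inflated P values for non-causal genes, whereas points on the diagonal indicate calibration.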

Extended Data Fig. 3 Quantile-quantile plots of -log10 p-values for testing the non-causal genes in the more challenging null-simulation settings.

Quantile-quantile plots of -log10 p-values from different methods for the non-causal genes in the simulations, where the gene expression heritability PVE_zx is set to be either 1% (magenta), 5% (green), 10% (orange), 20% (blue) or randomly set to be these four values (purple). Compared methods include GIFT (a,b), FOGS (c,d), and MV-IWAS (e,f). Simulations are performed in settings where the number of causal genes in the region is set to be either one (a,c,e) or two (b,d,f). Two-sided p-values are calculated for all methods.

Extended Data Fig. 4 Power comparisons for different methods based on a true FDR of 0.05 under different simulation settings.

The number of causal genes in the region is set to be either one (a,c,e,g) or two (b,d,f,h). Compared methods include GIFT (orange), FOCUS (green), FOGS (purple), and MV-IWAS (blue). Simulations are performed with different gene expression heritability (a,b), different gene expression correlations (c,d), the sparse simulation settings (e,f) or the polygenic simulation settings (g,h).
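Power at a fixed true FDR can be computed in simulations because the causal status of every gene is known. The Python sketch below shows one common way to do this: rank genes by evidence, find the most permissive cutoff at which the realized false discovery proportion stays at or below the target and report the fraction of causal genes recovered. The function name, the score convention (larger means stronger evidence) and the toy labels are assumptions for illustration and are not taken from the paper's analysis code.

```python
import numpy as np


def power_at_true_fdr(scores, is_causal, fdr=0.05):
    """Fraction of causal genes found at the largest cutoff with true FDP <= fdr.

    `scores` are per-gene evidence values where larger means stronger evidence
    (for example -log10 P or a posterior inclusion probability).
    """
    scores = np.asarray(scores, dtype=float)
    is_causal = np.asarray(is_causal, dtype=bool)
    order = np.argsort(-scores, kind="stable")  # strongest evidence first
    hits = is_causal[order]
    true_pos = np.cumsum(hits)
    false_pos = np.cumsum(~hits)
    fdp = false_pos / np.arange(1, hits.size + 1)
    ok = np.nonzero(fdp <= fdr)[0]
    if ok.size == 0:
        return 0.0
    return float(true_pos[ok[-1]] / is_causal.sum())


# Toy ranking of 25 genes, 20 of them causal; the first false positive is ranked 11th.
labels = [True] * 10 + [False] + [True] * 5 + [False] * 4 + [True] * 5
scores = np.arange(25, 0, -1)
power = power_at_true_fdr(scores, labels)
print("power at true FDR 0.05:", power)
```

Using the true FDR rather than each method's nominal threshold puts frequentist and Bayesian methods on the same footing, since it depends only on how each method ranks the genes.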

Extended Data Fig. 5 TWAS fine-mapping results from different methods across six traits in the UK Biobank.

(a) Ring plot shows the proportion of analyzed regions across six traits that harbor either 1 (blue), 2–5 (red), 6–10 (peachpuff) or >10 (yellow) genes that have significant marginal TWAS evidence. (b) The 90%-credible sets from FOCUS across six traits are divided into two categories: those that include only genes (blue) and those that include both genes and the null model (red). (c) Summary plot shows the proportion of regions that harbor 1, 2, 3, or more genes detected by different methods across six traits. Four-way Venn diagram of genes identified by GIFT, FOCUS, FOGS, and MV-IWAS for SBP (d), DBP (e), TC (f), HDL (g), LDL (h) and TG (i).

Extended Data Fig. 6 Manhattan plots of the fine-mapping results from GIFT for three blood lipid traits.

Different colors represent different levels of evidence: a gene is ‘Known’ (red) if its association with the trait has been previously reported and well documented; a gene is a significant ‘TWAS’ gene (blue) if its marginal TWAS p-value is below the Bonferroni corrected transcriptome-wide threshold; a gene is a significant ‘GWAS’ gene (purple) if its marginal GWAS p-value is below the usual genome-wide threshold 5 × 10⁻⁸ or such association was previously reported; otherwise, a gene is denoted as ‘NA’ (brown). Two-sided p-values are calculated for all methods. Manhattan plots are displayed for TC(a), LDL(b) and TG(c).

Extended Data Fig. 7 TWAS fine-mapping results from different methods for two binary traits in the UK Biobank.

(a) Quantile–quantile plot of -log10 p-values from the three frequentist methods, including GIFT (orange), FOGS (purple), and MV-IWAS (blue), for testing gene associations with cardiovascular disease (CVD; circle) and obesity (diamond). (b) Ring plot shows the proportion of analyzed regions that harbor either 1 (blue), 2–5 (red), 6–10 (peachpuff) or >10 (yellow) genes that have significant marginal TWAS association evidence. (c) The 90%-credible sets from FOCUS are divided into two categories: those that include only genes (blue) and those that include both genes and the null model (red). (d) Summary plot shows the proportion of regions that harbor 1, 2, 3, or more associated genes detected by different methods. (e) Manhattan plots show the fine-mapping results from GIFT for CVD. (f) Manhattan plots show the fine-mapping results from GIFT for obesity. For (e) and (f), different colors represent different levels of evidence: a gene is ‘Known’ (red) if its association with the trait has been previously reported and well documented; a gene is a significant ‘TWAS’ gene (blue) if its marginal TWAS p-value is below the Bonferroni corrected transcriptome-wide threshold; a gene is a significant ‘GWAS’ gene (purple) if its marginal GWAS p-value is below the usual genome-wide threshold 5 × 10⁻⁸ or such association was previously reported; otherwise, a gene is denoted as ‘NA’ (brown). Two-sided p-values are calculated for all frequentist methods.

Supplementary information

Supplementary Information

Supplementary Notes 1–14, Figs. 1–52, Tables 1–9 and References.

Reporting Summary

Peer Review File

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, L., Yan, R., Guo, P. et al. Conditional transcriptome-wide association study for fine-mapping candidate causal genes. Nat Genet 56, 348–356 (2024). https://doi.org/10.1038/s41588-023-01645-y

Received: 18 January 2023
Accepted: 08 December 2023
Published: 26 January 2024
Issue Date: February 2024
DOI: https://doi.org/10.1038/s41588-023-01645-y