  • Letter

A combined polygenic score of 21,293 rare and 22 common variants improves diabetes diagnosis based on hemoglobin A1C levels

Abstract

Polygenic scores (PGSs) combine the effects of common genetic variants¹,² to predict risk or treatment strategies for complex diseases³,⁴,⁵,⁶,⁷. Adding rare variation to PGSs has largely unknown benefits and is methodologically challenging. Here, we developed a method for constructing rare variant PGSs and applied it to calculate genetically modified hemoglobin A1C thresholds for type 2 diabetes (T2D) diagnosis⁷,⁸,⁹,¹⁰. The resultant rare variant PGS is highly polygenic (21,293 variants across 154 genes), depends on ultra-rare variants (72.7% observed in fewer than three people) and identifies significantly more undiagnosed T2D cases than expected by chance (odds ratio = 2.71; P = 1.51 × 10⁻⁶). A PGS combining common and rare variants is expected to identify 4.9 million misdiagnosed T2D cases in the United States, nearly 1.5-fold more than the common variant PGS alone. These results provide a method for constructing complex trait PGSs from rare variants and suggest that rare variants will augment common variants in precision medicine approaches for common disease.

Fig. 1: Rare variant associations for HbA1C are comparatively strong.
Fig. 2: Rare variant gene-level HbA1C associations show enrichment for genes involved in glycemic control and erythrocytic pathways in mice.
Fig. 3: Accuracy and properties of rare and common variant PGSs.

Data availability

Sequence data and phenotypes from the AMP-T2D-GENES study are available via the database of Genotypes and Phenotypes and/or European Genome-Phenome Archive, as indicated in Supplementary Table 14. Access to data from the UKB can be obtained at https://www.ukbiobank.ac.uk/enable-your-research.

Code availability

All of the software used in the analysis was open source and is described in the Methods. This includes BWA version 0.7, Picard version 1, GATK version 3.4, EPACTS version 3.2.4, Plink 1.9, the METAL meta-analysis tool, the Michigan Imputation Server, KlustaKwik version 1.5, FlashPCA2 version 2.0, HAIL version 0.2.61, KING version 2.1.2, VEP version 87 and Tabix version 0.2.6. Previously published scripts (https://www.nature.com/articles/s41586-019-1231-2.pdf) were used to conduct the minimum P value test and to perform the Wilcoxon rank-sum test for gene sets. A custom script was written to calculate common and rare variant PGSs (available via GitHub at https://github.com/PeterDornbos/dornbos_ps). The Type 2 Diabetes Knowledge Portal was used to scan for GWAS loci (https://t2d.hugeamp.org/).

References

  1. Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).

  2. Choi, S. W., Mak, T. S. & O’Reilly, P. F. Tutorial: a guide to performing polygenic risk score analyses. Nat. Protoc. 15, 2759–2772 (2020).

  3. Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).

  4. Khera, A. V. et al. Polygenic prediction of weight and obesity trajectories from birth to adulthood. Cell 177, 587–596 (2019).

  5. Mavaddat, N. et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am. J. Hum. Genet. 104, 21–34 (2019).

  6. Mahajan, A. et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat. Genet. 50, 1505–1513 (2018).

  7. Wheeler, E. et al. Impact of common genetic determinants of hemoglobin A1C on type 2 diabetes risk and diagnosis in ancestrally diverse populations: a transethnic genome-wide meta-analysis. PLoS Med. 14, e1002383 (2017).

  8. Leong, A. & Meigs, J. B. Type 2 diabetes prevention: implications of hemoglobin A1C genetics. Rev. Diabet. Stud. 12, 351–362 (2015).

  9. Sarnowski, C. et al. Impact of rare and common genetic variants on diabetes diagnosis by hemoglobin A1C in multi-ancestry cohorts: the Trans-Omics for Precision Medicine Program. Am. J. Hum. Genet. 105, 706–718 (2019).

  10. Soranzo, N. et al. Common variants at 10 genomic loci influence hemoglobin A1C levels via glycemic and nonglycemic pathways. Diabetes 59, 3229–3239 (2010).

  11. Higgins, P. J. & Bunn, H. F. Kinetic analysis of the nonenzymatic glycosylation of hemoglobin. J. Biol. Chem. 256, 5204–5208 (1981).

  12. American Diabetes Association. 2. Classification and diagnosis of diabetes: Standards of Medical Care in Diabetes—2021. Diabetes Care 44, S15–S33 (2021).

  13. American Diabetes Association. 3. Prevention or delay of type 2 diabetes: Standards of Medical Care in Diabetes—2021. Diabetes Care 44, S34–S39 (2021).

  14. American Diabetes Association. 6. Glycemic targets: Standards of Medical Care in Diabetes—2021. Diabetes Care 44, S73–S84 (2021).

  15. American Diabetes Association. 9. Pharmacologic approaches to glycemic treatment: Standards of Medical Care in Diabetes—2021. Diabetes Care 44, S111–S124 (2021).

  16. Cohen, R. M. et al. Red cell life span heterogeneity in hematologically normal people is sufficient to alter HbA1c. Blood 112, 4284–4291 (2008).

  17. Chai, J. F. et al. Genome-wide association for HbA1c in Malay identified deletion on SLC4A1 that influences HbA1c independent of glycemia. J. Clin. Endocrinol. Metab. 105, dgaa658 (2020).

  18. Chen, P. et al. Multiple nonglycemic genomic loci are newly associated with blood level of glycated hemoglobin in East Asians. Diabetes 63, 2551–2562 (2014).

  19. Chen, J. et al. The trans-ancestral genomic architecture of glycemic traits. Nat. Genet. 53, 840–860 (2021).

  20. Fu, W. et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 493, 216–220 (2013).

  21. Goodrich, J. K. et al. Determinants of penetrance and variable expressivity in monogenic metabolic conditions across 77,184 exomes. Nat. Commun. 12, 3505 (2021).

  22. Bamshad, M. J. et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat. Rev. Genet. 12, 745–755 (2011).

  23. Cirulli, E. T. & Goldstein, D. B. Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nat. Rev. Genet. 11, 415–425 (2010).

  24. Lee, S., Abecasis, G. R., Boehnke, M. & Lin, X. Rare-variant association analysis: study designs and statistical tests. Am. J. Hum. Genet. 95, 5–23 (2014).

  25. Van Hout, C. V. et al. Exome sequencing and characterization of coding variation in 49,960 individuals in the UK Biobank. Nature 586, 749–756 (2020).

  26. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).

  27. Flannick, J. et al. Exome sequencing of 20,791 cases of type 2 diabetes and 24,440 controls. Nature 570, 71–76 (2019).

  28. Luzzatto, L., Nannelli, C. & Notaro, R. Glucose-6-phosphate dehydrogenase deficiency. Hematol. Oncol. Clin. North Am. 30, 373–393 (2016).

  29. Pandolfi, P. P. et al. Targeted disruption of the housekeeping gene encoding glucose 6-phosphate dehydrogenase (G6PD): G6PD is dispensable for pentose synthesis but essential for defense against oxidative stress. EMBO J. 14, 5209–5215 (1995).

  30. Cahalan, S. M. et al. Piezo1 links mechanical forces to red blood cell volume. eLife 4, e07370 (2015).

  31. Faucherre, A., Kissa, K., Nargeot, J., Mangoni, M. E. & Jopling, C. Piezo1 plays a role in erythrocyte volume homeostasis. Haematologica 99, 70–75 (2014).

  32. Beutler, E. G6PD deficiency. Blood 84, 3613–3636 (1994).

  33. Cappellini, M. D. & Fiorelli, G. Glucose-6-phosphate dehydrogenase deficiency. Lancet 371, 64–74 (2008).

  34. Crouch, D. J. M. & Bodmer, W. F. Polygenic inheritance, GWAS, polygenic risk scores, and the search for functional variants. Proc. Natl Acad. Sci. USA 117, 18924–18933 (2020).

  35. Dudbridge, F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 9, e1003348 (2013).

  36. Smith, C. L. & Eppig, J. T. The mammalian phenotype ontology: enabling robust annotation and comparative analysis. Wiley Interdiscip. Rev. Syst. Biol. Med. 1, 390–399 (2009).

  37. Mi, H., Muruganujan, A., Ebert, D., Huang, X. & Thomas, P. D. PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools. Nucleic Acids Res. 47, D419–D426 (2019).

  38. Conesa, A. et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676 (2005).

  39. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169 (2017).

  40. Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, D353–D361 (2017).

  41. Altshuler, D., Daly, M. J. & Lander, E. S. Genetic mapping in human disease. Science 322, 881–888 (2008).

  42. Mahajan, A. et al. Refining the accuracy of validated target identification through coding variant fine-mapping in type 2 diabetes. Nat. Genet. 50, 559–571 (2018).

  43. Lupski, J. R., Belmont, J. W., Boerwinkle, E. & Gibbs, R. A. Clan genomics and the complex architecture of human disease. Cell 147, 32–43 (2011).

  44. Fuchsberger, C. et al. The genetic architecture of type 2 diabetes. Nature 536, 41–47 (2016).

  45. Estrada, K. et al. Association of a low-frequency variant in HNF1A with type 2 diabetes in a Latino population. J. Am. Med. Assoc. 311, 2305–2314 (2014).

  46. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).

  47. Claussnitzer, M. et al. A brief history of human disease genetics. Nature 577, 179–189 (2020).

  48. Bodmer, W. & Bonilla, C. Common and rare variants in multifactorial susceptibility to common diseases. Nat. Genet. 40, 695–701 (2008).

  49. Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).

  50. Sveinbjornsson, G. et al. Weighting sequence variants based on their annotation increases power of whole-genome association studies. Nat. Genet. 48, 314–317 (2016).

  51. Lohmueller, K. E. et al. Whole-exome sequencing of 2,000 Danish individuals and the role of rare coding variants in type 2 diabetes. Am. J. Hum. Genet. 93, 1072–1086 (2013).

  52. Williams, A. L. et al. Sequence variants in SLC16A11 are a common risk factor for type 2 diabetes in Mexico. Nature 506, 97–101 (2014).

  53. Eastwood, S. V. et al. Algorithms for the capture and adjudication of prevalent and incident diabetes in UK Biobank. PLoS ONE 11, e0162388 (2016).

  54. Hindy, G. et al. Rare coding variants in 35 genes associate with circulating lipid levels: a multi-ancestry analysis of 170,000 exomes. Am. J. Hum. Genet. 109, 81–96 (2022).

  55. McLaren, W. et al. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26, 2069–2070 (2010).

  56. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).

  57. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

  58. Liu, X., Wu, C., Li, C. & Boerwinkle, E. dbNSFP v3.0: a one-stop database of functional predictions and annotations for human nonsynonymous and splice-site SNVs. Hum. Mutat. 37, 235–241 (2016).

  59. Aken, B. L. et al. Ensembl 2017. Nucleic Acids Res. 45, D635–D642 (2017).

  60. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).

  61. Shim, H. et al. A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians. PLoS ONE 10, e0120758 (2015).

  62. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol. 57, 289–300 (1995).

  63. Lumley, T. Survey: Analysis of complex survey samples. R package version 4.0 (2020).

Acknowledgements

This project was supported by R01DK125490 and UM1DK105554. J.B.C. is supported by a National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Pathway to Independence Award (K99DK127196). A.L. was supported by Grant 2020096 from the Doris Duke Charitable Foundation. J.B.M. was supported by National Institutes of Health grants R01DK078616 and R01HL151855. J.C.F. was supported by National Heart, Lung, and Blood Institute (NHLBI) award K24 HL157960. J.I.R. was supported by National Center for Advancing Translational Sciences, Clinical and Translational Science Institute grant UL1TR001881 and NIDDK Diabetes Research Center grant DK063491 to the Southern California Diabetes Endocrinology Research Center. Infrastructure for the CHARGE Consortium is supported in part by NHLBI grant R01HL105756 and in part by the National Institutes of Health, NHLBI contract 1R01HL151855 and NIDDK contract UM1DK078616. M.S.U. was supported by grant K23DK114551.

Author information

Contributions

P.D. and J.F. conceived of the methodology and wrote the manuscript. P.D., R.K., A.R. and J.F. performed the analysis. T.N., J.B.C., A.L., J.B.M., J.C.F., J.I.R., M.S.U. and members of the AMP-T2D-GENES Consortium provided expertise and feedback on the manuscript. J.F. supervised the study.

Corresponding author

Correspondence to Jason Flannick.

Ethics declarations

Competing interests

As of April 2022, P.D. is an employee and stockholder of Regeneron Pharmaceuticals. J.B.M. is an academic associate for Quest Diagnostics endocrinology research and development. J.C.F. has received consulting honoraria from Novo Nordisk and AstraZeneca and speaker fees from Merck, Novo Nordisk and AstraZeneca for research lectures for which he had full control of the content. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Timothy Frayling and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Single variant HbA1C associations.

Manhattan plot of the single variant associations identified by our meta-analysis. Horizontal lines indicate the exome-wide significance threshold used for nonsynonymous variants (P ≤ 1.8 × 10⁻⁸, derived from a previously determined threshold⁵⁰ (P ≤ 4.3 × 10⁻⁷) followed by Bonferroni correction for 24 phenotypes) and the genome-wide significance threshold for non-coding variants (P ≤ 2.1 × 10⁻⁹, derived from the traditional genome-wide significance threshold (P ≤ 5 × 10⁻⁸) followed by Bonferroni correction for 24 phenotypes). Single variant associations were determined via the efficient mixed-model association expedited (EMMAX) method⁴⁹.
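
As a quick check of the arithmetic behind the two thresholds, the sketch below divides each base threshold by the 24 phenotypes named in the caption. The function and variable names are illustrative and are not taken from the study's code.

```python
# Bonferroni correction: divide the base significance threshold
# by the number of tests (here, 24 phenotypes).
N_PHENOTYPES = 24


def bonferroni(alpha: float, n_tests: int) -> float:
    """Return the per-test threshold after Bonferroni correction."""
    return alpha / n_tests


exome_wide = bonferroni(4.3e-7, N_PHENOTYPES)  # nonsynonymous variants
genome_wide = bonferroni(5e-8, N_PHENOTYPES)   # non-coding variants

print(f"{exome_wide:.1e}")   # 1.8e-08
print(f"{genome_wide:.1e}")  # 2.1e-09
```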

Extended Data Fig. 2 Effect sizes and proportion of variance explained for rare variant HbA1C gene-level associations.

a-c, Results are displayed for G6PD (a) (n = 1,382 for AA; n = 1,930 for EA; n = 41,689 for EU; n = 1,861 for SA; n = 892 for HS), GCK (b) (n = 551 for EA; n = 40,241 for EU; n = 487 for HS), and PIEZO1 (c) (n = 905 for AA; n = 1,340 for EA; n = 42,061 for EU; n = 789 for SA; n = 484 for HS). We calculated effect sizes (mmol/mol) and liability variance explained separately for each ancestry and then combined these via a meta-analysis. We performed the calculations for the strongest associated gene-level mask and for the strongest associated common variant within 125 kb of the gene as previously reported7 (n = 7,564 for AA; n = 20,838 for EA; n = 123,665 for EU; n = 8,874 for SA). Proportion of variance explained is displayed as the proportion of total liability variance. AA, African-American; EA, East Asian; EU, European; HS, Hispanic; SA, South Asian; M-A, meta-analysis. Error bars indicate 95% confidence intervals.

Extended Data Fig. 3 Calculating and evaluating common variant polygenic scores.

We calculated common variant polygenic scores based on effect sizes and results from a previously published multi-ancestry HbA1C GWAS⁷. We calculated polygenic scores separately for each of the four ancestries in our test sample with available GWAS data, evaluated ancestry-specific odds ratios via Fisher’s exact tests, and then combined these odds ratios via a fixed-effects meta-analysis to produce trans-ancestry odds ratios.
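
The combination step can be illustrated with a minimal sketch of an inverse-variance, fixed-effects meta-analysis of log odds ratios. It is a sketch under stated assumptions: it uses the sample (cross-product) odds ratio with a Woolf standard error rather than the conditional estimate some Fisher's exact test implementations report, and the 2×2 counts are made up for illustration, not study data.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and Woolf standard error for a 2x2 table [[a, b], [c, d]]."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def fixed_effects_meta(estimates):
    """Inverse-variance weighted pooling of (log_or, se) pairs.

    Returns the pooled odds ratio and its 95% confidence interval.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * lo for w, (lo, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1 / total)
    ci = (math.exp(pooled - 1.96 * pooled_se),
          math.exp(pooled + 1.96 * pooled_se))
    return math.exp(pooled), ci


# Hypothetical per-ancestry 2x2 counts (illustrative only).
tables = {"AA": (30, 20, 40, 60), "EA": (25, 15, 50, 70)}
pooled_or, ci = fixed_effects_meta([log_odds_ratio(*t) for t in tables.values()])
print(round(pooled_or, 2), tuple(round(x, 2) for x in ci))
```

Each ancestry contributes in proportion to the precision of its estimate, so larger samples dominate the pooled odds ratio.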

Extended Data Fig. 4 Enrichment analyses of HbA1C and RBC rare variant gene-level associations.

We ranked genes by their HbA1C gene-level P-value and tested the degree to which the top n associations (with n ranging from 1 to 1,000) were enriched for red blood cell count (RBC) gene-level associations. Enrichments were calculated using a one-sided Wilcoxon rank-sum test, comparing the RBC gene-level P-values of the top n HbA1C associations to the RBC gene-level P-values of background genes matched on the number of variants and total allele count. The solid blue line in the plot shows the one-sided Wilcoxon P-values as a function of n. As a negative control, we also conducted the reciprocal analysis in which we tested the top n RBC associations for enrichment for HbA1C associations; the solid yellow line indicates the corresponding one-sided Wilcoxon P-values.

Extended Data Fig. 5 A framework for constructing polygenic scores that include rare variants.

a,b, The framework consists of two steps: (a) choosing genes to include in the polygenic score, based on their association P-value and annotation, and (b) defining weights for rare variants, based on the masks that include them and the aggregate effect sizes observed for the masks. a, We explored three methods for choosing genes, based on their strength of HbA1C association (blue boxes) and evidence of acting through erythrocytic pathways (red). “GLYCEMIC set” indicates genes located within a glycemic gene set enriched (at P ≤ 0.05) for HbA1C rare variant associations, while “RBC set” indicates genes located within an erythrocytic gene set enriched (at P ≤ 0.05) for HbA1C rare variant associations (the specific gene sets are shown in Fig. 2). “HbA1C LOCUS” and “RBC LOCUS” indicate genes located within 125 kb of a common variant HbA1C or RBC association, respectively. The two negative controls included only genes that failed the erythrocytic pathway filters (“Excluded”) and applied either the HbA1C association strength filters for the loose gene set (control 1) or the association strength filters for the relaxed gene set (control 2). b, We explored three methods for weighting variants: the aggregate effect size of the strictest mask that contained the variant (nested), the aggregate effect size of variants unique to the strictest mask that contained the variant (unique), or the aggregate effect size of a weighted burden test for the gene (weighted).
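
The "nested" weighting scheme in panel b can be sketched as follows. This is an illustration for a single gene, not the authors' script: the mask names, variant identifiers and effect sizes are hypothetical, and the score is assumed to be a simple sum of weight times alternate-allele count.

```python
# Masks ordered from strictest to least strict (hypothetical names).
MASK_ORDER = ["strict", "medium", "loose"]


def nested_weights(mask_members, mask_effects):
    """Give each variant the aggregate effect of the strictest mask containing it."""
    weights = {}
    for mask in MASK_ORDER:
        for variant in mask_members.get(mask, ()):
            # setdefault keeps the first (strictest) assignment.
            weights.setdefault(variant, mask_effects[mask])
    return weights


def rare_variant_score(genotype, weights):
    """Sum weight x alternate-allele count over the variants a person carries."""
    return sum(weights.get(v, 0.0) * count for v, count in genotype.items())


# Made-up example: nested masks with aggregate effects in mmol/mol.
mask_members = {
    "strict": {"v1"},
    "medium": {"v1", "v2"},
    "loose": {"v1", "v2", "v3"},
}
mask_effects = {"strict": -4.0, "medium": -2.0, "loose": -0.5}

weights = nested_weights(mask_members, mask_effects)
score = rare_variant_score({"v2": 1, "v3": 1}, weights)
print(score)  # -2.5
```

Because weights come from mask-level aggregate effects rather than per-variant estimates, a variant seen in only one or two people can still receive a usable weight.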

Extended Data Fig. 6 Testing the accuracy of the rare variant polygenic score.

To test the rare variant polygenic score models for accuracy, we assessed the number of T2D cases with adjusted HbA1C that cross the diagnostic threshold for T2D (47.53 mmol/mol HbA1C) and, with a Fisher’s exact test, compared this to the number expected by chance. For each of the nine rare variant polygenic scores (three variant weighting schemes for each of three gene set definitions; Extended Data Fig. 5), we calculated odds ratios and 95% confidence intervals for the fraction of true T2D cases reclassified by the model as compared to the null. The area of the diamond for each odds ratio is proportional to the total number of reclassified individuals in the AMP-T2D-GENES test sample (total n assessed = 17,206; see Supplementary Table 9 for model-specific reclassification sample sizes). Error bars indicate 95% confidence intervals.
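
The threshold-crossing step can be sketched as below. The adjustment formula is an assumption (the genetically predicted shift is subtracted from the measured value), the individuals are made up, and the comparison with the number expected by chance is omitted because its construction is not described in this caption.

```python
THRESHOLD = 47.53  # mmol/mol, the diagnostic threshold quoted in the caption


def adjust_hba1c(measured, pgs):
    """Assumed adjustment: remove the genetically predicted shift."""
    return measured - pgs


def crosses_up(measured, pgs, threshold=THRESHOLD):
    """True if a person below the threshold reaches it after adjustment."""
    return measured < threshold <= adjust_hba1c(measured, pgs)


# Hypothetical individuals: (measured HbA1C, PGS), both in mmol/mol.
people = [(46.0, -2.0), (46.0, 1.0), (49.0, -1.0), (44.0, -3.0)]
reclassified = [p for p in people if crosses_up(*p)]
print(reclassified)  # [(46.0, -2.0)]
```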

Extended Data Fig. 7 Secondary analysis of rare variant polygenic scores for UKB samples only.

To ensure that the ability of the rare variant polygenic score to reclassify an excess of true cases was not due to over-fitting, we built nine risk scores as in Extended Data Fig. 6 but with genes selected from an analysis of only UKB samples (Methods). For each of the nine resulting rare variant polygenic scores, we calculated odds ratios and 95% confidence intervals for the fraction of true T2D cases reclassified by the model as compared to the null expectation. The area of the diamond for each odds ratio is proportional to the total number of reclassified individuals in the AMP-T2D-GENES test sample (total n assessed = 17,206; see Supplementary Table 9 for model-specific reclassification sample sizes). Error bars indicate 95% confidence intervals.

Extended Data Fig. 8 Impact of adjusting rare variant effects for common variants included in the polygenic score.

Scatterplots indicate HbA1C gene-level effect sizes (mmol/mol) as estimated by burden tests with and without variants from the common variant PGS included as covariates in the test. a-g, Results are shown for each of the seven rare variant masks. We analyzed genes with nominal (P ≤ 0.05) rare variant associations and within 125 kb of a variant in the common variant PGS. Results indicate that, on average, rare variant effects remain roughly the same when adjusting for common variants. Spearman’s rank correlation coefficients (that is, rho) and associated P-values (two-sided) are indicated on the plots. The red line indicates a slope of 1. Slopes (beta) and P-values (two-sided) calculated via linear regression are displayed on the plots; blue dotted lines indicate slopes estimated by the linear models.

Extended Data Fig. 9 Testing for heterogeneity across ancestry for variants included in common variant and rare variant polygenic scores.

a-h, We used a Cochran’s Q test to evaluate heterogeneity across ancestry-level single-variant and gene-level association results. QQ plots are shown for P-values from single-variant Q tests for common variants (a) and gene-level Q tests for different rare variant masks; included in each plot are the variants (or genes) included in the corresponding polygenic score (b-h). Departures above the diagonal red line suggest heterogeneity beyond the null expectation (blue lines indicate 95% confidence intervals for the null expectation), while lambda values indicate the ratio of the median observed chi square statistic to the median of the expected chi square statistic under the null; larger lambda values indicate larger deviations from the null. ‘HC’ indicates the LoFTee mask; ‘5/5 + LC 1%’ indicates the “5/5 LoFTee LC 1%” mask.
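
For reference, Cochran's Q and the lambda statistic described above can be sketched as follows. The inputs are hypothetical per-ancestry effect sizes and standard errors, and the expected median is passed in by the caller because it depends on the degrees of freedom (it is about 0.455 for one degree of freedom).

```python
import statistics


def cochran_q(betas, ses):
    """Cochran's Q for per-ancestry effect sizes and standard errors.

    Returns (Q, degrees of freedom). Under the null of no heterogeneity,
    Q follows a chi-square distribution with k - 1 degrees of freedom.
    """
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    return q, len(betas) - 1


def inflation_lambda(observed_chi2, expected_median):
    """Ratio of the median observed statistic to the median expected under the null."""
    return statistics.median(observed_chi2) / expected_median


q, df = cochran_q([0.0, 2.0], [1.0, 1.0])
print(q, df)  # 2.0 1
```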

Extended Data Fig. 10 Fraction of variants found in enriched erythrocytic and glycemic gene sets with negative effects on HbA1C levels.

Reported is the fraction of variants found in genes (i) with HbA1C gene-level P ≤ 0.05 and (ii) within a significantly enriched (P ≤ 0.05) erythrocytic (n = 4) or glycemic (n = 5) gene set that have negative HbA1C effect sizes based on the single variant meta-analysis. a-g, Results are shown for variants within each mask. The bars represent the fractions observed across all gene sets, while the dots represent the fractions observed for each individual gene set. A two-sided t-test was used to assess potentially significant differences; P-values are indicated within each plot. Error bars indicate standard error.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dornbos, P., Koesterer, R., Ruttenburg, A. et al. A combined polygenic score of 21,293 rare and 22 common variants improves diabetes diagnosis based on hemoglobin A1C levels. Nat Genet 54, 1609–1614 (2022). https://doi.org/10.1038/s41588-022-01200-1

  • DOI: https://doi.org/10.1038/s41588-022-01200-1
