Main

Plasma concentrations of total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C) and triglycerides (TG) are heritable risk factors for cardiovascular disease and targets for therapeutic intervention1. Genome-wide association studies (GWASs) involving up to 20,000 individuals of European ancestry have identified >30 genetic loci contributing to inter-individual variation in plasma lipid concentrations2,3,4,5,6,7,8,9,10. Half of these loci harboured genes previously known to influence plasma lipid concentrations, establishing the technical validity of the lipid GWAS. Nevertheless, the practical value of the GWAS approach remains a subject of debate11,12,13,14.

Here we focus on three key questions motivated by recent progress in genetic mapping: (1) are loci identified in populations of European descent important in non-European groups, suggesting relevance in different global populations; (2) are these loci of clinical relevance, providing the framework to identify potential novel drug targets for the treatment of extreme lipid phenotypes and the prevention of coronary artery disease (CAD); and (3) do these loci harbour genes with biological relevance, that is, genes directly involved in lipid regulation and metabolism?

We address these questions using several approaches: a genome-wide association screen for plasma lipids in >100,000 individuals of European ancestry; evaluation of mapped variants in East Asians, South Asians and African Americans; association testing in individuals with and without CAD; evaluation of genetic variants in patients with extreme plasma lipid concentrations; and genetic manipulation in mouse models.

GWAS in >100,000 individuals

To identify additional common variants associated with plasma TC, LDL-C, HDL-C and TG concentrations, we performed a meta-analysis of 46 lipid GWASs (Supplementary Tables 1–4). These studies together comprise >100,000 individuals of European descent (maximum sample size 100,184 for TC, 95,454 for LDL-C, 99,900 for HDL-C and 96,598 for TG), ascertained in the United States, Europe or Australia. In each study, we used genotyped single nucleotide polymorphisms (SNPs) and phased chromosomes from the HapMap CEU (Utah residents with ancestry from northern and western Europe) sample to impute autosomal SNPs catalogued in the HapMap; SNPs with minor allele frequency (MAF) >1% and good imputation quality (see Methods in Supplementary Information) were analysed. A total of 2.6 million directly genotyped or imputed SNPs were tested for association with each of the four lipid traits in each study. For each SNP, evidence of association was combined across studies using a fixed-effects meta-analysis.
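The per-SNP combination step can be sketched as follows. This is a minimal illustration that assumes inverse-variance weighting, one common form of fixed-effects meta-analysis; the exact weighting scheme used in the study is given in the Methods in Supplementary Information, and the function names here are hypothetical. It pools per-study effect estimates for one SNP (all aligned to the same allele) and flags genome-wide significance at the P < 5 × 10−8 threshold used in the text.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # genome-wide significance threshold used in the text


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis for one SNP.

    betas: per-study effect estimates, all for the same allele and trait units
    ses:   matching per-study standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return beta, se, z, p


def is_genome_wide(p):
    return p < GENOME_WIDE_ALPHA


# Usage with made-up numbers for three studies:
beta, se, z, p = fixed_effects_meta([0.10, 0.08, 0.12], [0.03, 0.02, 0.04])
print(f"beta={beta:.4f} se={se:.4f} P={p:.2e} genome-wide={is_genome_wide(p)}")
```

Because each weight is 1/SE², larger studies dominate the pooled estimate, and the pooled standard error shrinks as studies are added. This is how combining studies into a sample of >100,000 gains power to detect small per-allele effects.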

We identified 95 loci that showed genome-wide significant association (P < 5 × 10−8) with at least one of the four traits tested (Table 1; Supplementary Fig. 1 and Supplementary Table 2). These include all of the 36 loci previously reported by GWAS at genome-wide significance2,3,4,5,6,7,8,9,10 and 59 loci reported here in a GWAS for the first time. Among these 59 novel loci, 39 demonstrated genome-wide significant association with TC, 22 with LDL-C, 31 with HDL-C, and 16 with TG. Among the 36 known loci, 21 demonstrated genome-wide significant association with another lipid phenotype in addition to that previously described. To rule out spurious associations arising from imputation artefacts, at nearly all loci we identified proxy SNPs that had been directly genotyped on Illumina and/or Affymetrix arrays and used them to confirm each of the associations (Supplementary Table 5). The full association results for each of the four traits are available at http://www.broadinstitute.org/mpg/pubs/lipids2010/ or http://www.sph.umich.edu/csg/abecasis/public/lipids2010.

Table 1 Meta-analysis of plasma lipid concentrations in >100,000 individuals of European descent.

To evaluate whether additional independent association signals existed at each locus, we performed conditional association analyses for each of the four lipid traits including genotypes at the lead SNPs for each of the 95 loci as covariates in the association analyses (see Supplementary Methods). These analyses identified secondary signals in 26 loci (Supplementary Table 6); when these additional SNPs are combined with the lead SNPs, the total set of mapped variants explains 12.4% (TC), 12.2% (LDL-C), 12.1% (HDL-C), and 9.6% (TG) of the total variance in each lipid trait in the Framingham Heart Study, corresponding to 25–30% of the genetic variance for each trait.
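To make the variance figures concrete: under an additive model, a biallelic SNP with allele frequency p and per-allele effect β contributes approximately 2p(1−p)β² to the trait variance. The sketch below sums this over SNPs. It is an approximation that assumes Hardy–Weinberg equilibrium and no LD between the SNPs, not the regression-based estimate in Framingham Heart Study data reported above, and the heritability in the usage line is a hypothetical value chosen only to show the conversion to a share of genetic variance.

```python
def variance_explained(snps, trait_variance=1.0):
    """Approximate fraction of trait variance explained by additive SNP effects.

    snps: iterable of (allele_frequency, per_allele_effect) pairs, with effects in
          the same units as the trait (SD units if trait_variance == 1).
    Assumes Hardy-Weinberg equilibrium and no LD between the SNPs, so the
    per-SNP contributions simply add.
    """
    total = 0.0
    for p, beta in snps:
        if not 0.0 < p < 1.0:
            raise ValueError("allele frequency must lie strictly between 0 and 1")
        total += 2.0 * p * (1.0 - p) * beta ** 2  # genotypic variance of one SNP
    return total / trait_variance


def share_of_genetic_variance(fraction_of_total, heritability):
    """Express variance explained as a fraction of the heritable variance."""
    return fraction_of_total / heritability


# Hypothetical heritability of 0.45, used only for illustration:
print(round(share_of_genetic_variance(0.124, 0.45), 3))
```

With that assumed heritability, explaining 12.4% of the total variance corresponds to roughly 28% of the genetic variance, within the 25–30% range stated above.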

Previous studies have suggested sex-specific heritability of lipid traits15. A key challenge in addressing this issue is evaluating enough men and women to achieve adequate statistical power for each sex. We re-analysed the GWAS for the four lipid traits separately in women (n = 63,274) and in men (n = 38,514). Four of the 95 loci identified in the primary analysis showed significant heterogeneity of effect size (P < 0.0005) between men and women (Supplementary Table 7). Moreover, an additional five loci had significant association in only one sex and not in the sex-combined analysis. Two loci associated with HDL-C in the sex-combined analysis (KLF14 and ABCA8) showed female-specific association with TG and LDL-C, respectively. The KLF14 locus is a striking example, with rs1562398 significantly associated with TG in women (effect size = –0.046 for the C allele, P = 2 × 10−12), but not in men (effect size = –0.012, P = 0.05) (Supplementary Fig. 2 and Supplementary Table 7).
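One common way to test for heterogeneity of effect size between sexes is a two-sample z test on the sex-specific estimates. The sketch below assumes that the samples of women and men do not overlap and that the estimates are approximately normal; the study's own procedure is described in Supplementary Methods. The standard errors behind the KLF14 example are not given in the text, so the usage values are made up.

```python
import math

HETEROGENEITY_ALPHA = 0.0005  # threshold quoted in the text


def sex_heterogeneity(beta_women, se_women, beta_men, se_men):
    """Two-sample z test for a difference in per-allele effect between sexes.

    Assumes the female and male estimates come from non-overlapping samples,
    so their sampling errors are independent. Returns (z, two_sided_p).
    """
    z = (beta_women - beta_men) / math.sqrt(se_women ** 2 + se_men ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Made-up estimates, for illustration only:
z, p = sex_heterogeneity(0.05, 0.008, 0.00, 0.010)
print(f"z={z:.2f} P={p:.1e} heterogeneous={p < HETEROGENEITY_ALPHA}")
```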

To gain insight into how DNA variants in associated loci might influence plasma lipid concentrations, we tested whether the mapped DNA sequence variants regulate the expression levels of nearby genes (expression quantitative trait loci, or eQTLs) in human tissues relevant to lipoprotein metabolism (liver and fat)16. We carried out genotyping and RNA expression profiling of >39,000 transcripts in three types of human tissue samples from liver (960 samples), omental fat (741 samples) and subcutaneous fat (609 samples). We examined the correlations between each of the lead SNPs at the 95 loci and the expression levels of transcripts located within 500 kilobases of the SNP. We pre-specified a conservative threshold of statistical significance at P < 5 × 10−8. At this threshold, we identified 38 SNP-to-gene eQTLs in liver, 28 in omental fat, and 19 in subcutaneous fat (Table 1; Supplementary Tables 8–10). Some lead SNPs are quite remote from the associated gene transcripts. For example, rs9987289 (associated with both LDL-C and HDL-C) correlates with a twofold change in liver expression of PPP1R3B, yet is 174 kb away from the gene, which as demonstrated below is likely to be a causal gene. Similarly, rs2972146 (associated with both HDL-C and TG in this study, as well as with insulin resistance and type 2 diabetes mellitus in a previous study17) correlates with IRS1 expression in omental fat, despite being located 495 kb away from the gene.
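The cis-eQTL scan described above can be sketched for a single lead SNP. This is a simplified illustration, not the study's model: it uses a plain Pearson correlation between allele dosage and expression, with a normal approximation to the t statistic, and omits the covariates a real eQTL analysis would include. The 500 kb window and the P < 5 × 10−8 threshold come from the text; the function and transcript names are hypothetical.

```python
import math

CIS_WINDOW_BP = 500_000  # transcripts within 500 kb of the lead SNP
EQTL_ALPHA = 5e-8        # pre-specified significance threshold from the text


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation: no evidence of association
    return sxy / math.sqrt(sxx * syy)


def correlation_p(r, n):
    """Two-sided P for a correlation, using a normal approximation to the
    t statistic (adequate for the hundreds of samples per tissue here)."""
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


def cis_eqtls(snp_pos, dosages, transcripts):
    """Test one SNP against every transcript in its cis window.

    snp_pos:     base-pair position of the lead SNP
    dosages:     per-sample allele dosages (0-2) for that SNP
    transcripts: dict name -> (position_bp, per-sample expression values),
                 all on the SNP's chromosome and aligned to `dosages`
    Returns [(name, r, p)] for transcripts passing EQTL_ALPHA.
    """
    hits = []
    for name, (pos, expr) in transcripts.items():
        if abs(pos - snp_pos) > CIS_WINDOW_BP:
            continue  # outside the cis window: not tested
        r = pearson_r(dosages, expr)
        p = correlation_p(r, len(dosages))
        if p < EQTL_ALPHA:
            hits.append((name, r, p))
    return hits
```

A transcript 174 kb or even 495 kb from the SNP still falls inside this window, which is why distant genes such as PPP1R3B and IRS1 can be picked up.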

Relevance of GWAS loci in non-Europeans

As all of the individuals studied in our primary GWAS were of European ancestry, it remained unclear if the loci we identified in Europeans are relevant in non-European individuals. To address this question, we performed additional analyses in cohorts comprising >15,000 East Asians (Chinese, Koreans and Filipinos), >9,000 South Asians and >8,000 African Americans (Table 1; Supplementary Table 11). As a similarly sized control, we also performed genotyping in a cohort of 7,000 additional Europeans.

In the European group, we found that 35 of 36 lead SNPs tested against LDL-C had the same direction of association as seen in the primary (>100,000 person) analysis (see Supplementary Table 12 for details), as did 44 of 47 SNPs for HDL-C and 29 of 32 SNPs for TG. Such directional consistency for the three traits is unlikely to be due to chance (P = 5 × 10−10 for LDL-C, P = 1 × 10−10 for HDL-C, and P = 1 × 10−6 for TG). For further replication evidence, we performed direct genotyping of a subset of the lead SNPs in two European cohorts together totalling 12,000 individuals and found that 24 of 26 tested SNPs had the same direction of association (Supplementary Table 13).

We observed similar proportions in South Asians, with 29 of 32 lead SNPs tested against LDL-C having the same direction of association as in the primary analysis (P = 1 × 10−6), 35 of 39 SNPs for HDL-C (P = 2 × 10−7), and 24 of 27 SNPs for TG (P = 3 × 10−5). We also observed consistent results in East Asians (LDL-C: 29 of 36, P = 2 × 10−4; HDL-C: 38 of 44, P = 5 × 10−7; TG: 26 of 28, P = 2 × 10−6) and in African Americans (LDL-C: 33 of 36, P = 1 × 10−7; HDL-C: 37 of 44, P = 3 × 10−6; TG: 24 of 30, P = 7 × 10−4). Furthermore, the proportions of SNPs that had the same direction of association and P < 0.05 were similar in the European, South Asian and East Asian replication groups, with smaller proportions in African Americans (Supplementary Table 12). Of note, for a majority of the loci, there was no evidence of heterogeneity of effects between the primary European groups and each of the non-European groups (Supplementary Table 11).
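The directional-consistency P values can be checked with an exact one-sided binomial (sign) test. The text does not name the test, so treating each P as a test of agreement in sign against a 50% null is an assumption. Under that assumption the sketch gives values consistent with several of those quoted above, for example about 5 × 10−10 for 35 of 36 and about 7 × 10−4 for 24 of 30.

```python
from math import comb


def sign_test_p(n_same_direction, n_tested):
    """One-sided exact binomial (sign) test.

    P of observing at least `n_same_direction` SNPs whose effect has the same
    sign as in the discovery analysis, out of `n_tested`, if each sign were a
    fair coin flip (probability 0.5) under the null hypothesis.
    """
    tail = sum(comb(n_tested, k) for k in range(n_same_direction, n_tested + 1))
    return tail / 2 ** n_tested


for label, k, n in [("LDL-C, Europeans", 35, 36), ("TG, African Americans", 24, 30)]:
    print(f"{label}: {k}/{n} -> P = {sign_test_p(k, n):.1e}")
```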

These observations indicate that most (but probably not all) of the 95 lipid loci identified in this study contribute to the genetic architecture of lipid traits widely across global populations. They also suggest that future studies could localize causal DNA variants by leveraging differences in linkage disequilibrium (LD) patterns among populations. We evaluated the potential for fine mapping by comparing the number of SNPs in LD with lead SNPs in three HapMap populations (Supplementary Table 14). At many loci, only a subset of SNPs in high LD (r2 ≥ 0.8) with the lead SNP in HapMap CEU are also in high LD with the lead SNP in HapMap YRI (Yoruba in Ibadan, Nigeria) individuals or in the joint JPT+CHB (Japanese in Tokyo, Japan and Han Chinese in Beijing, China) cohort. Such differential LD patterns can prove useful to refine association boundaries and prioritize SNPs for functional evaluation, as demonstrated for the LDL-C-associated locus on chromosome 1p13 (reported in the accompanying paper ref. 18).
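The logic of cross-population fine mapping can be illustrated with the standard r² statistic computed from phased haplotypes. The sketch below finds SNPs in high LD (r² ≥ 0.8) with a lead SNP in each of two populations and keeps only those that are proxies in both. The smaller intersection is the kind of narrowed candidate set that fine mapping exploits. The SNP names and haplotypes are toy, hypothetical data, and this is not the procedure behind Supplementary Table 14.

```python
def r_squared(hap_a, hap_b):
    """LD r^2 between two biallelic SNPs from phased haplotypes.

    hap_a, hap_b: equal-length 0/1 allele lists, one entry per chromosome.
    r^2 = D^2 / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA * pB.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP cannot tag anything
    d = p_ab - p_a * p_b
    return d * d / denom


def proxies(lead, haplotypes, threshold=0.8):
    """SNPs (other than the lead) with r^2 >= threshold to the lead SNP."""
    lead_hap = haplotypes[lead]
    return {name for name, hap in haplotypes.items()
            if name != lead and r_squared(lead_hap, hap) >= threshold}


# Toy haplotypes for two populations (hypothetical SNP names and data):
lead = [1, 1, 0, 0, 1, 0, 1, 0]
ceu = {"lead": lead, "snpA": lead[:], "snpB": [1 - x for x in lead]}
yri = {"lead": lead, "snpA": lead[:], "snpB": [1, 0, 1, 0, 1, 0, 1, 0]}
candidates = proxies("lead", ceu) & proxies("lead", yri)
print(sorted(candidates))
```

In the toy data both snpA and snpB tag the lead SNP in the first population, but only snpA does so in the second, so only snpA remains a shared candidate.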

Clinical relevance of GWAS loci

To assess whether the GWAS approach yields clinical insights of potential therapeutic relevance, we sought to determine which of the lipid-associated lead SNPs are also associated with CAD in a manner consistent with established epidemiological relationships (that is, SNP alleles which increase TC, LDL-C or TG or that decrease HDL-C should be associated with increased risk of CAD). Whereas LDL-C is an accepted causal risk factor for CAD, it is unclear whether HDL-C and/or TG are also causal risk factors. This uncertainty was reinforced by the failure of a drug that raised HDL-C via cholesteryl ester transfer protein (CETP) inhibition to reduce the risk of cardiovascular disease19.

Whether other drugs that specifically raise HDL-C or lower TG can reduce CAD risk remains an open question. In contrast, the most widely marketed drugs for lowering LDL-C, statins, have been demonstrated in numerous clinical trials to reduce the risk of CAD. Statins inhibit 3-hydroxy-3-methylglutaryl coenzyme A reductase (the protein product of HMGCR) and thereby reduce LDL-C and TC levels. We observed that the allele of our lead SNP in the HMGCR locus that is associated with lower LDL-C levels is also associated with lower CAD risk (P = 0.004), consistent with the clinical effects of statins. Analogously, common variants in other lipid-associated loci that are also associated with CAD may implicate genes at these loci as possible therapeutic targets.

We performed association testing for each of the lead SNPs from this study in 24,607 individuals of European descent with CAD and 66,197 without CAD, with a pre-specified one-sided significance threshold of P < 0.001 requiring directionality consistent with the relevant lipid–CAD epidemiological relationship. A limited number of loci met this criterion (Table 1; Supplementary Table 15), with most of them being associated with LDL-C—consistent with LDL-C being a causal risk factor for CAD.
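The directional criterion can be expressed as a conversion from a two-sided P to a one-sided P in the predicted direction. The sketch below assumes the CAD association was summarized as a log odds ratio with a two-sided P from a symmetric test such as a Wald z test; the study's actual test is described in Supplementary Information, and the function names are hypothetical.

```python
CAD_ALPHA = 0.001  # pre-specified one-sided threshold from the text


def directional_p(log_or_lipid_risk_allele, two_sided_p):
    """One-sided P for 'the lipid-risk allele increases CAD risk'.

    log_or_lipid_risk_allele: CAD log odds ratio for the allele that raises TC,
        LDL-C or TG, or lowers HDL-C (the direction predicted by epidemiology).
    two_sided_p: two-sided P from a symmetric test such as a Wald z test.
    """
    if log_or_lipid_risk_allele > 0:
        return two_sided_p / 2.0        # effect in the predicted direction
    return 1.0 - two_sided_p / 2.0      # effect in the opposite direction


def supports_lipid_cad_link(log_or, two_sided_p):
    return directional_p(log_or, two_sided_p) < CAD_ALPHA
```

An allele with a strong association in the unexpected direction therefore fails the criterion, no matter how small its two-sided P.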

Four novel CAD-associated loci were related specifically to HDL-C or TG, but not to LDL-C: IRS1 (HDL-C, TG), C6orf106 (HDL-C), KLF14 (HDL-C) and NAT2 (TG). That these loci were associated with CAD suggests that there may be selective mechanisms by which HDL-C or TG can be altered in ways that also modulate CAD risk. However, it is also possible that causal genes in these loci have pleiotropic effects on non-lipid parameters that themselves influence CAD risk. For example, the major allele of the lead SNP in the IRS1 locus is associated with increased risk of type 2 diabetes mellitus, insulin resistance and hyperinsulinemia17, along with decreased HDL-C, increased TG and increased risk of CAD; it remains unclear which of these metabolic risk factors are responsible for the increased CAD risk.

Besides CAD, a second clinically relevant phenotype is hyperlipidaemia. We asked whether the common variants in the 95 associated loci, each with individually small effects on plasma lipids, combine to contribute to extreme lipid phenotypes. We genotyped individuals identified in three independent studies as having high LDL-C (n = 532, mean 219 mg dl−1), high HDL-C (n = 652, mean 90 mg dl−1) or high TG (n = 344, mean 1,079 mg dl−1). For each extreme case group, individuals with low plasma LDL-C (n = 532, mean 110 mg dl−1), HDL-C (n = 784, mean 36.2 mg dl−1) or TG (n = 144, mean 106 mg dl−1) served as control groups. In each case-control sample set, we calculated risk scores summarizing the number of LDL-C-, HDL-C-, or TG-raising alleles weighted by effect size.
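A weighted allele score of the kind described above can be sketched as follows. This is an illustrative version that assumes each SNP's weight is the per-allele effect of its trait-raising allele from the meta-analysis; the SNP identifiers and effect sizes are hypothetical, and the handling of missing genotypes is a simplification.

```python
def weighted_risk_score(genotype, effects):
    """Effect-size-weighted count of trait-raising alleles for one person.

    genotype: dict SNP id -> number (0, 1 or 2) of trait-raising alleles carried
    effects:  dict SNP id -> per-allele effect of that allele on the trait
    SNPs without a genotype are skipped here; a real analysis would more
    likely impute or mean-fill them.
    """
    return sum(beta * genotype[snp] for snp, beta in effects.items() if snp in genotype)


# Hypothetical SNP ids and effect sizes, for illustration only:
effects = {"snp1": 0.10, "snp2": 0.05, "snp3": 0.02}
person = {"snp1": 2, "snp2": 1}          # snp3 not genotyped
print(weighted_risk_score(person, effects))
```

Weighting by effect size lets a SNP with a larger per-allele effect move the score more than a SNP with a smaller one, rather than counting all trait-raising alleles equally.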

For LDL-C, we found that individuals with an LDL-C allelic dosage score in the top quartile were 13 times as likely to have high LDL-C as individuals in the bottom quartile (P = 1 × 10−14) (Supplementary Fig. 3; Supplementary Tables 16 and 17). For HDL-C, individuals in the top quartile of the HDL-C risk score were four times as likely to have high HDL-C as individuals in the bottom quartile (P = 2 × 10−16). For TG, individuals in the top quartile of the TG risk score were 44 times as likely to be hypertriglyceridaemic as individuals in the bottom quartile (P = 4 × 10−28). These results indicate that the additive effects of multiple common variants contribute to determining membership in the extremes of a quantitative trait distribution.
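A quartile contrast like those above can be computed as a simple 2×2 odds ratio, comparing cases and controls in the top and bottom quartiles of the risk score. This is a minimal sketch that assumes quartile cut points taken from the pooled case and control scores. The study may have used logistic regression with covariates instead, and the "times as likely" figures in the text are read here as odds ratios, which is also an assumption.

```python
import statistics


def quartile_odds_ratio(scores, is_case):
    """Odds ratio for case status, top vs bottom quartile of a risk score.

    scores:  risk score per person (cases and controls pooled)
    is_case: 1 for extreme-phenotype cases, 0 for controls, aligned to scores
    Quartile cut points come from the pooled scores; the study's exact
    binning and any covariate adjustment may differ.
    """
    q1, _, q3 = statistics.quantiles(scores, n=4)
    top = [c for s, c in zip(scores, is_case) if s > q3]
    bottom = [c for s, c in zip(scores, is_case) if s <= q1]
    a, b = sum(top), len(top) - sum(top)              # top quartile: cases, controls
    c, d = sum(bottom), len(bottom) - sum(bottom)     # bottom quartile: cases, controls
    if b == 0 or c == 0:
        raise ValueError("a zero cell makes the odds ratio undefined")
    return (a * d) / (b * c)
```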

Biological relevance of GWAS loci

Whether the GWAS approach can yield biological insights that improve our understanding of the mechanisms underlying phenotypes such as plasma lipid concentrations remains an open question. Loci identified through GWAS may explain a very small proportion of the variance in a phenotype through naturally occurring common variants in humans, but they may have a greater impact through rare variants or when targeted by pharmacological or genetic intervention.

We surveyed our 95 GWAS loci and asked whether any nearby genes are linked to known Mendelian lipid disorders. There is remarkable overlap between the loci identified here and 18 genes previously implicated in Mendelian lipid disorders (Supplementary Table 18). Fifteen of the genes underlying these Mendelian disorders lie within 100 kb of one of our lead SNPs, including eight that lie within 10 kb of the nearest lead SNP. In 1,000,000 simulations of 95 randomly drawn SNPs, selected to match our lead SNPs with respect to MAF and the number of nearby genes, the average simulation showed no overlapping loci and none showed more than eight overlapping loci.
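The overlap count and the simulation-based comparison can be sketched with two helpers. This is a simplified illustration: each gene is reduced to a single coordinate, and the hard part of the real analysis, drawing random SNP sets matched on MAF and the number of nearby genes, is not shown. The function names are hypothetical.

```python
def count_gene_overlaps(snps, genes, window_bp=100_000):
    """Number of genes lying within window_bp of at least one SNP.

    snps, genes: lists of (chromosome, position_bp) tuples. Each gene is reduced
    to a single coordinate (e.g. its midpoint), a simplifying assumption.
    """
    return sum(
        1 for g_chr, g_pos in genes
        if any(s_chr == g_chr and abs(s_pos - g_pos) <= window_bp for s_chr, s_pos in snps)
    )


def empirical_p(observed, simulated_counts):
    """Empirical P with the usual +1 correction:
    (1 + number of simulations at least as extreme) / (1 + number of simulations)."""
    as_extreme = sum(1 for c in simulated_counts if c >= observed)
    return (1 + as_extreme) / (1 + len(simulated_counts))
```

With this estimator, if none of 1,000,000 matched simulations reaches the observed count of 15 genes within 100 kb, the empirical P is 1/1,000,001, or about 1 × 10−6, which is the smallest value that number of simulations can resolve.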

An additional two loci represent well-established drug targets for the treatment of hyperlipidaemia: HMGCR (statins) and NPC1L1 (ezetimibe). Several other loci harbour genes that were already known to influence lipid metabolism before this study: LPA, which encodes lipoprotein(a); PLTP, which encodes phospholipid transfer protein; ANGPTL3 and ANGPTL4, lipoprotein lipase inhibitors; SCARB1, an HDL receptor that mediates selective uptake of cholesteryl ester; CYP7A1, which encodes cholesterol 7-alpha-hydroxylase; STARD3, a cholesterol transport gene; and LRP1 and LRP4, members of the LDL receptor-related protein family. Notably, the protein product of one of the genes implicated by our study—MYLIP—is a ubiquitin ligase that had no recognized role in lipid metabolism before our study’s inception, but has since been independently demonstrated to be a regulator of cellular LDL receptor levels and is now termed Idol (inducible degrader of the LDL receptor)20.

GALNT2 (encoding UDP-N-acetyl-alpha-d-galactosamine:polypeptide N-acetylgalactosaminyl transferase 2) is a member of a family of GalNAc-transferases, which transfer N-acetylgalactosamine to the hydroxyl group of a serine or threonine residue in the first step of O-linked oligosaccharide biosynthesis. It is the only gene in the mapped locus on chromosome 1q42 within 150 kb of the lead SNP (rs4846914), which is located in an intron of the gene. We therefore reasoned that GALNT2 would be an ideal candidate for functional validation in a mouse model. We introduced the mouse orthologue Galnt2 into mouse liver via a viral vector. Liver-specific overexpression of Galnt2 resulted in significantly lower plasma HDL-C by 4 weeks (24% lower than in control mice) (Fig. 1a). We also knocked down endogenous liver Galnt2 by delivering a short hairpin RNA via a viral vector. Reduction of the transcript level (95% knockdown as determined by qRT–PCR) resulted in higher HDL-C levels by 4 weeks (71% higher than in control mice) (Fig. 1b). These observations validate GALNT2 as a biological mediator of HDL-C levels.

Figure 1: Effects of altered Galnt2, Ppp1r3b or Ttc39b expression in mouse liver on plasma lipid levels.

a, b, Overexpression and knockdown of Galnt2. Plasma HDL-C levels at baseline, 2 weeks or 4 weeks after injection of viral vectors are shown. n = 6 mice per group. c, Overexpression of Ppp1r3b. Plasma HDL-C levels at baseline, 2 weeks or 4 weeks after injection of viral vectors are shown. n = 7 mice per group. d, Knockdown of Ttc39b. Plasma HDL-C levels at baseline, 4 days or 7 days after injection of viral vectors are shown. n = 6 mice per group. Error bars show s.d. Because independent experiments were performed at different times and/or sites, there is variability in baseline HDL-C levels.


We further asked whether eQTL studies could facilitate the identification of causal genes in loci containing multiple genes. Of the several genes surrounding a locus on chromosome 8p23 associated with HDL-C, LDL-C and TC (Table 1), only PPP1R3B (encoding protein phosphatase 1, regulatory (inhibitor) subunit 3B) had an eQTL in liver (Supplementary Table 8). The allele associated with increased expression correlated with lower levels of each of the lipid traits, which predicts that higher expression of PPP1R3B should lower plasma lipids. Consistent with this prediction, overexpression of the mouse orthologue Ppp1r3b in mouse liver via a viral vector resulted in significantly lower plasma HDL-C levels at 2 weeks (25%) and 4 weeks (18%) (Fig. 1c), as well as lower TC levels at 2 weeks (21%) and 4 weeks (14%) (data not shown).

Similarly, at a locus on chromosome 9p22 associated with HDL-C, TTC39B (encoding tetratricopeptide repeat domain 39B) was the only one of several genes in the locus to have an eQTL in liver (Supplementary Table 8), with the allele associated with decreased expression correlating with increased HDL-C. Consistent with this eQTL, knockdown of the mouse orthologue Ttc39b via a viral vector, with 50% knockdown of transcript as determined by qRT–PCR, resulted in significantly higher plasma HDL-C levels at 4 days (19%) and 7 days (14%) (Fig. 1d). These data implicate PPP1R3B and TTC39B as causal genes for lipid regulation. These findings, combined with the demonstration that SORT1 is a causal gene for LDL-C whose expression is regulated by a GWAS SNP (reported in the accompanying paper ref. 18), support the use of eQTL studies to prioritize functional validation of GWAS-nominated genes.

Together, these observations establish that some of the 95 identified loci harbour novel bona fide lipid regulatory genes, and suggest that with additional functional studies many, if not all, of the loci could yield insights into the biological underpinnings of lipid metabolism.

New biological, clinical and genetic insights

Through a series of studies, we demonstrate that (1) at least 95 loci across the human genome harbour common variants associated with plasma lipid traits in Europeans, (2) the loci contribute to lipid traits in multiple non-European populations, (3) some of these loci are associated not only with lipids but also with risk for CAD, (4) common variants in the loci combine to contribute to extreme lipid phenotypes and (5) many of the identified loci harbour genes that contribute to lipid metabolism, including the novel lipid genes GALNT2, PPP1R3B and TTC39B that we validated in mouse models.

It has recently been suggested that conducting genetic studies with increasingly larger cohorts will be relatively uninformative for the biology of complex human disease, particularly if initial studies have failed to explain a sizable fraction of the heritability of the disease in question11. As the reasoning goes, analysis of a few thousand individuals will uncover the common variants with the strongest effect on phenotype. Larger studies will suffer from a plateau phenomenon in which either no additional common variants will be found or any common variants that are identified will have too small an effect to be of biological interest.

Our study provides strong empirical evidence against this assertion. We extended a GWAS for plasma lipids from 20,000 to 100,000 individuals and identified 95 loci (of which 59 are novel) that, in aggregate, explain 10–12% of the total variance (representing 25–30% of the genetic variance). Even though the lipid-associated SNPs we identified have relatively small effect sizes, some of the 59 new loci contain genes of clear biological and clinical importance—among them LDLRAP1 (responsible for autosomal recessive hypercholesterolemia), SCARB1 (receptor for selective uptake of HDL-C), NPC1L1 (established drug target), MYLIP (recently characterized regulator of LDL-C), and PPP1R3B (newly characterized regulator of HDL-C). We expect that future investigations of the new loci (for example, resequencing efforts to identify low-frequency and rare variants, or functional experiments in cells and animal models, as demonstrated for SORT1 in a separate study reported in the accompanying paper ref. 18) will uncover additional important new genes. Thus, the data presented in this study provide a foundation from which to develop a broader biological understanding of lipoprotein metabolism and to identify potential new therapeutic opportunities.

Methods Summary

The full Methods are in Supplementary Information and provide information about (1) study samples and phenotypes; (2) genotyping and imputation; (3) genome-wide association analyses; (4) meta-analyses of directly typed and imputed SNPs; (5) estimation of effect sizes; (6) conditional analyses of top signals; (7) sex-specific analyses; (8) cis-expression quantitative trait locus analyses; (9) analyses of lipid-associated SNPs in European and non-European samples; (10) analyses of lipid-associated SNPs in individuals with and without CAD; (11) analyses of associated SNPs in patients with extreme LDL-C, HDL-C or TG levels; (12) simulation studies to assess overlap between GWAS signals and Mendelian disease loci; and (13) details of mouse studies.