Introduction

Acne vulgaris is a common skin disorder that results from inflammation of the pilosebaceous unit, leading to the characteristic comedones, papules, pustules, nodules and cysts. Lesions are generally restricted to the face, neck, chest and back, and onset typically occurs during puberty. Prevalence estimates of acne vary substantially; more than 85% of teenagers are estimated to be affected to some degree, and up to 8% have been reported to have severe disease1, making acne the most prevalent skin disease worldwide2. According to the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD)3, acne was responsible for nearly 5 million disability-adjusted life years (DALYs) globally in 2019, the majority of which occurred in people aged 15–49 years. This burden is greater than that of other chronic inflammatory conditions such as psoriasis or rheumatoid arthritis. Severe acne often persists into adulthood, and scarring is more prevalent in this adult population with severe disease. Acne is associated with impaired quality of life, including lower rates of employment and reduced school and work performance4,5, as well as a substantial impact on mental health that correlates with clinical severity4,6. Individuals with acne have elevated rates of depression and other mental illness compared with those without acne7, and rates of suicidal ideation and other mental health disorders are up to three times higher in adolescents with severe acne than in peers with little or no acne4. Several studies report increased rates of clinical depression, anxiety and hospitalisation due to mental health disorders in both adults and adolescents with acne, with a larger effect in adults8,9.

Despite considerable recent advances in treatments for other inflammatory skin diseases, including psoriasis and atopic dermatitis, there remains a substantial unmet medical need in the treatment of acne. Early and effective intervention is often necessary to avoid irreversible scarring in severe acne. Treatment typically consists of topical and systemic agents that suppress the skin microbiome or the activity of sebaceous glands; other options include hormonal therapy and phototherapy. The most effective agent for acne is isotretinoin, which may induce remission through its effect on epidermal differentiation, but its side-effect profile, which includes dryness of the skin, lips (cheilitis) and mucous membranes, muscle aches, itching and headaches10, restricts its use to patients with severe disease. Isotretinoin is also a potent teratogen, and its use during pregnancy is restricted because of its association with severe and life-threatening birth defects. Better-tolerated acne treatments are needed, particularly effective options without the risk of teratogenicity.

The genetic contribution to acne susceptibility has been demonstrated in several twin studies, with heritability consistently estimated at around 80%11,12,13,14,15. Recent molecular genetic studies identified 17 genomic loci harbouring alleles associated with the disease: 15 loci with a reported effect in European populations16,17 and two in a Han Chinese population18. Functional characterisation of these association signals has implicated a series of putative causal genes whose perturbation affects the development and maintenance of the hair follicle and wound healing.

To further characterise the genetic architecture of acne vulgaris and identify additional genomic loci contributing to disease susceptibility, we performed a meta-analysis of genome-wide association studies (GWAS) of acne in nine independent cohorts comprising 615,396 participants in total (20,165 cases and 595,231 controls). We then combined fine-mapping and genome-wide analytical approaches to gain insight into the genes and pathways through which the associated loci contribute to disease susceptibility, and into the relationship between the genetic architecture of acne and other traits.

Results

Meta-analysis of acne GWAS

We conducted case-control GWAS of acne in fourteen datasets from nine independent European-ancestry cohorts (Supplementary Note). Ascertainment of acne case status varied across cohorts, from clinical diagnosis of acne vulgaris by a dermatologist to self-reported disease (Supplementary Data 1). The meta-analysis of the fourteen GWAS datasets showed moderate inflation of test statistics (λGC = 1.14, Supplementary Fig. 1), but the LD-score regression (LDSC) intercept (1.02) indicated that this inflation is driven largely by trait polygenicity rather than by confounding bias. We identify genome-wide significant (P < 5 × 10−8) associations at 43 loci (Fig. 1, Table 1, Supplementary Figs. 4, 5), comprising 46 independent variants associated with acne; LD-based clumping indicates two independent (r² < 0.001) genome-wide significant associations at each of three previously established loci (1q41, 5q11.2, 11q13.1).
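For reference, λGC is conventionally the median of the observed 1-df chi-square association statistics divided by its expected value under the null hypothesis (about 0.455). The sketch below illustrates that calculation from two-sided P-values using only the Python standard library; it is an illustration, not the pipeline used in this study, and it does not reproduce the LDSC intercept, which additionally requires regressing the chi-square statistics on LD scores.

```python
from statistics import NormalDist, median

# Median of a 1-df chi-square distribution under the null (~0.4549):
# the squared upper-quartile point of the standard normal distribution.
_NULL_MEDIAN_CHI2 = NormalDist().inv_cdf(0.75) ** 2


def lambda_gc(p_values):
    """Genomic inflation factor from two-sided association P-values.

    Each P-value is converted back to a 1-df chi-square statistic
    (chi2 = z**2, with z the normal quantile at p/2), and the median of
    these statistics is divided by its expectation under the null.
    """
    normal = NormalDist()
    chi2 = [normal.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / _NULL_MEDIAN_CHI2
```

Using the lower-tail quantile at p/2 (rather than 1 − p/2) keeps very small P-values from rounding to 1.0 in floating point.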

Fig. 1: Manhattan plot showing genome-wide significant loci associated with acne (20,165 cases, 595,231 controls).

Each point represents a genetic variant passing QC, ordered by chromosome and base-pair position on the x-axis, with the −log10(P-value) of association (two-sided Z-test, not adjusted for multiple comparisons) on the y-axis. Blue indicates variants within previously established loci and red indicates variants within novel loci. The pink line indicates the genome-wide significance threshold (P = 5 × 10−8).

Table 1 Meta-analysis genome-wide significant loci.

We observe association at 14 of the 17 previously reported acne susceptibility loci and at 29 novel acne susceptibility loci (Fig. 1, Table 1). We fail to replicate the two acne risk loci previously reported in the Han Chinese population, 1q24.2 and 11p11.2: neither of the reported lead variants reaches statistical significance in our meta-analysis (rs7531806 at 1q24.2: P = 0.336; rs747650 at 11p11.2: P = 0.0785), and there is minimal evidence of effect in the individual cohorts (Supplementary Data 2; Supplementary Fig. 6). The third previously reported locus that does not reach genome-wide significance here is located at 2q14.2. However, the lead variant at this locus, rs1092479, shows a sub-genome-wide significant effect that is consistent in magnitude and direction with the previously reported effect on acne risk (OR = 1.06, P = 9.84 × 10−6; Supplementary Fig. 6).

Identification of causal variants and genes

Statistical fine-mapping of each association signal identified two loci at which a single variant was the putative causal variant with a posterior probability >0.95 (Table 1): rs1256580 at the TGFB2 locus and rs260643 in the novel acne susceptibility locus at 2q12.3. rs260643 is located in a transcription factor binding site in a region of open chromatin within intron 5 of EDAR. Rare protein-coding variants in EDAR have been shown to cause both recessive and dominant forms of ectodermal dysplasia (OMIM:224900, OMIM:129490), and the acne risk allele of rs260643 itself has been reported to be associated with hair curl in a GWAS of Japanese women19,20. The range of phenotypes resulting from functional variation at this locus closely resembles that of the known acne susceptibility locus harbouring WNT10A, where genetic variation influences acne risk, hair curl, male pattern baldness and Mendelian forms of ectodermal dysplasia.

To identify additional putative causal genes at acne susceptibility loci, we used a combination of approaches: (a) colocalisation of the acne association signals with skin eQTL signals (Supplementary Data 3), (b) identification of protein-coding variants within the 95% credible sets (Supplementary Data 4) and (c) bioinformatic approaches to identify groups of functionally related genes located near multiple associated loci (Supplementary Fig. 2, Supplementary Data 5, 6). Together, these analyses highlight the importance of genes implicated in cellular adhesion and motility. Support for these processes comes partly from putative causal genes previously implicated at established acne susceptibility loci, including LAMC2, TGFB2, WNT10A, LGR6, FGF2 and GLI2, but other members of the same gene sets are also located near new acne association signals, suggesting that consistent biological processes underlie acne pathogenesis across established and novel loci.

EDNRA is highlighted as a potential causal gene at the acne risk locus at 4q31.22. EDNRA was prioritised as the causal gene at this locus by the DEPICT analysis (P = 1.4 × 10−4; Supplementary Data 5). The acne association signal also colocalises with EDNRA expression in both sun-exposed and non-sun-exposed skin (posterior probabilities of colocalisation PPSE = 0.98 and PPNSE = 0.98, respectively; Supplementary Data 3), indicating that the same variant likely influences both acne risk and EDNRA expression, with the acne risk allele associated with decreased EDNRA expression. EDNRA encodes an endothelin receptor; rare missense variants in EDNRA cause mandibulofacial dysostosis with alopecia (OMIM:616367), in which patterning defects of the hair follicle lead to alopecia.

At 3q21.1, the identification of CSTA as the putative causal gene provides further evidence that cell-cell adhesion processes in the skin are important in acne susceptibility. CSTA encodes cystatin A, a protease inhibitor with an established role in cell-cell adhesion. The acne susceptibility signal at 3q21.1 shares a causal variant with skin eQTLs for CSTA with high probability (PPSE = 0.95, PPNSE = 0.98), with the acne risk allele lowering CSTA expression. Homozygous loss of function of CSTA causes a peeling skin syndrome that results from extensive hyperkeratosis (OMIM:607936). There is also evidence of a potential biological mechanism shared with pustular psoriasis at 2q13, where the acne association signal colocalises with IL36RN eQTLs in both sun-exposed and non-sun-exposed skin (PPSE = 0.63, PPNSE = 0.62). The acne risk allele is associated with decreased IL36RN expression, which is directionally consistent with the rare putative loss-of-function missense variants in IL36RN that underlie pustular forms of psoriasis (generalised pustular psoriasis, acrodermatitis continua of Hallopeau and palmoplantar pustulosis)21,22.

Genetic correlations and causal relationships of acne with other traits

Assuming a population prevalence of 30% for acne, the genome-wide significant acne risk loci explain an estimated 6.01% of the variance in acne liability. In contrast, the SNP-based heritability, i.e. the variance explained by all common SNPs, indicates that 22.95% (s.e. = 2%) of the variance in acne liability is explained by common genetic variation across the genome. We used this extensive polygenicity to examine genetic correlations and potential causal relationships between acne and 935 human diseases and traits, finding statistically significant genetic correlations with 45 traits (Supplementary Data 7). As previously observed, there is evidence of genetic correlation between acne and Crohn's disease (rg = 0.19, s.e. = 0.07) (Fig. 2a). We also observe shared genetic architecture with diseases that are phenotypically associated with acne, including breast cancer (rg = 0.16, s.e. = 0.05) and psychiatric disorders such as schizophrenia (rg = 0.18, s.e. = 0.06) and bipolar disorder (rg = 0.12, s.e. = 0.05). The latent causal variable analysis also indicates asymmetry in the genetic correlation, consistent with a partially causal relationship, between acne and endogenous testosterone and bilirubin levels, breast cancer, joint pain and headaches (Fig. 2b, Supplementary Data 7).

Fig. 2: Genetic correlation and latent causal variable analysis between acne and other complex traits.

All analyses were conducted using GWAS summary statistic data from 935 complex traits in the CTG-VL platform. a Black circles represent point estimates of LD score-based genetic correlations. Error bars indicate 95% confidence intervals. b Colour bar indicates strength and direction of genetic correlation where red indicates a negative correlation and blue a positive correlation. Red line indicates significance threshold for multiple testing (FDR < 5%). CI confidence intervals, GCP Genetic causal proportion.

Polygenic prediction of acne risk

We evaluated whether a polygenic risk score (PRS) estimating an individual's genetic liability to acne can predict acne in an independent cohort of 2,058 people whose history of acne had been assessed by questionnaire. To further assess the polygenicity of acne susceptibility, we compared a PRS constructed from the lead variants at genome-wide significant loci (P < 5 × 10−8) with a PRS that leverages genome-wide effects across all SNPs using the SBayesR algorithm23. Whilst both scores were strongly associated with acne (defined as moderate or severe acne) in this independent cohort (PGWS threshold = 1.05 × 10−6, PSBayesR = 1.68 × 10−12), the cumulative risk derived from genome-wide significant association signals explained 2.8% (s.e. = 1.32%) of the variance in acne liability, compared with 5.6% (s.e. = 1.79%) using SBayesR (Supplementary Fig. 3). In line with these prediction results, individuals reporting moderate or severe acne had significantly higher mean acne PRSs than those reporting no acne (PModerate-none = 2.4 × 10−13; PSevere-none = 1.6 × 10−16; Fig. 3). Similarly, when predicting increasingly strict definitions of acne, we find that the acne PRS has the greatest predictive ability in individuals with severe acne (AUC = 0.7) (Fig. 3).

Fig. 3: Acne PRS analysis and prediction.

Top panels compare the distributions of acne PRSs between individuals who report no acne (grey) and those who report mild, moderate or severe acne (orange), only moderate or severe acne (green) or only severe acne (purple). P-values for the difference in mean PRS were calculated using a two-sided Student's t-test. Bottom panels show the corresponding increase in predictive ability of the acne PRS in these case groups using receiver operating characteristic (ROC) curves. AUC, area under the curve.

Discussion

Acne is a complex human trait, with heritability estimates of up to 80% reported consistently in twin studies11,13. The current study represents an approximate fourfold increase in the number of cases compared to the previous meta-analysis17. Notably, we do not observe evidence of association in the current meta-analysis, or any of the contributing studies, at either of the two acne risk loci observed in the Han Chinese population18, highlighting potential differences in the genetic architecture of acne between different ethnic populations. This warrants further investigation in studies of diverse ancestry.

At several of the newly identified acne risk loci, we find evidence that the structure and morphology of the hair follicle are important in disease susceptibility. There are strong parallels between the phenotypic consequences of the allelic series at the EDAR locus and those at the previously implicated risk locus at WNT10A. At both loci, acne susceptibility, hair morphology and ectodermal dysplasia all result from genetic variation that affects the function of the respective genes. The presence of other Mendelian skin disease genes at acne risk loci also provides insight into the biological mechanisms that contribute to acne susceptibility. These include the development and maintenance of the pilosebaceous unit, indicated by the presence of several ectodermal dysplasia genes at acne risk loci, and neutrophilic inflammation, evidenced by an acne susceptibility signal at the IL36RN locus, where rare loss-of-function alleles have been associated with a series of pustular skin phenotypes21,22,24.

In addition to identifying specific loci contributing to acne susceptibility, our results demonstrate substantial polygenicity beyond the genome-wide significant association signals: alongside the estimated 6% of the variance in acne liability explained by the 46 independent genome-wide significant variants identified here, a further 17% is estimated to be explained by the polygenic tail. This polygenic architecture enabled the evaluation of genetic relationships with other traits that share underlying biological pathways, such as immune-mediated disorders including Crohn's disease, and revealed shared genetic aetiology with other traits, including several mental health disorders and breast cancer. The epidemiological association between acne and mental health is well documented. However, the direction of causality remains unclear: several observational studies indicate that damaged self-esteem resulting from acne may mediate the association between severe acne and internalising psychopathology25,26, and depression, anxiety and emotional lability have been associated with isotretinoin treatment27. Conversely, there is also evidence that stress, poor self-care and drug treatments for mental health disorders can cause acne25,28,29. Elevated rates of breast cancer have previously been reported in females with a history of acne30, and the shared genetic influence on acne and breast cancer highlights the potential importance of endogenous hormone regulation, a key component of breast cancer risk; this is further supported by a putative causal relationship between testosterone levels and acne. We also identify shared genetic architecture between acne and chronic pain, specifically joint pain and headache. Notably, the genetic correlation appears asymmetric, consistent with loci contributing to acne susceptibility being causal for chronic pain.

The current study capitalised on acne diagnoses ascertained by multiple approaches, from clinical diagnosis to self-reported disease, with varying criteria for case and control definitions. This heterogeneity in cohort definitions may introduce bias into the effect size estimates; estimates from the clinically ascertained cohorts are typically larger than those from cohorts in which acne was self-reported or ascertained from electronic health records. In addition, the exclusion of mild acne cases from some of the self-reported cohorts may inflate effect size estimates. Nevertheless, the acne PRS derived from the meta-analysis is strongly associated with self-reported acne history across the mild, moderate and severe groups. Consistent with a liability threshold model of acne risk, the acne PRSs were associated with disease severity, with the best prediction in the severe group. Whilst further validation is needed, a robust genetic predictor of acne risk may help identify individuals at the highest risk of acne, enabling intervention with prophylactic skin care regimes that minimise follicular occlusion before bacterial colonisation and extensive inflammation are established, and thereby reduce disease severity and scarring. The current study sought to establish genetic risk factors that are independent of sex and age; however, further investigation to define genetic loci with sex- or age-specific effects will improve our mechanistic understanding of the disease and has the potential to improve polygenic prediction by modelling sex-dimorphic effects.

In summary, the results of the current study represent a substantial advance in our understanding of the genetic basis of acne. Interrogation of these loci further illustrates biological processes shared with other skin and hair traits, and analysis of genome-wide genetic liability identified shared genetic aetiology with other common diseases. Our results highlight the substantial contribution to genetic risk from loci that remain undiscovered and motivate future studies both to identify additional risk loci and to establish the biological processes through which genetic risk is mediated.

Methods

Cohort details

Data were collected from 9 collaborating centres, with some centres providing independent GWAS for more than one cohort, creating a total of 14 independent datasets. Acne vulgaris definitions varied between the cohorts and consisted of clinical assessment, electronic health record coding and self-reported diagnoses of acne, resulting in a final sample size of 20,165 cases and 595,231 controls (Supplementary Data 1). Detailed information regarding informed consent, ethical approval, recruitment, genotyping, QC and GWAS in each cohort is described in Supplementary Note 1.

Meta-analysis

We conducted an inverse-variance-weighted meta-analysis of 615,396 individuals (20,165 cases and 595,231 controls) from the 14 datasets described above using METAL31. All variants were aligned to positions on human genome build GRCh37 (hg19), and variants with MAF < 1% or an imputation accuracy score < 0.7 were excluded prior to the meta-analysis. Only variants present in at least 9 of the 14 datasets were included in the final meta-analysis, resulting in 7,072,770 variants.
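The fixed-effect inverse-variance-weighted model can be sketched as follows for a single variant. This is an illustrative re-implementation rather than METAL itself; the record layout (keys `beta`, `se`, `maf`, `info`) is a hypothetical format, and the thresholds are those stated above (per-dataset MAF ≥ 1% and imputation accuracy ≥ 0.7, with the variant kept only if it passes in at least 9 datasets).

```python
import math
from statistics import NormalDist


def meta_analyse_variant(records, min_maf=0.01, min_info=0.7, min_datasets=9):
    """Fixed-effect inverse-variance-weighted meta-analysis of one variant.

    records: one dict per dataset with keys "beta" (log odds ratio), "se",
    "maf" and "info" (imputation accuracy); this layout is hypothetical.
    Returns (pooled beta, pooled SE, two-sided P) or None if too few
    datasets pass QC.
    """
    # Per-dataset QC: drop rare or poorly imputed estimates.
    kept = [r for r in records if r["maf"] >= min_maf and r["info"] >= min_info]
    # Require the variant to be present in enough datasets after QC.
    if len(kept) < min_datasets:
        return None
    weights = [1.0 / r["se"] ** 2 for r in kept]  # inverse-variance weights
    beta = sum(w * r["beta"] for w, r in zip(weights, kept)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))  # two-sided P-value
    return beta, se, p
```

Each dataset contributes in proportion to the precision of its estimate, so large, well-powered cohorts dominate the pooled effect.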

Locus definition/LD clumping

Linkage disequilibrium (LD) clumping was used to define the loci containing acne-associated variants. Clumping of the meta-analysis results was conducted using PLINK (v1.9)32 with a P-value cut-off of 5 × 10−8, a 1 Mb window around each index variant and an LD threshold of r² = 0.001 for variants within this window, so that independent signals have r² < 0.001, using the LD structure of the European-ancestry subset of the 1000 Genomes Project as the reference panel. Owing to the complexity of the major histocompatibility complex region (chr6:26–34 Mb), only the most significant variant in this locus is reported.
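The greedy logic of LD clumping can be sketched as below. In PLINK 1.9 these parameters correspond to `--clump-p1`, `--clump-r2` and `--clump-kb` (1 Mb = 1000 kb); this sketch is a simplification that ignores PLINK's secondary P-value threshold (`--clump-p2`) and assumes a hypothetical caller-supplied `r2` function returning reference-panel LD between two variants.

```python
def clump(variants, r2, p_threshold=5e-8, window_bp=1_000_000, r2_threshold=0.001):
    """Greedy LD clumping (simplified sketch, not PLINK's exact behaviour).

    variants: list of (variant_id, chromosome, position_bp, p_value).
    r2: callable (id_a, id_b) -> LD r^2 from a reference panel
        (hypothetical helper supplied by the caller).
    Returns index (lead) variant IDs, most significant first.
    """
    # Only variants below the P-value cut-off can start or join a clump here.
    candidates = sorted((v for v in variants if v[3] < p_threshold), key=lambda v: v[3])
    assigned = set()
    index_ids = []
    for vid, chrom, pos, _ in candidates:
        if vid in assigned:
            continue  # already absorbed into a more significant clump
        index_ids.append(vid)
        assigned.add(vid)
        for oid, ochrom, opos, _ in candidates:
            if oid in assigned or ochrom != chrom or abs(opos - pos) > window_bp:
                continue
            # Variants in LD (at or above the threshold) join this clump.
            if r2(vid, oid) >= r2_threshold:
                assigned.add(oid)
    return index_ids
```

With the very strict r² threshold used here, almost any detectable LD with a stronger index variant removes a variant from consideration, so the surviving index variants are close to fully independent.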

Fine-mapping

An approximate Bayes factor was calculated from the effect size and standard error of each variant in each associated locus (lead SNP ± 1 Mb), using the approach defined by Wakefield33, assuming a prior variance on the log odds ratios of 0.04 (Eq. (1)). The resulting Bayes factors were then re-scaled to reflect the posterior probability for each variant being causal, and 95% credible sets were defined as the minimal set of variants whose combined posterior probabilities sum to ≥0.95. Where there were multiple independently segregating SNPs within the flanking 1 Mb regions, fine-mapping was only performed once, using the SNP with lowest P-value as lead SNP.

$$\mathrm{ABF}=\sqrt{\frac{V+W}{V}}\,\exp\left(-\frac{z^{2}}{2}\,\frac{W}{V+W}\right) \tag{1}$$

where V is the variance of the maximum likelihood estimate of the log odds ratio (i.e. the squared standard error), z² = β²/V is the Wald statistic and W is the prior variance on the log odds ratios.
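A minimal sketch of this fine-mapping step, assuming a single causal variant per locus and an equal prior probability for every variant (standard assumptions for this approach that are not stated explicitly above). Because Eq. (1) expresses the ABF as evidence for the null relative to the alternative, the sketch works with its negative logarithm, so that larger values favour association, before normalising to posterior probabilities. Function names are illustrative.

```python
import math


def log_abf_alt(beta, se, w=0.04):
    """Log Bayes factor in favour of association for one variant.

    Eq. (1) is expressed as null over alternative, so this returns
    -log(ABF). V = se**2 is the sampling variance, W the prior variance.
    """
    v = se ** 2
    z2 = beta ** 2 / v  # Wald statistic
    r = w / (v + w)
    return 0.5 * math.log(1.0 - r) + 0.5 * z2 * r


def credible_set(variant_ids, betas, ses, w=0.04, coverage=0.95):
    """Posterior probabilities (single causal variant, uniform prior) and
    the smallest set of variants whose probabilities sum to >= coverage."""
    labf = [log_abf_alt(b, s, w) for b, s in zip(betas, ses)]
    top = max(labf)
    scaled = [math.exp(x - top) for x in labf]  # subtract max for stability
    total = sum(scaled)
    pp = {vid: x / total for vid, x in zip(variant_ids, scaled)}
    # Add variants in decreasing order of posterior until coverage is reached.
    ranked = sorted(pp.items(), key=lambda kv: kv[1], reverse=True)
    members, cumulative = [], 0.0
    for vid, prob in ranked:
        members.append(vid)
        cumulative += prob
        if cumulative >= coverage:
            break
    return pp, members
```

A locus where one variant alone carries posterior probability >0.95, as reported for rs1256580 and rs260643, corresponds to a credible set of size one.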

eQTL colocalisation

We examined colocalisation between acne association signals and skin cis-eQTLs from the Genotype-Tissue Expression (GTEx) project. Candidate skin eQTLs were defined as any variant located within an acne risk locus (±1 Mb from the lead SNP) that was also associated with variation in the expression of a nearby gene (P < 1 × 10−4). A Bayesian test for colocalisation between the acne association signal and the skin eQTL signal was performed on the variants shared between the two studies using the R package coloc34, with the prior probability that a variant is associated with both traits (p12) set to 10−5. A posterior probability of colocalisation exceeding 50% was taken as evidence of colocalisation.
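The colocalisation test can be sketched with the same approximate Bayes factors as the fine-mapping step, following the single-causal-variant model of coloc's `coloc.abf`. The five hypotheses are: H0, no association; H1, acne only; H2, expression only; H3, both traits with distinct causal variants; H4, both traits with a shared causal variant. p1 = p2 = 10−4 are coloc's defaults and p12 = 10−5 matches the prior above; the eQTL prior variance of 0.15² (coloc's default prior standard deviation for quantitative traits on a standardised scale) and treating eQTL effects on that scale are assumptions. This is an illustration, not the coloc package.

```python
import math


def _logsumexp(xs):
    m = max(xs)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def _logdiff(x, y):
    """log(exp(x) - exp(y)); -inf if the difference is not positive."""
    if y >= x:
        return float("-inf")
    return x + math.log1p(-math.exp(y - x))


def _labf(beta, se, w):
    """Log ABF for association (alternative over null), as in Eq. (1)."""
    v = se ** 2
    r = w / (v + w)
    return 0.5 * math.log(1.0 - r) + 0.5 * (beta ** 2 / v) * r


def coloc_pp(beta1, se1, beta2, se2, w1=0.04, w2=0.0225,
             p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 over a shared set of variants."""
    l1 = [_labf(b, s, w1) for b, s in zip(beta1, se1)]  # trait 1 (acne)
    l2 = [_labf(b, s, w2) for b, s in zip(beta2, se2)]  # trait 2 (eQTL)
    s1, s2 = _logsumexp(l1), _logsumexp(l2)
    s12 = _logsumexp([a + b for a, b in zip(l1, l2)])  # same variant causal
    log_h = {
        "H0": 0.0,
        "H1": math.log(p1) + s1,
        "H2": math.log(p2) + s2,
        # All pairs of variants minus the pairs where the variant is shared.
        "H3": math.log(p1) + math.log(p2) + _logdiff(s1 + s2, s12),
        "H4": math.log(p12) + s12,
    }
    denom = _logsumexp(list(log_h.values()))
    return {h: math.exp(v - denom) for h, v in log_h.items()}
```

In this framing, the PP values reported in the Results (e.g. PPSE = 0.98 for EDNRA) would correspond to the posterior for H4.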

Identification of coding SNPs in loci 95% credible sets

For each locus (excluding the HLA locus), all variants in the 95% credible set were annotated using the Ensembl Variant Effect Predictor35, querying both risk and protective alleles.

Gene and gene-set annotation

To gain insight into the genes and pathways underlying our associations, we used DEPICT (https://github.com/perslab/depict) to prioritise causal genes and to identify gene sets and tissue types in which these genes are enriched36. DEPICT uses 14,461 reconstituted gene sets, in each of which every gene in the genome is assigned a z-score. DEPICT prioritises genes at associated loci according to how strongly their gene-set profiles correlate with those of genes at other associated loci. We used a P-value threshold of 1 × 10−5 to define loci associated with acne and corrected for multiple testing using the Benjamini-Hochberg false discovery rate (FDR < 5%). Links between genes located in acne susceptibility loci, including membership of gene families, co-occurrence within curated pathways, co-expression across tissues and experimentally determined interactions, were further annotated using the Ensembl37 and STRING38 databases. The low-confidence threshold (0.150) was used to define links between genes in STRING.
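The Benjamini-Hochberg step-up procedure used here (and for the genetic correlation screen below) can be sketched in a few lines. This is a generic illustration, not the DEPICT or CTG-VL implementation: hypotheses are ranked by P-value, the largest rank k with P(k) ≤ (k/m)·α is found, and all hypotheses up to that rank are rejected.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking hypotheses rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending P
    # Find the largest rank k with p_(k) <= k / m * alpha (step-up rule).
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected
```

Because it is a step-up rule, a P-value that misses its own rank-specific threshold can still be rejected if a larger P-value further down the ranking passes its threshold.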

Heritability, genetic correlations and latent causal variable analysis

We used LD score regression (LDSC)39 with the HapMap3 reference panel to estimate the total SNP-based heritability (h²SNP) of acne from the meta-analysis. We also estimated the variance explained by the genome-wide significant SNPs alone using the R package Mangrove (https://CRAN.R-project.org/package=Mangrove). For both analyses, we assumed a population prevalence of 30%. We used the Complex Traits Genetics Virtual Lab (CTG-VL, https://genoma.io/) to calculate genetic correlations between our acne meta-analysis and 935 complex behavioural and disease traits, and to screen these traits for a potential causal association with acne using the Latent Causal Variable (LCV) method40. LCV does not directly test for causality; instead, it models the genetic correlation between two traits as mediated by a latent variable and estimates a genetic causal proportion (GCP), which quantifies how asymmetrically that latent variable affects the two traits. A GCP of 0 indicates no genetic causality, a GCP of 1 indicates full genetic causality, and values between 0 and 1 indicate partial genetic causality. In this study, we considered a GCP > 0.7 as an indication of a causal relationship. We applied a multiple testing correction (FDR < 5%) to determine statistical significance.
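Heritability estimated from case-control summary statistics is first obtained on the observed scale and then converted to the liability scale using the population prevalence K (30% here) and the sample case proportion P. The sketch below shows the widely used transformation of Lee et al. (2011), which LDSC applies when given `--samp-prev` and `--pop-prev`; it is illustrative only and does not reproduce Mangrove's calculation of variance explained by the genome-wide significant SNPs.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs, pop_prev, sample_prev):
    """Convert observed-scale SNP heritability to the liability scale.

    h2_L = h2_O * K^2 (1 - K)^2 / (P (1 - P) phi(t)^2), where K is the
    population prevalence, P the sample case proportion and phi(t) the
    standard normal density at the liability threshold t = Phi^-1(1 - K).
    """
    normal = NormalDist()
    t = normal.inv_cdf(1 - pop_prev)  # liability threshold
    z = normal.pdf(t)  # normal density at the threshold
    k, p = pop_prev, sample_prev
    return h2_obs * k ** 2 * (1 - k) ** 2 / (p * (1 - p) * z ** 2)
```

For this meta-analysis the sample case proportion would be 20,165 / 615,396 ≈ 0.033, far below the assumed population prevalence, so the conversion factor depends strongly on the prevalence assumption.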

PRS analysis

We used our meta-analysis results to build acne PRSs and assess their predictive ability in an independent sample, using acne data collected in the Prospective Imaging Study of Ageing (PISA) at QIMR Berghofer as our target sample. PISA currently consists of ~3000 genotyped individuals who have completed extensive behavioural, psychological and medical questionnaires, cognitive testing and brain imaging41. Participants were asked about the presence and severity of acne as a teenager and could answer None, Mild, Moderate or Severe. We conducted two regression analyses: first, including all participants, with acne severity coded on an ordinal scale; and second, including as cases only individuals who answered Moderate or Severe (N = 527) and as controls individuals who reported not having acne (N = 645). Individuals were genotyped as described in the Supplementary Note. To avoid bias due to potential sample overlap between PISA and the other QIMR cohorts included in the meta-analysis, we used genotype data to exclude all individuals with IBD > 0.125, which corresponds to third-degree relatives.

We calculated PRSs using both SBayesR v2.03 and the traditional clumping and thresholding (C+T) method. SBayesR is a Bayesian method that assumes that SNP effects are drawn from a mixture of four zero-mean normal distributions with different variances23. It shrinks the marginal GWAS SNP effects, allowing many SNPs to have an effect size of zero, and has been reported to outperform several other PRS methods in the prediction of complex traits23. For the LD reference, we used the same sparse LD matrix as Lloyd-Jones et al.23, built from the HapMap3 SNPs of 50,000 randomly selected unrelated UK Biobank individuals. The posterior SNP effects were used to generate PRSs for each individual using the --score function in PLINK. C+T PRSs were calculated across eight SNP P-value thresholds: P < 5 × 10−8, P < 1 × 10−5, P < 0.001, P < 0.01, P < 0.05, P < 0.1, P < 0.5 and P < 1. For each individual, at each threshold, an acne PRS was calculated by multiplying the allele dosage by the effect size for each SNP and summing these values across all included loci. For more detail on C+T PRS calculation, see Mitchell et al.42.
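The scoring step of C+T can be sketched for a single individual as below. The dictionary inputs are a hypothetical format, LD clumping is assumed to have been performed already (so only clumped SNPs appear in `snp_stats`), and the thresholds are the eight listed above; in the study itself scores were computed with PLINK.

```python
# P-value thresholds listed in the text.
THRESHOLDS = [5e-8, 1e-5, 1e-3, 0.01, 0.05, 0.1, 0.5, 1.0]


def ct_scores(dosages, snp_stats, thresholds=THRESHOLDS):
    """Clumping-and-thresholding polygenic scores for one individual.

    dosages: dict snp_id -> effect-allele dosage (0 to 2) for the individual.
    snp_stats: dict snp_id -> (beta, p_value) for SNPs retained after LD
        clumping (hypothetical input format; clumping assumed done).
    Returns dict threshold -> sum of dosage * beta over SNPs with P < threshold.
    """
    scores = {}
    for t in thresholds:
        scores[t] = sum(
            dosages[snp] * beta
            for snp, (beta, p) in snp_stats.items()
            if p < t and snp in dosages  # skip SNPs missing in the target data
        )
    return scores
```

Each threshold yields a separate score, and the threshold giving the best prediction is typically chosen by comparing variance explained in the target sample.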

For each PRS, a mixed model regression was conducted with the acne PRS as the predictor variable, including sex, age and the first ten genetic principal components (to account for residual population stratification) as fixed effects and accounting for relatedness among individuals as a random effect with a genetic relatedness matrix, as implemented in GCTA (v1.91.7)43. A partial R² was used to estimate the variance explained by the PRS. The predictive ability of the PRS was further evaluated with receiver operating characteristic (ROC) curves using the pROC package44 in R v3.6.1. Significance values were calculated using a two-tailed Student's t-test.
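The AUC reported from the ROC analysis can be understood through its rank-based (Mann-Whitney) form: the probability that a randomly chosen case has a higher PRS than a randomly chosen control, with ties counted as one half. The sketch below computes that quantity directly; it is an illustration rather than pROC's implementation, and it does not reproduce the covariate-adjusted mixed model or the partial R² from GCTA.

```python
def roc_auc(case_scores, control_scores):
    """Empirical AUC via the Mann-Whitney formulation.

    Counts, over all case-control pairs, how often the case has the higher
    score (ties count 0.5), divided by the number of pairs.
    """
    wins = 0.0
    for a in case_scores:
        for b in control_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

An AUC of 0.5 means the PRS does not separate cases from controls, so the AUC of 0.7 in the severe-acne group indicates that a severe case outranks a control about 70% of the time. The pairwise loop is quadratic, which remains manageable for a target sample of this size.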

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.