Introduction

Systemic lupus erythematosus (SLE; OMIM 152700) is an autoimmune disease characterized by production of autoantibodies and multiple organ damage. Genetic factors play a key role in the disease, with estimates of its heritability ranging from 43% to 66% across populations1,2,3. Differences in the expression of the disease across ancestral groups have been reported with non-European populations showing an earlier age of onset, 2–4 fold higher prevalence and 2–8 fold higher risk of developing end-stage renal disease than European populations4,5,6,7. Responses to treatment of SLE with the novel monoclonal antibody against B-cell activating factor (BAFF), Belimumab, also show variation across ancestral groups8,9. These findings highlight the heterogeneous nature of the disease, so closer examination of ancestral group differences is likely to improve disease risk prediction and lead to more precise treatment options.

More than 90 loci have been shown to be associated with SLE through genome-wide association studies (GWAS)10,11,12. Trans-ancestral group studies conducted previously were primarily designed to increase power and to identify SLE susceptibility loci shared across ancestries13,14. However, due to inadequate power in studies involving non-Europeans, current findings are biased towards loci associated with SLE in European populations. Some risk alleles reported from studies on European populations, such as those in or near PTPN22, NCF2, SH2B3, and TNFSF13B, are absent in East Asian populations15 while a missense variant (rs2304256) in TYK2 points to a European-specific disease association16,17,18.

The basis for ancestral group differences in the manifestation of SLE at the genome level remains poorly understood. Further studies on non-European populations will help define the genetic architecture underlying SLE and the consequences of patients’ ancestral backgrounds. To this end, we genotyped 8252 participants of Han Chinese descent recruited from Hong Kong (HK), Guangzhou (GZ) and Central China (CC), and combined these data with previous datasets to give a total of ten SLE genetic cohorts consisting of 11,283 cases and 24,086 controls. The increased sample size, particularly for those of Chinese ancestry, allowed identification of novel disease loci and comparative analyses of the genetic architectures of SLE between major ancestral groups. In this work, we identify 38 novel loci associated with SLE and demonstrate both shared and specific genetic components between East Asians and Europeans.

Results

Data set preparation

Han Chinese data: After removing individuals with a low genotyping rate or hidden relatedness, the 7596 subjects of Han Chinese descent from HK, GZ, and CC genotyped in this study and the 5057 subjects from the existing Chinese GWAS13 gave a Chinese ancestry data set of 4222 SLE cases and 8431 controls (Supplementary Tables 1–2 and Supplementary Fig. 1). European data: Existing GWAS data from European populations19 were reanalyzed and grouped into three cohorts, EUR GWAS 1–3, by matching principal components (PCs) to those of subjects from the 1000 Genomes Project, to minimize the potential influence of population substructure20 (see “Methods” section; Supplementary Fig. 2). Recent GWAS data from Spain (SP)21 were also included. After quality control, the European data included 4576 cases and 8039 controls. A further 2485 SLE cases and 7616 controls were included as summary statistics from an Immunochip study of East Asians22 (Supplementary Table 2).

Ancestral correlation of SLE

Genotype imputation and association analysis were performed independently for each GWAS cohort (Supplementary Figs. 3–4) and as meta-analyses of each ancestral group (Fig. 1; see “Methods” section). The trans-ancestral genetic-effect correlation, rge, between the Chinese and European GWAS was estimated by Popcorn23 to be 0.64, with a 95% confidence interval (CI) of 0.46 to 0.81 (see “Methods” section), indicating a significant, but incomplete, correlation of the genetic factors for SLE between the two ancestries. When the analysis was repeated after removing variants in the human leukocyte antigen (HLA) region (chr6: 25–35 Mb), the rge increased to 0.78, suggesting greater ancestral differences for the HLA region.

Fig. 1: Manhattan plots for association results of systemic lupus erythematosus (SLE) in Chinese and European populations.

The Chinese SLE GWAS comprised 4222 cases and 8431 controls and the European GWAS comprised 4576 cases and 8039 controls. The X-axis is the P-value of association (logistic regression; additive model; two-sided test), as −log10(P), for the meta-analyses of the Chinese (right) and European (left) ancestries. Red dashed lines indicate the threshold of genome-wide statistical significance (P = 5E−08). SNPs with P < 1E−40 in an associated locus are not shown in the plot.

Novel SLE susceptibility loci

Meta-analyses, involving a total of 35,369 participants (see “Methods” section; Supplementary Table 2) were conducted. Of the 94 previously reported SLE associated variants (Supplementary Data 1), 59 (62.8%) surpassed a genome-wide significance P-value threshold (5.0E−08) and 84 (89.4%) exceeded the threshold of 5E−05 in our study. Thirty-four novel variants reached genome-wide significance and four variants had P-values approaching this threshold based on either ancestry-dependent or trans-ancestral meta-analyses. The newly identified loci included the immune checkpoint receptor CTLA4, the TNF receptor-associated factor TRAF3 and the type I interferon gene cluster on 9p21 (Table 1 and Supplementary Data 2). The new loci bring the total of SLE-associated loci to 132 and produce a 23.5% and 16.5% increase in the proportion of heritability explained for East Asians and Europeans, respectively (see “Methods” section).

Table 1 Summary association statistics of newly identified SLE-associated variants.

Annotation of SLE susceptibility loci

Functional annotations that might be enriched with SLE susceptibility loci were evaluated by the stratified LD score regression method24 (see “Methods” section). For non-cell type-specific annotations, heritability was significantly enriched in transcription start sites (P = 2.47E−05), regions that are conserved in mammals (P = 8.22E−03) and ubiquitous enhancers that are marked with H3K27ac or H3K4me1 modifications (P = 4.43E−02, P = 5.03E−02, respectively; Supplementary Fig. 5). Based on H3K4me1 modifications (associated with active enhancers) across 127 cell types, enrichment of specific cell types was investigated (see “Methods” section). Cell types that passed the false discovery rate threshold (FDR < 0.05) were mostly hematological cells, with B and T lymphocytes the most prominent cell types associated with SLE (Supplementary Fig. 6a). Similar results were observed based on H3K4me3 modifications (associated with promoters of active genes) (Supplementary Fig. 6b).

We used the Regulatory Element Locus Intersection (RELI) method25 to identify transcription factors (TFs) whose binding sites are enriched in SLE-associated loci. Out of 1544 ChIP-seq datasets with a total of 344 TFs in 221 cell types, 249 datasets showed significant enrichment with the associated loci (corrected P < 1.00E−05; see “Methods” section; Supplementary Data 3). Consistent with results from previous studies25,26, the associated SNPs were strongly intersected with binding sites of immune-related TFs, including NFATC1, NF-κB, STAT5A, IRF4, and viral protein EBNA2.

Identification of putative disease genes and pathways

Excluding the HLA region, 179 putative disease genes were identified across the disease-associated loci reported before and those newly identified in this study (Supplementary Data 4; see “Methods” section). A significant level of protein-protein connectivity corresponding to genes found at the novel loci and known SLE-associated loci was observed (P < 1E−16; Supplementary Fig. 7; see “Methods” section). Forty-five pathways were significantly enriched with these putative SLE susceptibility genes (ToppGene27, FDR < 0.05; Supplementary Table 3). The pathways of cytokine signaling, IFN-α/β signaling, Toll-like receptor (TLR) signaling, and B and T cell receptor signaling showed greatest enrichment. The RIG-I-like receptor signaling (P = 5.83E−10) and TRAF6-mediated IRF7 activation (P = 6.41E−10) pathways were designated as SLE associated pathways primarily based on genes newly identified in this study.

Trans-ancestral fine-mapping of disease-associated loci

One hundred and eight SLE-associated loci tagged by SNPs having a minor allele frequency (MAF) greater than 1% in both Chinese and European populations were examined by PAINTOR28, making use of the differences in LD between ancestries. The median number of putative causal variants in the 95% credible sets reduced from 57 per locus when using only the European GWAS to 16 per locus when using data from both ancestries (one-sided paired t-test P = 9.79E−07, Fig. 2). The number of disease-associated loci with five or fewer putative causal variants increased from four when using the European GWAS alone to 15 (Supplementary Data 5). A single putative causal variant was identified for the WDFY4 and TNFSF4 loci, the latter of which was functionally validated in a previous study29 (Supplementary Fig. 8).
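The notion of a 95% credible set used above can be illustrated with a short sketch. It assumes that each variant at a locus has a posterior probability of being causal and that these are normalized to sum to one within the locus (a single-causal-variant simplification; PAINTOR itself allows multiple causal variants). The function name and the example values are hypothetical and are not taken from the study.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the smallest list of variant IDs whose normalized posterior
    probabilities sum to at least `coverage`.

    `posteriors` maps variant ID -> posterior probability of causality.
    Normalizing within the locus is an assumption of this sketch.
    """
    total = sum(posteriors.values())
    # Rank variants from most to least likely causal.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, pp in ranked:
        chosen.append(variant)
        cumulative += pp / total
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical posteriors for one locus.
locus = {"rs1": 0.70, "rs2": 0.20, "rs3": 0.07, "rs4": 0.03}
print(credible_set(locus))  # three variants are needed to reach 95%
```

A smaller credible set means fewer candidate causal variants to follow up, which is the quantity compared between the European-only and trans-ancestral analyses.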

Fig. 2: Fine-mapping across 108 SLE-associated loci based on the association results from the Chinese SLE GWAS, the European GWAS, and the trans-ancestral meta-analyses.

The Y-axis indicates the number of potential causal variants at each locus based on the 95% credible sets of the association results from the Chinese SLE GWAS (green), the European GWAS (orange) and trans-ancestral meta-analyses (purple). The lower and upper bounds of the boxes represent the first and the third quartiles, respectively, and the central lines indicate the median. The two lines outside the box extend to the highest and lowest observations. N = 108 independent loci associated with SLE for each category.

Ancestral group differences

Based on Cochran’s Q (CQ) test, which assesses heterogeneity of effect-size estimates from different ancestral groups, SLE-associated variants or loci outside of the HLA region were divided into four categories: (1) ancestry-shared disease loci tagged by variants with CQ-test P ≥ 0.05; (2) putative ancestry-heterogeneous disease loci with CQ-test P < 0.05 but FDR adjusted CQ-test P ≥ 0.05; (3) ancestry-heterogeneous disease loci with FDR adjusted CQ-test P < 0.05; and (4) disease loci tagged by associated variants with the risk allele absent in one of the two ancestries15 (Supplementary Table 4). Nine disease variants, other than those absent or rare (MAF < 0.01) in one of the two ancestries15, showed significant differences in effect-size estimates between the two ancestral groups and were considered ancestry-heterogeneous (FDR adjusted CQ-test P < 0.05, category 3; Fig. 3a and Supplementary Table 5). Within this category, variants in the HIP1, TNFRSF13B, PRKCB, PRRX1, DSE, and PLD4 loci were associated with SLE only in East Asians and variants in TYK2 and NEURL4-ACAP1 only in Europeans (P < 5.0E−08 in one ancestry but P > 0.01 in the other, with non-overlapping 95% CIs of the ORs). These eight loci were thus considered ancestry-specific. SNP rs4917014, a variant near IKZF1, showed a significantly stronger effect in East Asians (OR = 1.33, P = 5.18E−29) than in Europeans (OR = 1.16, P = 1.34E−06; CQ-test P = 4.02E−04). These findings were supported by analyses in each cohort (Supplementary Figs. 9–10).
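The categorization by heterogeneity test can be sketched as follows. This is a minimal illustration, assuming per-ancestry log odds ratios and standard errors as input, exactly two ancestral groups (so that Cochran's Q has one degree of freedom), and Benjamini-Hochberg as the FDR adjustment; the text does not state which FDR procedure was used, and all function names are hypothetical.

```python
import math


def cochran_q_two_groups(beta1, se1, beta2, se2):
    """Cochran's Q and P-value for two effect estimates (log odds ratios).

    With two groups and inverse-variance weights, Q reduces to the squared
    difference of the estimates divided by the sum of their variances.
    """
    q = (beta1 - beta2) ** 2 / (se1 ** 2 + se2 ** 2)
    p = math.erfc(math.sqrt(q / 2.0))  # chi-square survival function, df = 1
    return q, p


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def categorize(p, p_fdr, alpha=0.05):
    """Category 1: shared; 2: putative heterogeneous; 3: heterogeneous."""
    if p >= alpha:
        return 1
    return 3 if p_fdr < alpha else 2
```

Category 4 (risk allele absent in one ancestry) cannot be assigned from effect sizes and would need allele-frequency information instead.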

Fig. 3: Genetic loci showing significant ancestral differences in effect-size estimates for systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA).

a Correlation of effect-size estimates for SLE between East Asian (X-axis) and European (Y-axis) populations (r = 0.58, two-sided P-value = 6.52E−11). Disease-associated variants with FDR adjusted CQ-test P-value < 0.05 (category 3) are labeled in red, and the variants with CQ-test P-value < 0.05 but FDR adjusted CQ-test P-value ≥ 0.05 (category 2) are labeled in blue. b, c Forest plots of association from each cohort at the TYK2, PRKCB and PLD4 loci. Diamonds represent the combined estimates of odds ratio (OR) in East Asians (EAS, red) and Europeans (EUR, green) for association with SLE. Error bars represent 95% confidence intervals of the OR estimates. Squares represent OR estimates for rheumatoid arthritis (RA) in EAS (red) and EUR (green) populations. Regional plots for each locus are available in Supplementary Fig. 10. HK, Hong Kong; GZ, Guangzhou; CC, Central China; KR, Korean; BJ, Beijing; MC, Malaysian Chinese; SP, Spanish.

On reanalyzing data from association studies on rheumatoid arthritis (RA)30, the variants in the PRKCB and PLD4 loci were found to be associated with RA in East Asians but not in Europeans (CQ-test P = 0.004 and 0.010 for PRKCB and PLD4, respectively), while the variant in TYK2 was found associated in Europeans only (CQ-test P = 0.002; Fig. 3b and c). This consistency with the differences found in the SLE study suggests that shared mechanisms could be responsible for ancestral group differences among autoimmune diseases.

Colocalization of loci across ancestral groups

Colocalization methods consider many SNPs, rather than only the leading variant in a locus, to compare association signals between ancestral groups. The ancestry-shared disease loci showed much higher posterior probabilities (PP) of colocalization (mean PP = 0.42) than the ancestry-specific loci (mean PP = 0.03; Supplementary Fig. 11a). For example, the MTF1, IKBKE and TNIP1 loci in category 1 showed strong posterior probabilities (≥90%) of colocalization under a Bayesian test31 (see “Methods” section), suggesting shared causal effects between the two ancestries. The eight ancestry-specific loci in category 3 showed low posterior probabilities of colocalization (1–10%), consistent with the CQ-test of the top variant of each locus (above and Supplementary Fig. 10).

Since LD differences between ancestries may affect the colocalization results, we compared the SLE association signals from the Chinese populations with those on 27 non-immune-related phenotypes studied in Europeans (Supplementary Table 6) to serve as colocalization baseline values. While posterior probabilities for colocalization at the ancestry-shared MTF1, IKBKE and TNIP1 loci were much greater than the baseline values, there were no differences for the six Asian-specific disease loci (Supplementary Fig. 11b), thereby excluding the potential influence of LD. The European-specific loci (TYK2 and NEURL4-ACAP1) were not evaluated this way due to lack of public data.

Functional annotation of the ancestry-heterogeneous loci

The ancestry-heterogeneous loci appear to be enriched for functions related to antibody production. Two of the nine putative disease genes at ancestry-heterogeneous loci (category 3), TNFRSF13B and IKZF1, are causal genes for human primary immunodeficiency disorders (PID)32 presenting with primary antibody deficiencies (PADs), whereas none of the disease genes at the putative ancestry-heterogeneous loci (category 2, 0/22; Fisher exact test P = 0.07) or the ancestry-shared loci (category 1, 0/120; Fisher exact test P = 0.004) are known to cause PADs in humans. Two of the East Asian-specific disease variants, those in the TNFRSF13B and PRKCB loci, were associated with serum immunoglobulin levels in East Asian populations33,34,35, whereas none of the variants in loci belonging to categories 1 and 2 were found to be associated.
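The Fisher exact comparisons above can be reproduced approximately with a small sketch. It assumes a two-sided test that sums the hypergeometric probabilities of all tables no more likely than the observed one, which is a common definition but is not confirmed by the text; the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all table margins held fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = prob(a)
    # Sum over tables at least as extreme (no more probable) than observed.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# PAD genes: 2 of 9 in category 3 versus 0 of 120 in category 1.
print(fisher_exact_two_sided(2, 7, 0, 120))
# PAD genes: 2 of 9 in category 3 versus 0 of 22 in category 2.
print(fisher_exact_two_sided(2, 7, 0, 22))
```

With the counts quoted in the text, the two calls give values of roughly 0.004 and 0.08, in line with the reported P-values.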

TNFRSF13B, which encodes a BAFF receptor, TACI, plays a major role in immunoglobulin production36,37,38. In this study, a missense variant in TNFRSF13B, rs34562254, was specifically associated with SLE in Chinese populations (OR = 1.18, P = 2.88E−08 in Chinese; OR = 1.01, P = 0.75 in Europeans). In European populations, an SLE-associated variant in the 3’-UTR of TNFSF13B (encoding BAFF), which is absent in Asian populations (category 4), was associated with serum levels of total IgG, IgG1, IgA, and IgM39.

Mice deficient in the orthologs of four of the nine putative disease genes (44.4%) at the ancestry-heterogeneous loci for SLE (Tnfrsf13b, Ikzf1, Prkcb, and Tyk2) demonstrated abnormal IgG levels40 (MP:0020174), while at putative ancestry-heterogeneous loci (4/22 or 18.2%; OR = 3.43, Fisher exact test P = 0.185) or ancestry-shared loci (14/120 or 11.7%; OR = 5.92, Fisher exact test P = 0.027), proportionately fewer genes caused aberrant IgG levels in mice. Orthologs of four of the twelve (33.3%) putative disease genes from disease loci where the risk allele is monomorphic in one of the ancestries (PTPN22, TNFSF13B, IKZF3, and IGHG1; category 4) also demonstrated abnormal immunoglobulin production in gene knockout mouse models36,39,41,42,43.

Evolutionary signatures for the disease loci

Disease loci with heterogeneity between East Asians and Europeans might have undergone differential selection pressures in recent human history, as has been shown for the SLE risk variant in TNFSF13B39. Allele-frequency differentiation, measured as the fixation index (Fst), was calculated for the variants of the first three categories using 3324 controls from the HK cohort and 5379 controls from the EUR GWAS 2 cohort (see “Methods” section). A higher Fst indicates a larger frequency difference between the two ancestries. Mean Fst values for the ancestry-shared, putative ancestry-heterogeneous and ancestry-heterogeneous variants were 0.054, 0.061, and 0.084, respectively. Although the sample is small, three (DSE, HIP1, TNFRSF13B) of the nine ancestry-heterogeneous variants (33.3%) showed Fst ≥ 0.15 (empirical P < 0.03), while only 10% of the putative ancestry-heterogeneous variants and 8.8% of the ancestry-shared disease variants had Fst ≥ 0.15 (Supplementary Fig. 12a).
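To make the Fst comparison concrete, the sketch below computes a simple two-population Fst for one biallelic variant from allele frequencies. It assumes Wright's definition with equally weighted populations, Fst = (H_T − H_S)/H_T; the estimator actually used in the study is described in its Methods and may differ (for example, by correcting for sample size). The function name and frequencies are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Wright's Fst for one biallelic variant in two equally weighted
    populations, given the allele frequency in each population."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # variant is monomorphic in both populations
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two populations.
print(fst_two_populations(0.2, 0.4))
```

Identical frequencies give 0 and alternately fixed alleles give 1, so values such as the 0.15 cut-off used above sit toward the differentiated end of what is typically seen for common variants.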

In addition, recent positive selection, measured by standardized integrated Haplotype Scores (iHS)44, was investigated at the associated loci. A significant correlation of the iHS scores, estimated using control subjects from HK and EUR GWAS 2 cohorts (see “Methods” section), supports recent positive selection for the shared associated variants (category 1; r = 0.28, P = 0.03; Supplementary Fig. 12b). This is consistent with results using data from Southern Han Chinese (CHS) and Utah residents of European ancestry45 (CEU; r = 0.32, P = 0.008). However, there was no evidence of such a correlation for disease variants that showed ancestry heterogeneity (categories 2 and 3; Supplementary Fig. 12b). For example, in the BAFF system, the derived risk allele rs34562254-A in TNFRSF13B is much more prevalent in East Asians than in other populations (Fig. 4a) and has a significantly longer haplotype for the derived risk allele than the ancestral allele (more negative standardized iHS score) in East Asian populations than in Africans (P = 3.2E−04) or Europeans (P = 4.4E−04) (Fig. 4b), suggesting recent positive selection for the risk allele in East Asians.

Fig. 4: Risk allele frequency and standardized integrated haplotype scores (iHS) across different populations for the Asian-specific variant at TNFRSF13B locus.

a Frequency for the risk allele rs34562254-A across populations. b Standardized iHS for the variants across different continental populations (EAS: East Asians; AFR: Africans; EUR: Europeans). The risk allele rs34562254-A is a derived allele, and negative iHS value indicates that the haplotypes carrying the derived allele are longer than the haplotypes carrying the ancestral allele. The frequency and iHS were calculated using data from the 1000 Genomes Project.

Polygenic risk scores for SLE and their accuracies across ancestries

Polygenic risk scores (PRS) have been used to estimate individual risk of complex diseases, such as coronary artery disease46 and schizophrenia47. However, as the majority of GWAS findings used to calculate these scores are based on European populations, their accuracy in other populations may be limited. PRS for SLE, trained by data on European populations, were tested on individuals of three Chinese cohorts using the lassosum algorithm48 (see “Methods” section). The area under the receiver operating characteristic curve (AUC) ranged from 0.62 to 0.64 for the three Chinese cohorts. Similar results were observed in the reverse case (Supplementary Fig. 13a). The LDpred49 algorithm produced similar results (Supplementary Fig. 13b). These analyses suggest a partial transferability of PRS between the two ancestries.

Using samples from the GZ cohort as the validation dataset, the performance of predictors trained using GWAS summary statistics from the HK and CC cohorts (2618 cases and 7446 controls) or from the European cohorts (4576 cases and 8039 controls) was evaluated. Ancestry-matched predictors (AUC = 0.76, 95% CI: 0.74–0.78) significantly outperformed ancestry-mismatched predictors (AUC = 0.62, 95% CI: 0.60–0.64) (Fig. 5a). When the analysis was repeated by randomly choosing the same number of samples (1500 cases and 1500 controls) from each of the Chinese and European GWAS as training data, a similar difference was observed (Supplementary Fig. 14). Ancestry-matched PRS for samples in the GZ cohort had a mean difference of 0.89 (in standard deviation units) between the SLE case and control groups (t-test P = 9.01E−116) and disease classification using the optimal threshold achieved 73.4% sensitivity and 65.4% specificity (Fig. 5b; see “Methods” section). Disease risk increased with higher PRS, with individuals in the highest PRS decile having a much higher disease risk than those in the lowest decile (OR = 30.3, Chi-square test P = 6.23E−54; Fig. 5c).
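The evaluation metrics quoted above can be sketched in a few lines. The example assumes that each individual already has a PRS and a case or control label; it computes AUC as the probability that a randomly chosen case scores higher than a randomly chosen control (ties count as one half), and sensitivity and specificity at a given threshold. Function names and scores are hypothetical; how the study chose its optimal threshold is described in its Methods, not here.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties contribute one half
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores, control_scores, threshold):
    """Classify a score at or above `threshold` as predicted case."""
    sensitivity = sum(s >= threshold for s in case_scores) / len(case_scores)
    specificity = sum(s < threshold for s in control_scores) / len(control_scores)
    return sensitivity, specificity


# Hypothetical standardized PRS values.
cases = [0.9, 1.4, 0.2, 2.1]
controls = [-0.5, 0.3, -1.2, 0.8]
print(auc(cases, controls))
print(sensitivity_specificity(cases, controls, threshold=0.5))
```

The pairwise loop is quadratic in sample size; for cohorts of thousands of individuals a rank-based formulation would be the more practical choice.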

Fig. 5: Performance of polygenic risk scores (PRS) calculated by summary statistics from different ancestral groups.

a Performances of PRS are indicated by area under receiver operating characteristic curve (AUC). PRS for individuals from the Guangzhou (GZ) cohort were calculated using summary statistics from ancestry-matched Chinese populations (2618 cases and 7446 controls; red) and European populations (4576 cases and 8039 controls; blue). b Distribution of PRS for SLE cases (blue) and controls (pink) from the GZ cohort. The PRS distribution was estimated using summary statistics from the ancestry-matched Chinese cohorts (upper panel) or from the mismatched European GWAS (lower panel). The optimal threshold for disease risk prediction is indicated by the red vertical line. c Odds ratios (ORs) of disease risk across different PRS groups in the GZ GWAS. The samples from GZ are equally divided into ten groups (n = 259 independent samples in each group) based on PRS estimated from ancestry-matched data. The 1st decile represents the lowest PRS group while 10th decile refers to the group of samples with the highest PRS. ORs and 95% confidence intervals (bars) for each group were calculated by reference to the 1st decile group.

Discussion

As non-European groups appear to be more severely affected by SLE, more genetic data on these groups are likely to be highly informative. By increasing the number of subjects of East Asian ancestry to levels roughly equivalent to those of European subjects, and including previously published data, we have made cross-ancestral group studies possible.

Through ancestry-dependent and trans-ancestral meta-analyses, we identified 38 novel loci associated with SLE, bringing the total number of SLE-associated loci to 132. High level functional annotation of these SLE associated loci implicated hematological cells, particularly B and T lymphocytes, cytokine signaling and other immune system pathways. Consistent with previous findings50, we demonstrated the value of trans-ancestral data in significantly reducing the number of putative causal variants at each disease-associated locus, which may facilitate future functional and mechanistic studies.

There was strong evidence of heterogeneity in SLE associations between the two ancestries, which is unlikely to be an artefact of study power. Eight variants (that are common in both ancestral groups) were associated with disease in only one of the ancestral groups. For three of these, a similar ancestral difference was found by re-analyzing association data on RA30. This might suggest that common mechanisms account for ancestral differences in autoimmune diseases.

Genes at the ancestry-heterogeneous disease loci seemed more likely to be involved in regulation of immunoglobulins than genes at the ancestry-shared disease loci. Immunoglobulin levels are highly heritable51 and have been found to differ between ancestries, with African Americans and Asians having higher serum immunoglobulin levels than people of European ancestry52,53,54,55. Higher antibody levels in non-European populations might have contributed to their higher prevalence of SLE and further study of intrinsic differences in immune function among ancestries may be informative.

That differential mechanisms may exist for antibody regulation between East Asian and European populations is supported by the association of SLE with the BAFF signaling system. This system, which is part of the initial reaction to host-pathogen interactions37, might be under positive selection due to different environmental exposures37,39. BAFF (TNFSF13B) and its receptors, one of which is TACI (TNFRSF13B), play essential roles in B cell survival and differentiation36,37,38. The SLE risk allele in the gene encoding BAFF is completely absent in Chinese populations39 and a missense variant in the gene encoding TACI (TNFRSF13B) was found to be specifically associated with SLE in East Asians in this study. Both these genes, TNFSF13B39 in Europeans and TNFRSF13B in East Asians, were found to have undergone positive selection in recent human history. Adaptation of the host to resist pathogens may underlie some of the ancestral group-heterogeneity.

TACI is expressed at very low levels in human newborns and mice before exposure to pathogens56,57 and previous studies have shown that certain pathogens can ablate B cell responses by modulating the expression of TACI58,59,60. The BAFF risk allele was shown to significantly upregulate humoral immunity39, and whether this is the case for the risk allele in TACI should be investigated. TACI blockers, such as Atacicept61 and Telitacicept62, might produce better responses in SLE patients of Asian ancestry, and recent results of a Phase 2b study showed that Telitacicept was efficacious for SLE patients in China62. In addition, the variant found in TNFRSF13B may be a useful genetic marker for the prescription of Belimumab and TACI blockers, a hypothesis that may warrant further study.

In addition to gene-environment interactions, gene-gene interactions could be another reason for the population differences. However, detecting such effects requires much greater statistical power. A recent study showed that the penetrance of monogenic risk mutations can depend on polygenic background63, indicating a complex form of gene-gene interaction. In addition, ancestry-dependent tagging of untyped causal variants due to differences in LD structure may produce artefactual population differences in association. This area warrants further investigation, which will require both greater study power and innovative methodologies.

Our analyses have identified a substantial number of novel SLE-associated genetic loci and deepened our understanding of the genetic factors that may underlie the differences in the manifestation of SLE between peoples of European and non-European ancestry. Like a recent PRS study in SLE64, but to a greater extent, we have shown that PRS achieved a far better performance when based on ancestry-matched populations. Our findings contribute new insights into precise treatments, and to risk prediction and prevention of SLE.

Methods

Overview of samples

A total of 8252 subjects of Han Chinese descent from Hong Kong (HK), Guangzhou (GZ) and Central China (CC) were genotyped in this study. The institutional review boards of the institutes collecting the samples (The University of Hong Kong, Hospital Authority Hong Kong West Cluster and Guangzhou Women and Children’s Medical Center) approved the study and all subjects gave informed consent. These subjects were genotyped on the Infinium OmniZhongHua-8, the Infinium Global Screening Array-24 v2.0 (GSA) and the Infinium Asian Screening Array-24 v1.0 (ASA) platforms. Illumina GenomeStudio 2.0 was used to call genotypes and to convert the results into PLINK format. Fourteen samples were randomly selected and genotyped on different platforms. High concordance rates (>99.9%) were observed for genotypes derived from the different platforms (Supplementary Table 1). Principal component (PC) analysis was performed to examine potential batch effects and no significant differences were observed in the PCs for data genotyped on different platforms and in different batches (Supplementary Fig. 1). Compared to our previous SLE GWAS13, 2042 more samples (992 cases and 1050 controls) were added to the HK cohort, and 2917 additional controls were combined with the CC cohort (named AH in the previous study13) after the quality control procedure. The samples in the GZ cohort were newly recruited and genotyped in this study.

For the European data13, we split the samples into three cohorts to better control for population substructures (see the section below). Summary statistics for the GWAS from Spain21 (SP) and three ImmunoChip data sets from Korea (KR), Han Chinese in Beijing (BJ) and Malaysian Chinese (MC)22 were included in our analyses. In total, 11,283 SLE cases and 24,086 controls were involved in this study, and the sample size for each cohort is summarized in Supplementary Table 2.

Quality control and association study

Genotype Harmonizer65 was used to align the strands of variants of the Chinese GWAS to the reference of the 1000 Genomes Project Phase 3 panel. Variants with a low call rate (<90%), low minor allele frequency (<0.5%) or violation of Hardy–Weinberg equilibrium (P-value < 1E−04) were removed. Samples were retained if they met the following criteria: (i) missing genotype rate below 5%; (ii) hidden relatedness (identity-by-descent) with other samples ≤12.5%; (iii) inbreeding coefficient between −0.05 and 0.05; and (iv) no extreme PC values, as computed using EIGENSTRAT embedded in PLINK66,67. After quality control, pre-phasing was performed with SHAPEIT68 and individual-level genotype data were imputed to the density of the 1000 Genomes Project Phase 3 reference using IMPUTE269. We compared allele frequencies of all the variants after imputation for the same control groups genotyped by different platforms at different time points, and 190,618 variants were removed from further analyses due to significant differences (P-value < 5E−05). For association analysis, SNPTEST70 was used to fit an additive model. Top PCs and the BeadChip types were included as covariates. The number of PCs to be adjusted for in each analysis was determined from a scree plot, with the cutoff set where the plot levels off. Variants with imputed INFO scores <0.7 were excluded. The genomic inflation factors (λGC) for the HK, GZ, and CC GWAS were 1.04, 1.03, and 1.04, respectively, and the LD score regression (LDSC)71 intercepts were 1.03, 1.02, and 1.03, respectively. Manhattan plots for each cohort are shown in Supplementary Fig. 3.
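For readers unfamiliar with the genomic inflation factor reported above, the sketch below shows the conventional calculation: the median observed one-degree-of-freedom chi-square statistic divided by its expected median under the null (about 0.455). It assumes two-sided association P-values as input and uses only the Python standard library; the function name is hypothetical and this is not the authors' pipeline.

```python
from statistics import NormalDist, median


def genomic_inflation(pvalues):
    """Genomic inflation factor (lambda GC) from two-sided P-values.

    Each P-value is converted to a 1-df chi-square statistic via the
    squared normal quantile; P-values must lie in (0, 1].
    """
    nd = NormalDist()
    # The lower-tail quantile keeps precision for very small P-values.
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in pvalues]
    expected_median = nd.inv_cdf(0.75) ** 2  # about 0.4549
    return median(chi2) / expected_median


# Evenly spaced P-values mimic a null distribution, so lambda is 1.
n = 1001
null_p = [(i + 0.5) / n for i in range(n)]
print(genomic_inflation(null_p))
```

Values modestly above 1, like those reported for the individual cohorts, can reflect polygenicity as well as residual stratification, which is why the LDSC intercept is reported alongside λGC.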

For the European SLE GWAS data, the λGC and LDSC intercept listed in LD hub seemed inflated (λGC = 1.17 and LDSC intercept = 1.10)20. Thus, the data were reanalyzed to minimize the potential influence of sub-population stratification (see below). PC analysis showed that subjects from the existing European data were more diverse than the Chinese subjects used in this study (Supplementary Fig. 2a). The European individuals were grouped into three cohorts by their PCs relative to the subjects of the 1000 Genomes Project. Subjects in the EUR GWAS 1, EUR GWAS 2, and EUR GWAS 3 cohorts shared similar PCs with individuals of Spanish (IBS), northern and western European (CEU and GBR) and Italian (TSI) origins, respectively (Supplementary Fig. 2b and Supplementary Table 2). Quality control, imputation and association analyses were conducted, as for the Chinese datasets, in each cohort. λGC for the three European GWAS datasets were 1.05, 1.08, and 1.03, respectively, and the LDSC intercepts were 1.03, 1.04, and 1.00, respectively. Manhattan plots for each cohort are shown in Supplementary Fig. 4.

Meta-analyses of SLE association studies

Meta-analyses for the Chinese and European SLE GWAS were conducted independently. The summary association statistics from the HK, GZ and CC GWAS (4222 SLE cases and 8431 controls) were combined in a meta-analysis using a fixed-effect model with inverse-variance weighting72. The λGC for the Chinese meta-analysis was 1.09 and the LDSC intercept was 1.04. For the European data, the EUR GWAS 1–3 datasets were combined with the SP GWAS21 in the meta-analysis (4576 cases and 8039 controls). The λGC and the LDSC intercept for the European SLE meta-analysis were 1.11 and 1.03, respectively, lower than the values of 1.17 and 1.10 listed in LD hub.
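The inverse-variance-weighted fixed-effect model can be sketched for a single variant as follows. This is a minimal illustration that assumes per-cohort effect estimates (log odds ratios) and standard errors are available; the function name `fixed_effect_meta` is hypothetical and the sketch does not reproduce the software used in the study.

```python
import math


def fixed_effect_meta(betas, ses):
    """Combine per-cohort effect estimates for one variant.

    betas: per-cohort log odds ratios.
    ses:   matching standard errors.
    Returns (pooled beta, pooled SE, z score, two-sided P-value).
    """
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided normal P-value; erfc keeps precision for large |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p_value


# Two cohorts with equal precision: the pooled effect is their average.
beta, se, z, p_value = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
```

Because the weights depend only on the standard errors, larger cohorts contribute proportionally more to the pooled estimate.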

Trans-ancestral meta-analysis across the Chinese and European GWAS cohorts used the fixed-effect model. The summary association statistics for the Immunochip data from KR, BJ, and MC22 were included as an in silico replication. The λGC for the trans-ancestry meta-analysis was 1.15, and the LDSC intercepts computed by using the LD score from either East Asian or European panels were 1.06 and 1.08, respectively.

Genetic correlation between the two ancestries

The trans-ancestral genetic correlation between the meta-analysis results for Chinese (HK, CC, and GZ GWAS) and Europeans (EUR GWAS 1–3 and SP GWAS) was estimated using the Popcorn algorithm23 based on common SNPs in the autosomes. The disease prevalence was set to 1‰ in Chinese and 0.3‰ in European populations4. SNPs were removed from this analysis if they met any of the following criteria: (1) strand ambiguity (A/T or C/G alleles); (2) MAF <5%; or (3) imputed INFO score <0.9. The cross-ancestry LD scores were estimated using control subjects from the HK cohort (n = 3324) and the EUR GWAS 2 cohort (n = 5379).

Heritability explained by the SLE-associated variants

The variance in liability explained by the SLE-associated variants was measured using the VarExplained program73. Variants in the HLA region were excluded from the analysis. The disease prevalence was set to 1‰ for East Asians and 0.3‰ for Europeans4. The novel loci increased the heritability explained from 0.10 to 0.13 for East Asians, and from 0.08 to 0.09 for Europeans.

Functional annotations of SLE associated SNPs

The stratified LD score regression method24 was applied to the trans-ancestral meta-analysis result to partition SNP-heritability across functional annotations. Twenty-eight categories of annotations that are not cell type-specific (Supplementary Fig. 5), provided with the method, were studied. For cell type-specific analyses, H3K4me1 and H3K4me3 modifications across 127 cell types (Supplementary Fig. 6) were downloaded from the Roadmap Epigenomics Project74. The cell type-specific enrichment analysis was performed under the “full baseline” model24, which aims to control for overlaps with annotations that are not cell type-specific. The RELI25 analyses were performed to identify TFs whose binding sites are enriched in the disease-associated loci. All SLE-associated SNPs and variants in high LD with them (r2 > 0.8) were taken as input. All 1544 ChIP-seq datasets curated in this tool were tested in this study. The significance level and relative risk for each dataset were computed by comparing the observed intersections with the expected intersections obtained from 2000 simulations.

Identification of putative SLE genes and gene-set enrichment analysis

Putative causal gene(s) across all the SLE-associated loci outside of the HLA region were identified using DEPICT75. The default setting (r2 > 0.3) was used to set boundaries for each SLE-associated locus. Genes within (or overlapping) the boundaries were examined and those with a P-value <0.05 were defined as putative causal genes. If no genes were selected at a locus, gene(s) identified from eQTL data from human whole blood76,77 were considered to be putatively causal. The protein-protein interaction network was constructed, and its enrichment P-value computed, using STRING78 (version 11). Gene-set enrichment analysis was performed using ToppGene27, with the August 2019 versions of the KEGG79, Reactome80 and mouse knockout phenotype40 databases. The 2017 IUIS Phenotypic Classification for Primary Immunodeficiencies32 (PID) was used to obtain 320 human PID genes categorized into nine phenotypic classifications.

Trans-ancestral fine-mapping of the associated loci

The HLA region was excluded from this analysis, as extensive LD and limited genotyping of SNPs in both ancestries make it difficult to define the best model of association for this region. Disease loci whose risk alleles were rare (MAF < 0.01) or absent in one ancestry were also excluded, leaving 108 SLE-associated loci in the autosomes for this study. For each disease locus, all variants within the region were extracted for both ancestries. The genetic interval was bounded by the closest recombination hotspots (defined as a recombination rate >10 cM/Mb) on either side of a given disease-associated variant. A fine-mapping algorithm, PAINTOR (version 3.0)28, was used to estimate the posterior probability of causality for each variant at a given locus based on the trans-ancestral model. For comparison, we also applied the fine-mapping algorithm to the Chinese and European SLE GWAS separately. All analyses were run under the assumption of a single causal variant per locus, and conditional analysis was performed if multiple signals were present within a locus. The LD matrices for the Chinese and European populations were calculated using control samples from HK (n = 3324) and EUR GWAS 2 (n = 5379), respectively. Variants were ranked by posterior probability, and those that together reached a cumulative posterior probability greater than 95% were defined as putative causal variants (95% credible set).
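The construction of a credible set from per-variant posterior probabilities can be sketched as below. The variant identifiers and probabilities are invented for illustration, the function name `credible_set` is hypothetical, and the sketch assumes the posteriors at a locus sum to approximately 1 under the single-causal-variant assumption; it stops as soon as the cumulative probability reaches the coverage level.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the variants forming a credible set at one locus.

    posteriors: dict mapping a variant ID to its posterior probability
                of causality.
    Variants are added in decreasing order of posterior probability until
    their cumulative probability reaches `coverage`.
    """
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for variant, probability in ranked:
        selected.append(variant)
        cumulative += probability
        if cumulative >= coverage:
            break
    return selected


# Hypothetical locus with four variants.
example = {"var1": 0.70, "var2": 0.20, "var3": 0.06, "var4": 0.04}
causal_candidates = credible_set(example)
```

A smaller credible set indicates better fine-mapping resolution, which is the quantity compared between the trans-ancestral and single-ancestry analyses.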

Identification of loci with differential effects between the two ancestries

Cochran’s Q (CQ)-test81 was used to examine effect-size differences between the two ancestries for all the disease-associated variants in the autosomes. If the variants were also interrogated by the Immunochip22 system, the association results derived from the KR, BJ, and MC cohorts were also included. CQ-test P-values were adjusted using the Benjamini–Hochberg method82, and an adjusted P-value cutoff of 0.05 was applied. For comparison, summary association statistics on RA were downloaded from a previous study30 of 4873 RA cases and 17,642 controls of Asian ancestry and 14,361 RA cases and 43,923 controls of European ancestry.
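The heterogeneity test and the multiple-testing adjustment can be sketched as follows. This is a simplified illustration restricted to two effect estimates per variant (one per ancestry), so that Q follows a chi-square distribution with 1 degree of freedom; the inclusion of additional cohorts is not covered. The function names `cochran_q_two_groups` and `benjamini_hochberg` are hypothetical.

```python
import math


def cochran_q_two_groups(beta_a, se_a, beta_b, se_b):
    """Cochran's Q for two effect estimates (1 degree of freedom).

    Returns (Q, P-value).
    """
    w_a, w_b = 1.0 / se_a ** 2, 1.0 / se_b ** 2
    pooled = (w_a * beta_a + w_b * beta_b) / (w_a + w_b)
    q = w_a * (beta_a - pooled) ** 2 + w_b * (beta_b - pooled) ** 2
    # Survival function of a 1-df chi-square: P(X > q) = erfc(sqrt(q / 2)).
    return q, math.erfc(math.sqrt(q / 2.0))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical variant with a larger effect in one ancestry.
q_stat, q_p = cochran_q_two_groups(0.30, 0.05, 0.10, 0.05)
flags = [p <= 0.05 for p in benjamini_hochberg([q_p, 0.2, 0.6])]
```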

Colocalization analysis

Colocalization of association signals from the two ancestries was determined using the R package coloc31 on all variants with a MAF >1% and an IMPUTE269 INFO score >0.9 within a given disease locus. Posterior probabilities (PP) for five different configurations were evaluated at the associated loci: PP0, no association in either group; PP1, association with SLE in East Asians but not in Europeans; PP2, association with SLE in Europeans but not in East Asians; PP3, association with SLE in both ancestries but by two independent signals; PP4, association with SLE in both East Asians and Europeans by the same signal. The average PPs for the five configurations (PP0 to PP4) were 0.24, 0.14, 0.12, 0.08, and 0.42 for the ancestry-shared loci, and 0.04, 0.67, 0.24, 0.02, and 0.03 for the ancestry-specific loci.

To control for LD differences between ancestries, SLE association signals from Chinese populations were compared with those from 27 immune-unrelated phenotypes from European populations (LD hub20; Supplementary Table 6) to generate baseline posterior probabilities of colocalization in the absence of a phenotypic relationship. Ancestry-shared causal effects for SLE were expected to be significantly greater than the baseline values.

Analysis of selection signatures for the associated variants

The fixation index (Fst) was used to test allele-frequency differences between the two ancestries. Fst was calculated based on the following formula83:

$$F_{\mathrm{st}} = \frac{{H_t - H_s}}{{H_t}},$$
(1)

where Ht is the expected proportion of heterozygosity in the pooled samples from all ethnicities based on Hardy–Weinberg equilibrium, \(H_t = 2\bar p\left( {1 - \bar p} \right)\), and \(\bar p\) is the allele frequency in the overall pool. Hs, the expected proportion of heterozygosity within subpopulations (Chinese and Europeans), is estimated as

$$H_s = \frac{{H_{p1} \times N_{p1} + H_{p2} \times N_{p2}}}{{N_{p1} + N_{p2}}},$$
(2)

where Hpi is the expected heterozygosity in the ith subpopulation, estimated from the allele frequency in that subpopulation under Hardy–Weinberg equilibrium, and Npi is the sample size of the ith subpopulation.
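Equations (1) and (2) can be evaluated for a single biallelic variant as in the following minimal sketch. It assumes that the pooled allele frequency is the sample-size-weighted mean of the two subpopulation frequencies; the function name `fst_two_populations` is hypothetical, and returning 0 for a monomorphic variant is a convention chosen here, since Fst is undefined in that case.

```python
def fst_two_populations(p1, n1, p2, n2):
    """Fst for one biallelic variant in two subpopulations.

    p1, p2: allele frequencies in subpopulations 1 and 2.
    n1, n2: sample sizes of subpopulations 1 and 2.
    """
    # Allele frequency in the overall pool (assumed sample-size weighted).
    p_bar = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Expected heterozygosity in the pooled sample under HWE.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # monomorphic overall; Fst undefined, 0 used by convention
    # Expected heterozygosity within each subpopulation under HWE.
    h_p1 = 2.0 * p1 * (1.0 - p1)
    h_p2 = 2.0 * p2 * (1.0 - p2)
    # Equation (2): sample-size-weighted mean within-subpopulation value.
    h_s = (h_p1 * n1 + h_p2 * n2) / (n1 + n2)
    # Equation (1).
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies, with the control sample sizes used here.
example_fst = fst_two_populations(0.10, 3324, 0.50, 5379)
```

Identical allele frequencies in the two groups give Fst = 0, and the value rises as the frequencies diverge.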

Potential selective sweeps in respective ancestries were examined using the Integrated Haplotype Score (iHS) method, which measures the extended haplotype homozygosity for the ancestral allele relative to the derived allele84. Raw iHS scores were computed using the R package rehh85 and normalized by different frequency bins (50 bins over the range 0 to 1). Large negative standardized iHS values indicate long haplotypes carrying the derived allele, while large positive values suggest long haplotypes with the ancestral allele. The Fst and iHS values analyzed in this study were estimated using control subjects from HK (n = 3324) and EUR GWAS 2 (n = 5379). Standardized iHS scores based on the 1000 Genomes Project were downloaded from a previous study45 for comparison.
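The frequency-bin normalization of raw iHS scores can be illustrated as follows. This is a sketch of the general idea only and not the rehh implementation: it assumes that variants are binned by derived-allele frequency into 50 equal-width bins, that the sample standard deviation is used, and that bins with fewer than two variants or zero variance are left unstandardized. The function name `standardize_ihs` is hypothetical.

```python
from statistics import mean, stdev


def standardize_ihs(raw_scores, derived_freqs, n_bins=50):
    """Standardize raw iHS scores within derived-allele-frequency bins.

    Returns a list aligned with the input; entries are None where a bin
    is too small (or has zero variance) to standardize.
    """
    bins = {}
    for index, freq in enumerate(derived_freqs):
        # Equal-width bins over [0, 1]; a frequency of 1.0 joins the last bin.
        bin_id = min(int(freq * n_bins), n_bins - 1)
        bins.setdefault(bin_id, []).append(index)

    standardized = [None] * len(raw_scores)
    for members in bins.values():
        values = [raw_scores[i] for i in members]
        if len(values) < 2:
            continue
        center, spread = mean(values), stdev(values)
        if spread == 0.0:
            continue
        for i in members:
            standardized[i] = (raw_scores[i] - center) / spread
    return standardized


# Hypothetical raw scores and derived-allele frequencies.
scores = standardize_ihs([1.0, 3.0, 2.0, 4.0, 0.5],
                         [0.101, 0.105, 0.500, 0.505, 0.900])
```

Standardizing within bins removes the dependence of the raw score on allele frequency, so that scores are comparable across variants.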

Calculation of polygenic risk scores

Polygenic risk scores (PRS) for individuals were computed using lassosum48, a penalized regression framework. The meta-analysis results on Europeans were used to calculate PRS for individuals of Chinese ancestry, and vice versa. LD information among SNPs was calculated from the testing dataset. These analyses were repeated using LDpred49.

The GZ SLE GWAS cohort was used as a test dataset to evaluate the influence of training data from different ancestries. Two predictors were constructed using lassosum based on meta-analysis results from: (1) the HK and CC GWAS, 2618 cases and 7446 controls; and (2) the European GWAS, 4576 cases and 8039 controls. To control for the influence of different sample sizes, 1500 cases and 1500 controls were randomly chosen from each of the Chinese and European populations to train two predictors of equal training sample size (repeated three times). PRS values generated from each test were scaled to a mean of 0 and a standard deviation of 1, and then evaluated based on the area under the ROC curve (AUC). The AUC values and their 95% confidence intervals were calculated using the R package pROC86, and the optimal cut-off for case-control classification, defined as the point that maximizes the sum of sensitivity and specificity, was estimated using the coords function.
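The evaluation steps described above (scaling, AUC and the optimal cut-off) can be sketched in a few lines. This illustration is not the pROC implementation: it assumes labels coded as 1 for cases and 0 for controls, computes the AUC as the proportion of case-control pairs in which the case has the higher score (ties count as one half), and omits confidence intervals. All function names are hypothetical, and the quadratic-time loops are intended for small examples only.

```python
from statistics import mean, pstdev


def scale_scores(scores):
    """Scale scores to mean 0 and standard deviation 1 (non-constant input)."""
    center, spread = mean(scores), pstdev(scores)
    return [(s - center) / spread for s in scores]


def auc(scores, labels):
    """Area under the ROC curve via pairwise case-control comparisons."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def youden_cutoff(scores, labels):
    """Return (cut-off, sensitivity + specificity) maximizing that sum.

    A sample is classified as a case when its score is >= the cut-off.
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    best_cut, best_sum = None, -1.0
    for cut in sorted(set(scores)):
        sens = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut) / n_cases
        spec = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut) / n_controls
        if sens + spec > best_sum:
            best_cut, best_sum = cut, sens + spec
    return best_cut, best_sum


# Hypothetical PRS values for two controls (0) and two cases (1).
raw_prs = [0.1, 0.4, 0.35, 0.8]
status = [0, 0, 1, 1]
test_auc = auc(scale_scores(raw_prs), status)
```

Because the AUC depends only on the ranking of scores, scaling does not change it; scaling matters when cut-offs or score distributions are compared across predictors.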

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.