Introduction

Systemic lupus erythematosus (SLE) is a complex autoimmune disease of unclear aetiology with diverse clinical features. Its prevalence is estimated to be about one in 2000 people in some populations.1 There is a strong gender bias, with a female:male ratio of about 9:1 seen between the ages of 15 and 50 years, and ethnicity influences this disease.1, 2, 3 A feature of SLE is the generation of a variety of autoantibodies to cell surface and nuclear antigens. These autoantibodies contribute to end-organ damage via a variety of mechanisms and the formation of immune complexes can result in glomerulonephritis, arthritis, rashes, serositis and vasculitis.

SLE appears to develop through the interaction of largely unknown environmental factors and the genetic composition of the individual. Evidence for the genetic component is strongly illustrated by concordance rates in monozygotic twins ranging between 25 and 69%, while the rate in dizygotic twins is only 1–2%.4 Family-based studies show that the recurrence risk for siblings of probands, relative to the risk in the population as a whole (λs), is 20 for SLE. Linkage analysis of SLE in ethnically diverse families has identified at least 20 potential susceptibility loci.5, 6, 7, 8, 9

The 1p36 region has been linked to SLE in four investigations5, 6, 7, 10 and is orthologous to the murine susceptibility interval on chromosome 4, which harbours Nba1 and Lbw2 intervals from the NZB/WF1 model.11 Several immunologically relevant genes have been mapped within this interval, among them members of the tumour necrosis factor receptor (TNFR) superfamily. Members of this superfamily are expressed on diverse cells of the immune system and play a central role in immune regulation and in apoptosis. We selected TNFRSF14 (HVEM), TNFRSF8 (CD30) and TNFRSF1B (CD120B, TNFR2) as good candidate genes for SLE within the 1p36 interval, based on position and function.

HVEM is broadly expressed on cells of the immune system and is involved in T-cell activation. Three ligands have been identified for HVEM: the HSV glycoprotein D, and two members of the TNF family, lymphotoxin α3 and the newly described member LIGHT (TL4), which is produced by activated T cells.12 CD30 is a receptor for CD30 ligand and is expressed on activated but not on resting T and B cells. In T cells, CD30 signalling stimulates proliferation and cytokine production and induces susceptibility to apoptosis. CD30 signalling protects against autoimmunity by controlling the expansion of autoreactive CD8 effector T cells.13 CD120B is expressed predominantly in stimulated T and B lymphocytes and, together with CD120A, constitutes the major receptors for tumour necrosis factor alpha. In SLE patients, soluble CD120B has been found to correlate with disease activity.14

There have been numerous studies investigating linkage and association of CD120B with autoimmunity in humans. A single-nucleotide polymorphism (SNP) in exon 6 (M196R) has been associated with susceptibility to SLE.15, 16 CD30 has been implicated in type I diabetes in NOD mice17 and it has been shown that anti-CD30L monoclonal antibody limits the development of spontaneous diabetes in NOD mice.18 Since CD30 neighbours CD120B and there are no other known genes between them, we employed a linkage disequilibrium (LD) and haplotype-based approach to study these two genes for linkage and association. This study has defined a high-resolution haplotype structure for HVEM, CD30 and CD120B, and examined the association of the individual SNPs and haplotypes with SLE in European-Caucasians from the UK.

Materials and methods

Family collection and DNA

A large collection of SLE nuclear families has been obtained from the UK with predominantly one affected offspring per family. The demographic details of the families studied are summarised in Supplementary Table 6. The families were randomly allocated into cohorts. Cohort 1 was used as an initial screen and cohort 2 was utilised to follow up on SNPs showing hints of association (P<0.05) from the initial screen. Samples from both parents were available for 65% of the cases, and siblings were also collected where available. In the case of the single-parent families, samples were always taken from siblings. All participants gave signed informed consent prior to blood and data collection and study protocols were approved by the London multicentre research ethics committee. The clinical manifestations of SLE are variable and diagnostic criteria have been established by the American College of Rheumatology (ACR).19, 20 Patients fulfilling four of the 11 criteria are considered to have SLE. Consequently, SLE-affected individuals have different disease phenotypes. Patients were classified as having renal lupus using the ACR criteria. Further clinical information was obtained from individuals by interview and completion of a health questionnaire.21 Sera from all patients and siblings were screened for anti-cardiolipin IgG and were scored as positive if results were >12 arbitrary ELISA units.22 Thrombosis includes self-reported venous thrombosis (deep vein thrombosis and/or pulmonary embolism). DNA samples were prepared from 40 ml of blood by phenol–chloroform extraction.23 DNA concentration was obtained by PicoGreen® quantification. Clinical and genotyping data were stored in a customised database (BioComputing Oy) in a coded, anonymised manner.

SNP selection

In silico SNP hunting was employed using published genomic and cDNA sequence data. The largest public databases of SNPs, the SNP consortium (http://snp.cshl.org/), GeneSNPs (http://www.genome.utah.edu/genesnps/), SNPper (http://bio.chip.org:8080/bio), dbSNP (http://ncbi.nlm.nih.gov/SNP), and HGBASE (http://hgbase.interactiva.de/) were explored for SNPs. These resources were constantly changing and were scrutinised on multiple occasions. The majority of the markers in these databases occur outside the coding regions of genes. Coding SNPs were chosen first, and noncoding SNPs were then chosen based on spacing throughout the genes, validation and submitter information. SNPs that fell in low complexity or repeat regions, identified by inputting the sequence area into the program REPEATMASKER (http://ftp.genome.washington.edu/RM/Repeatmasker), were dismissed at this stage. During this study, novel coding SNPs were provided by the SNP Discovery Group at the Whitehead Institute. Reference cluster IDs (rs#) were given for in silico SNPs and Whitehead IDs (G6073# and G551#) were used for novel SNPs. SNPs were removed from further analysis if they did not genotype well or failed completely in assay, were monomorphic, did not fall within Hardy–Weinberg equilibrium or had >5 pedigree errors.
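The exclusion criteria listed above can be expressed as a simple per-SNP filter. The following is a minimal sketch, not the pipeline used in the study: the record layout, the function name and the example values are hypothetical. The 75% genotyping-success threshold is the one quoted in the Results, whereas the Hardy–Weinberg cut-off of P<0.05 is an assumption, as no threshold is stated in the text.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP quality summary (hypothetical record layout)."""
    snp_id: str
    call_rate: float         # fraction of individuals successfully genotyped
    minor_allele_count: int  # 0 means the SNP is monomorphic in the sample
    hwe_p: float             # P value from a Hardy-Weinberg test
    pedigree_errors: int     # Mendelian inconsistencies from pedigree checking


def passes_qc(snp, min_call_rate=0.75, hwe_alpha=0.05, max_pedigree_errors=5):
    """Return True if the SNP survives all four exclusion criteria."""
    if snp.call_rate <= min_call_rate:   # did not genotype well or failed in assay
        return False
    if snp.minor_allele_count == 0:      # monomorphic
        return False
    if snp.hwe_p < hwe_alpha:            # assumed cut-off, not stated in the text
        return False
    if snp.pedigree_errors > max_pedigree_errors:  # more than 5 pedigree errors
        return False
    return True


# Illustrative records only; these are not data from the study.
snps = [
    SnpQc("snp_a", 0.96, 120, 0.40, 0),
    SnpQc("snp_b", 0.60, 80, 0.70, 1),   # low call rate
    SnpQc("snp_c", 0.98, 0, 1.00, 0),    # monomorphic
    SnpQc("snp_d", 0.95, 90, 0.001, 2),  # deviates from Hardy-Weinberg equilibrium
    SnpQc("snp_e", 0.97, 75, 0.30, 6),   # too many pedigree errors
]
kept = [s.snp_id for s in snps if passes_qc(s)]
print(kept)
```

Keeping each criterion as a separate test makes it straightforward to record why a given SNP was dropped, as is done in Supplementary Table 1.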

Genotyping

The genotyping methodology employed has been described previously.24, 25 In brief, genotyping was performed by multiplexing SNP assays using the Sequenom® MassARRAY™ genotyping system (Sequenom, Hamburg, Germany). Multiplex assays were designed for three to six SNPs per assay. Primer design for multiplex was carried out using SpectroDESIGNER™1.3.4 and MassARRAY™ Assay Design 2.0 software (Sequenom, Hamburg, Germany).

Statistical analysis

Hardy–Weinberg equilibrium testing was carried out using the exact test courtesy of G Abecasis and J Wigginton (University of Michigan Center for Statistical Genetics). Pedigree checking was performed using PedCheck 1.1.26 All genotyping data were run through PedCheck to check for Mendelian inconsistencies in pedigree data. The allele frequency thresholds for significant differences between populations were calculated by dividing P=0.05 by the number of singleton SNPs (SNPs not in any LD) multiplied by the number of common haplotypes seen. Two measures of LD, the squared correlation coefficient (r²) and Lewontin's standardised disequilibrium coefficient (D′), were computed between pairs of SNPs from founder chromosomes using Haploview 3.0.27 Cohort 1 was used to perform the LD analysis. SNPs below 5% minor allele frequency were not included in LD or haplotype analysis. The 95% confidence interval boundaries for pairs of SNPs were used to estimate recombination.28 Multilocus D′ between haplotype blocks is calculated by computing the 2 × 2 D′ score of each allele at the first locus with each allele at the second locus and then taking a weighted average of these values.29 Haplotype analysis was conducted using Haploview 3.0. Haplotypes with <1% frequency were not included in the analysis. Haplotypes were estimated using an accelerated expectation–maximisation (EM)-based algorithm, which can deal with a large number of linked loci that have moderate levels of LD. The output of the EM algorithm is the maximum-likelihood estimate, which gives highly accurate population frequency estimates of phased haplotypes.30
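For readers less familiar with the two pairwise LD measures, the sketch below computes D′ and r² for two biallelic SNPs from the standard textbook definitions. It is an illustration only and is not the Haploview implementation: the function name and the example frequencies are hypothetical, and the haplotype frequency is assumed to be already known, whereas in practice it is estimated from founder genotypes.

```python
def ld_measures(p_ab, p_a, p_b):
    """Pairwise LD between two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    Returns (|D'|, r^2).
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    # D is scaled by its maximum attainable magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical frequencies: allele A (30%) is only ever seen with allele B (60%).
d_prime, r2 = ld_measures(p_ab=0.30, p_a=0.30, p_b=0.60)
print(round(d_prime, 3), round(r2, 3))
```

In this example D′ is at its maximum because one of the four possible haplotypes is absent, yet r² is well below 1 because the allele frequencies at the two SNPs differ. This is why both measures are reported: D′ reflects the evidence for historical recombination, while r² reflects how well one SNP predicts the other.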

The transmission disequilibrium test (TDT) method evaluates whether the frequency of transmission of alleles from heterozygous parents to their affected child deviates from 50%, the expected Mendelian frequency when there is no linkage. TDT analysis was performed on cohort 1, cohort 2 or combined analysis of cohorts 1 and 2. GENEHUNTER (version 3.0) was used for TDT analysis for nuclear simplex families. TRANSMIT (version 2.5.4) was used for TDT analysis of single-parent families. TRANSMIT can deal with transmission of multilocus haplotypes, even if phase is unknown, and parental genotypes may be unknown. Many of the SLE families have only one parent and this programme enables us to obtain more information from our family collection. Data from unaffected siblings may be used to narrow down the range of possible parental genotypes that need to be considered. The pedigree disequilibrium test (PDT) uses data from related nuclear families and discordant sibships from extended pedigrees for TDT analysis.31 The most discordant siblings were selected based on being female and having negative antinuclear antibody scores. One sibling per family was chosen randomly if more than one discordant sibling was available. The PDTsum version of the statistic was used to compare allele frequencies between affected individuals and their unaffected discordant siblings within families.
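The principle of the single-marker TDT can be illustrated with the classic McNemar-type statistic, which compares the number of times heterozygous parents transmit an allele to an affected child with the number of times they do not. The sketch below is a simplified illustration under that textbook formulation; it is not the GENEHUNTER or TRANSMIT implementation, and the function name and counts are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Classic TDT for one allele.

    transmitted   : times heterozygous parents passed the allele to an affected child
    untransmitted : times heterozygous parents did not pass it on
    Returns (chi-square statistic with 1 degree of freedom, P value).
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / n
    # Survival function of a 1-df chi-square variable: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi2, p_value = tdt(60, 40)
print(round(chi2, 2), round(p_value, 4))
```

Under the null hypothesis of no linkage, transmissions and non-transmissions are equally likely, so the statistic is close to zero. Only heterozygous parents are informative, which is why the number of informative families can be far smaller than the number of families genotyped.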

Results

We examined SNP markers across the TNFR genes: HVEM, CD30 and CD120B in the 1p36 candidate interval for SLE. HVEM is located 9.4 Mb upstream of CD30 and CD120B lies 23 kb downstream of CD30. HVEM is negatively transcribed, has eight exons and spans 7.5 kb of genomic DNA. CD30 has 15 exons and spans 81 kb of genomic DNA, and CD120B consists of 10 exons and spans 42 kb. Both CD30 and CD120B are positively transcribed.

In an attempt to screen all common haplotypes across HVEM for association, 23 SNPs were selected across the gene and its flanking regions. However, 11 of these 23 SNPs were not analysed and were removed from the analyses (Supplementary Table 1). Several factors necessitated the elimination of these 11 SNPs: these included the SNPs failing quality checks, lack of polymorphism or assay failure (as described in Materials and methods). In all, 12 SNPs were taken through analysis stages for HVEM (Table 1). We aimed for an SNP density of at least one SNP per 5 kb across CD120B and CD30. Coding SNPs were prioritised in selection. In this way, 54 SNPs were chosen for genotyping across CD30 and 58 SNPs were chosen across CD120B. From these, 69 SNPs were discarded across CD30 and CD120B for similar reasons as listed for HVEM (Supplementary Table 1). In total, 25 SNPs were therefore taken through analysis stages for CD30 and 18 SNPs were selected across CD120B (Table 2).

Table 1 HVEM SNP summary on cohort 1
Table 2 CD30 and CD120B SNP summary on cohort 1

The SNPs that genotyped well (more than 75% of individuals successfully genotyped) and fell within Hardy–Weinberg equilibrium were genotyped in an initial arbitrary cohort of 180 families, designated cohort 1 (Supplementary Table 6). The allele frequencies for the individual SNPs showed significant differences between the different ethnic groups (Tables 1 and 2). For this study, all haplotype association analyses were performed on European-Caucasian samples.

HVEM LD analysis showed strong LD between SNPs 2 and 11, which encompasses the entire gene (Figure 1). The values of D′ were close to 1.0 for most of the SNP pairs across HVEM, indicating very little recombination. Using the 95% confidence-bound SNP pair rule (as described in Materials and methods), one haplotype block was found, which spans 3 kb from the 5′-flanking region to intron 3 of HVEM (Figure 1). Using less stringent block algorithms such as solid spine or four gamete rule, one haplotype block was formed across the gene. The strength of LD within the haplotype block is reflected by the restricted haplotype diversity observed. Three haplotypes are observed in the block, with two high-frequency haplotypes of 48%. The three haplotypes in the block capture 98% of all diversity seen in our population.

Figure 1

HVEM results of LD between SNP pairs and haplotype blocks in Caucasians for cohort 1. Values for D′ are presented in each box. Black boxes indicate strong evidence for LD (D′>0.75 with small confidence intervals (CI)) and grey boxes indicate intermediate LD (D′>0.75 with large CI). Horizontal line above the LD diagram representative of chromosome with the location of SNPs indicated. Black rectangle above the chromosome line represents the location of gene on the chromosome. The orientation of the gene in centromeric (Cen) and telomeric (Tel) directions is shown. Below the LD diagram is the haplotype diversity within the haplotype block. Marker numbers are shown across the top. Haplotype tagging SNPs are highlighted with a triangular pointer. Haplotype frequencies are shown next to each haplotype.

The allele frequencies of individual SNPs within CD30 and CD120B were found to vary significantly between the Afro-Caribbean and European-Caucasian or Indo-Asian populations (Table 2). In comparison, fewer SNP allele frequencies varied significantly between Indo-Asians and European-Caucasian populations. The differences seem to be clustered across CD30 and CD120B and there is no consistent pattern across the entire locus, although Afro-Caribbeans have higher allele frequencies on either side of haplotype block 4. LD analysis across CD30 and CD120B showed considerable recombination across the locus (Figure 2). The SNPs genotyped across this region span 241 kb and previous work in European ancestry samples has shown LD to extend up to 60 kb in such populations.32 There were only small blocks of LD across CD30 and CD120B, although CD120B had the largest LD block of 9 kb. Surprisingly, there is an area of strong LD from SNPs 22 to 26 spanning 8 kb between CD30 and CD120B. This block of LD begins 7 kb after CD30 and 9 kb before CD120B.

Figure 2

CD30 and CD120B results of LD between SNP pairs and haplotype blocks in Caucasians for cohort 1. Values for D′ are presented in each box. Black boxes indicate strong evidence for LD (D′>0.75 with small confidence intervals (CI)), grey boxes indicate intermediate LD (D′>0.75 with large CI) and white boxes indicate inconclusive LD or evidence for recombination (D′<0.75, r²<0.30). Horizontal line above the LD diagram representative of chromosome with the location of SNPs indicated. Black rectangle above the chromosome line represents the location of gene on the chromosome. The orientation of the gene is illustrated in centromeric (Cen) and telomeric (Tel) directions.

Figure 3 shows the allelic composition of the CD30 and CD120B haplotypes for the block sizes depicted in Figure 2. There are three haplotypes in block 1 with one high-frequency haplotype (71%). Haplotype block 2 falls across introns 8 and 10 of CD30 and contains four haplotypes with one higher-frequency haplotype (52%). Haplotype block 3 comprises three haplotypes and crosses introns 12 and 13 of CD30. Haplotype block 4 falls between CD30 and CD120B, spans 8 kb and has four haplotypes, two of which are of similar frequency (44 and 45%). Haplotype block 5 runs from introns 1 to 8 of CD120B and six haplotypes with one higher-frequency haplotype (34%) are observed. The haplotypes within blocks 1–4 capture 98–100% of all diversity seen in our samples. In haplotype block 5, 94% of the haplotype diversity is captured by haplotypes seen above 1% frequency. Multilocus D′ values between the blocks are <0.30 for haplotype blocks 1–4, indicating very weak LD between blocks and recombination (Figure 3). This can also be seen by the breaking of haplotype structure from each block to the next (Figure 3). There is some LD between blocks 4 and 5 with a multilocus D′ of 0.53, but the splitting of haplotypes into greater haplotypes suggests recombination is relatively more recent.

Figure 3

CD30 and CD120B haplotype diversity within haplotype blocks. These were created based on the 95% confidence interval cutoff. Marker numbers are shown across the top. Haplotype tagging SNPs are highlighted with a triangular pointer. Haplotype frequencies are shown to the right of each haplotype. Lines show the most common crossings from one block to the next, with thicker lines showing more common crossings than thinner lines. Shown beneath the crossing lines is multilocus D′, which is a measure of the LD between two blocks. Interblock distances are indicated at the bottom of the diagram.

Single marker TDT analysis using GENEHUNTER for the 12 SNPs across HVEM (Table 3) and the 43 SNPs across CD30 and CD120B (Table 4) was conducted in the SLE families. Association analyses were performed on all of the families in cohort 1. Selected SNPs were also analysed in cohort 2. Taking into account the haplotype data generated from analysing cohort 1, SNPs were selected for typing in the second cohort if they tagged a haplotype or showed any trend for association in the first cohort (P<0.10). Several low-frequency SNPs (eg SNPs 35–37) were also typed in both cohorts if they were coding and altering the primary amino-acid sequence of the gene product. TDT analysis on cohort 2 only is shown in Supplementary Tables 2 and 3. Family-based association studies were conducted in European-Caucasian families separately because of racial differences in haplotype structure. No association was found in Afro-Caribbean or Indo-Asian families, although numbers of these families available were very low. Single-marker TDT provided no evidence for association with any of the SNPs screened with SLE (Tables 3 and 4). Furthermore, TRANSMIT TDT and discordant sib analysis using PDT confirmed the lack of association for the SNPs in HVEM and across CD30 and CD120B with SLE (data not shown). Stratified TDT analyses were carried out on HVEM, CD30 and CD120B SNPs in families with SLE patients selected for renal disease, thrombosis or anti-cardiolipin (aCL-G) antibody production as affected status and there was no evidence of association with these clinical subtypes when data were corrected for multiple testing (data not shown, Supplementary Tables 4 and 5). We selected renal disease as a distinct subset of more severe SLE for analysis. In addition, we chose to analyse patients with thrombosis and aCL-G antibodies, as previous studies had suggested an association of CD120B with vascular disease.33 Haplotype TDT analysis using GENEHUNTER on haplotypes shown in Figures 1 and 2 for HVEM, CD30 and CD120B did not reveal any association of haplotypes with SLE (data not shown).

Table 3 Results of HVEM GENEHUNTER TDT in European-Caucasians
Table 4 Results of CD30 and CD120B GENEHUNTER TDT in European-Caucasians

Discussion

The tumour necrosis factor (TNF) cytokine superfamily plays a central role in immune regulation through a network of ligands and receptors. Linkage studies had identified the 1p36 region as a susceptibility locus for SLE,5, 6, 7, 8, 9 and we selected the TNFR genes HVEM, CD30 and CD120B as good candidates within this interval to examine. Our study aimed to characterise the haplotype structure of these genes and investigate the association of markers with SLE.

The majority of SNPs genotyped in this study were located in introns and untranslated region sequences. Two coding SNPs were successfully genotyped in HVEM (rs2234167, rs2234158). Two synonymous coding SNPs in CD30 (G6073a1, G6073a5) were taken through analysis in European-Caucasians. Four coding sequence SNPs were genotyped in CD120B (G551u1, G551u4, G551a6, G551u5); the latter three are nonsynonymous (Table 2). We did not observe any association with SLE for HVEM, CD30 and CD120B using family-based tests of association. We used clinical information on our cohorts based on previous reports for lupus nephritis, thrombosis and anti-cardiolipin IgG antibodies, and no evidence for association was found. There have been no association studies previously described for HVEM and only one study of CD30 and type I diabetes, which found no association.34

CD120B has been extensively studied in SLE and other autoimmune diseases.15, 33, 16, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48 The M196R polymorphism in exon 6 of CD120B has been examined in European-Caucasians, Koreans, Japanese and African-Americans. There have been associations of this SNP with SLE, hypertension, hypercholesterolemia, Crohn's disease, bone mineral density, FCHL, RA, hyperandrogenism and polycystic ovary syndrome.15, 33, 40, 41, 42, 44, 48, 49 In SLE, there have been two positive associations and six negative associations of polymorphisms in CD120B.15, 16, 35, 36, 37, 38, 39, 50 However, no comprehensive haplotype-based studies have been performed. We found no association of M196R (G551u4) with SLE in cohort 1, in cohort 2 or in the combined cohort results. The allele frequency of G551u4 (M196R) shown here was similar to that reported by groups looking at European-Caucasians and Oriental populations. The Afro-Caribbean allele frequency seen in this study is also comparable to a study on African-Americans and RA.51

We have characterised the LD pattern and haplotype structure for HVEM, CD30 and CD120B in European-Caucasians. There is strong LD across the HVEM locus and little recombination. Less rigorous haplotype block algorithms produced one block across HVEM. CD30 and CD120B markers span the locus enclosing both genes completely. Within CD30, there were two small areas of strong LD but a surprising lack of LD across the gene. The least information and LD were seen from introns 1 to 6 and also in the 3′-end of CD30 (Figure 2). Between CD30 and CD120B, there is a block of strong LD with no known genes within this area. The 8 kb sequence was queried against EST databases using the Gene2EST web-based program.52 The EST sequences returned were subsequently aligned and visualised with the Artemis program.53 There was one cluster of overlapping human ESTs spanning 1.2 kb and no specific patterns of coding sequence motifs/features were observed. CD120B had strong LD across a large part of the gene (introns 1 to 9) and SNP pairs in and around CD120B had high D′ values, suggesting recombination is relatively newer than for CD30 (Figure 2). All the common haplotypes within regions of strong LD across HVEM, CD30 and CD120B can be identified by genotyping of haplotype tagging SNPs (htSNPs) as reported in the Results section.

Haplotypes and LD patterns across HVEM, CD30 and CD120B in European-Caucasian and Yoruban-African populations were compared using the International HapMap Project database (http://www.hapmap.org/). The LD across HVEM, CD30 and CD120B was very similar to that seen in our cohorts for European-Caucasians. In all, 16 SNPs utilised by the HapMap were also genotyped by us and, on comparison, allele frequencies were very similar for European-Caucasians and Afro-Caribbeans. LD patterns in our few Afro-Caribbean and Indo-Asian families were examined (data not shown). The Afro-Caribbean LD across HVEM, CD30 and CD120B was weak and there were no haplotype blocks seen, and the Indo-Asian LD pattern was similar to that of European-Caucasians across these genes. The HapMap Yoruban-African data across HVEM showed strong LD from the 5′-flanking region to intron 3 and no LD from intron 3 and 7 kb into the 3′-flanking region, which is similar to that seen in our European-Caucasians. Across the CD30 and CD120B locus, the Yoruban-African HapMap data were comparable to our Afro-Caribbean samples, where there is weak LD.

Haplotype construction across CD30 and CD120B has been previously described in Korean and Japanese populations.54 The patterns of LD were similar in the two populations. This group showed strong LD across markers from introns 1 to 7 in Koreans and introns 1 to 6 in Japanese within CD120B. The LD pattern across CD120B was very similar to that seen in our European-Caucasian samples. The group reported low LD for CD30 and did not define any haplotypes across CD30. Interestingly, this group also had an area of high LD falling in the 3′-flanking region to CD30. However, their map was not as dense in this area and full comparison could not be made. This group captured 98% of the haplotype diversity with only three haplotypes in the Koreans and two haplotypes in the Japanese across CD120B. In comparison, we had six haplotypes in our European-Caucasian population with three high-frequency haplotypes. The Koreans and Japanese had one high-frequency haplotype of 75 and 82%, respectively, across CD120B.

If there were SLE susceptibility polymorphisms or haplotypes within HVEM, CD30 and CD120B, we would have expected to be able to detect an association within areas of strong LD in this European-Caucasian family study. Power calculations were performed based on methods described elsewhere.55 For regions with strong LD, we have reasonable power (60% cohort 1, 80% cohort 1 and 2) to detect association. For rarer alleles (<10%), our power to rule out association is poor. It is difficult to completely exclude CD30 because of the lack of LD in this region. We cannot exclude the possibility that a rare variant within HVEM, CD30 or CD120B is associated with SLE. However, we have captured more than 94% of the haplotype diversity in areas of high LD and all the common haplotypes, and we have analysed all available missense mutations. These studies have been conducted in one of the largest single family collections of SLE available. Our inability to fully exclude a genetic contribution at this locus in SLE indicates the importance of collaboration and replication in complex trait analysis. Although we have shown no association with SLE, the wide range of diseases for which associations have been reported is such that the HVEM, CD30 and CD120B htSNPs that we have identified will be useful for screening areas across these genes in genetic association studies in European-Caucasian populations.