Introduction

Blood groups are antigenic molecules (proteins, carbohydrates, glycoproteins, or glycolipids) present on the surface of red blood cells (RBCs) and may also be present on platelets, lymphocytes, and body tissues. Many of these antigens are clinically significant and can lead to adverse reactions in transfused patients, hemolytic disease of the fetus and newborn in pregnant women, and graft loss in organ transplant recipients [1].

Rh is the fourth blood group system (ISBT 004) and consists mainly of the D, C, E, c, and e antigens carried on two non-glycosylated hydrophobic transmembrane proteins (RhD and RhCE) [2]. It is the most complex blood group system due to the highly homologous and polymorphic genes, RHD (MIM # 111680) and RHCE (MIM # 111700) [3]. The RHD gene is located on the short arm of chromosome 1 at position 1p36.11, is ~62.3 kb long, and is flanked by two highly homologous, 9 kb long DNA segments, the upstream and downstream Rhesus boxes [4].

Large deletions have previously been reported for blood group systems other than Rh, such as the H-deficient Bombay phenotype [5], 4-α-galactosyltransferase-deficient p phenotype [6], glucosaminyl (N-acetyl) transferase 2-deficient i phenotype [7], and XK protein-deficient Kx null phenotype [8]. A total of 43 deletions in the RHD gene [9] and 12 deletions in the RHCE gene [10] have been identified, most of them shorter than 655 nucleotides [11, 12]. Previously reported deletions of 1013 bp encompassing exon 9 of the RHD gene [13, 14] and 2.5 kb encompassing exons 4, 5, and 6 are technical errors and obsolete [15,16,17].

For the RHCE gene, two large partial gene deletions have been reported. One included exon 2 and extended beyond exon 10 [18], and another large internal deletion encompassed exons 2 through 8 [19]. A large chromosomal deletion involving both the RHD and RHCE genes and the nearby D1S80 variable number of tandem repeats locus has also been reported [20]. Besides the common whole RHD gene deletion [4], two large partial RHD gene deletions are known. The RHD allele c.1074–649_1153+266del995 represents a 995-nucleotide deletion encompassing 649 nucleotides of intron 7, all 80 nucleotides of exon 8, and 266 nucleotides of intron 8 [21]; five transcripts lacking exon 8 were reported, including the longest, in which 170 nucleotides of intron 7 were attached to normal exons 9 and 10 with an open reading frame. The RHD allele c.1228–4061_1254+1317del5405 represents a 5405-nucleotide deletion encompassing 4061 nucleotides of intron 9, all 27 nucleotides of exon 10, and 1317 nucleotides of the 3′ UTR [22]; a single transcript was reported, comprising RHD exons 1 to 9 followed by a sequence tract of RHD intron 9, termed pseudo-exon 10. Both partial RHD gene deletions still express some D protein, representing a DEL phenotype. The RHD exon 10 deletion, common in France [22] and Northern Germany, was subsequently found in a transfused red cell product that caused a secondary anti-D immunization [23].
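The sizes quoted for the two previously reported partial deletions can be checked by simple arithmetic, since each deletion is the sum of its contiguous deleted segments. The following minimal sketch uses only the segment lengths and coding positions stated in the text; the function and variable names are illustrative and not part of any published tool.

```python
# Segment lengths (nucleotides) exactly as stated in the text for the two
# previously reported partial RHD deletions [21, 22].
KNOWN_DELETIONS = {
    "c.1074-649_1153+266del995": {"intron 7": 649, "exon 8": 80, "intron 8": 266},
    "c.1228-4061_1254+1317del5405": {"intron 9": 4061, "exon 10": 27, "3' UTR": 1317},
}


def deletion_length(segments):
    """Total deleted length: the sum of the contiguous deleted segments."""
    return sum(segments.values())


def exon_length(first_c_pos, last_c_pos):
    """Length of an exon from its first and last coding positions (inclusive)."""
    return last_c_pos - first_c_pos + 1


for allele, segments in KNOWN_DELETIONS.items():
    print(allele, deletion_length(segments))

# Exon lengths implied by the coding positions in the allele names.
print("exon 8:", exon_length(1074, 1153))
print("exon 10:", exon_length(1228, 1254))
```

The totals agree with the "del995" and "del5405" suffixes of the allele names, and the exon lengths derived from the coding positions agree with the 80 and 27 nucleotides given for exons 8 and 10.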

In the present study, we describe three large RHD genomic deletions, two of which were novel, affecting stretches of 18.4, 5.4, and 7.6 kb in length and causing DEL or D-negative phenotypes. On the basis of analysis of the breakpoints, we discuss distinct mechanisms for the three deletions.

Materials and Methods

Study subjects

Two samples were collected as part of our screening at NIH of D-negative blood donors for the presence of the RHD gene [3]; another sample came from a blood donor in Springe, Germany. As described previously [24], RHD screens are a standard operating procedure in a growing number of blood centers, where this red cell-genotyping application for D-negative blood donations is covered by the donor consent form. EDTA-anticoagulated whole-blood samples were used for serology and DNA studies, and RNA tubes (PAXgene; PreAnalytiX, Hombrechtikon, Switzerland) were maintained at room temperature for up to 2 h, then frozen and stored at −20 °C before RNA isolation. The analysis of the three donors was performed as case studies. We are planning to eventually report the cumulative data on RHD-positive samples found among our D-negative donors at NIH since 2009, including the current two donors.

Immunohematology

Hemagglutination tests were performed by standard tube and anti-IgG gel matrix testing with licensed reagents (Ortho, Raritan, NJ). An adsorption/elution method with human polyclonal anti-D was applied to test for the presence of a DEL phenotype [24]. Several RhD typing kits with 16 monoclonal anti-Ds were used to establish the epitope patterns as described previously [25]. Antibody screening and direct antiglobulin tests were negative for all three donors.

DNA and mRNA isolation

Genomic DNA was isolated from the buffy coat (Qiagen EZ1 DNA blood kit on the BioRobot EZ1; Qiagen, Valencia, CA) and mRNA from RNA tubes (Dynabeads mRNA DIRECT kit; Invitrogen, Carlsbad, CA). Primers were designed using online software (Primer3 [26]; Supplementary Table S1 [4, 22, 27,28,29]).

RACE and cDNA sequencing

The 5′- and 3′-rapid amplification of cDNA ends (RACE) method was applied to the isolated mRNA (GeneRacer kit; Invitrogen) and reverse transcribed to cDNA using the Oligo(dT)-adapter primer (GeneRacer kit and SuperScript III First-Strand Synthesis SuperMix; Invitrogen). The resultant cDNA was then used as a template to obtain the 5′ and 3′ cDNA ends for nested polymerase chain reaction (PCR) amplification (GeneRacer primers included in the GeneRacer kit) with RHD or TMEM50A cDNA primers (Supplementary Table S1). No cDNA analysis was done for sample 2, because the cDNA sequence for this RHD deletion had been published before [22].

The PCR amplicons were purified and sequenced (BigDye Terminator v3.1; Applied Biosystems, Carlsbad, CA) as described previously [30]. Nucleotide sequences were aligned (CodonCode Aligner; CodonCode, Dedham, MA) to NCBI RefSeq NM_016124.4 and nucleotide positions defined using the first nucleotide of the coding sequence of RefSeq NM_016124.4 (RHD isoform 1).

Detection of deletion breakpoint

For the novel exon 1 deletion, copy-number analysis was done using a real-time PCR-walking approach on genomic DNA from the 5′-end of upstream Rhesus box to 3′-end of RHD intron 1 to identify the breakpoint region (sample 1). Amplification to detect the breakpoints was performed using a long-range PCR (LongAmp Taq DNA polymerase; New England Biolabs, Ipswich, MA) and the nucleotides were sequenced (Supplementary Table S1). PCR reactions, described by Fichou et al. [22], confirmed the known 5405 nucleotide deletion encompassing exon 10 of the RHD gene (sample 2). For the novel exon 10 deletion, a single long-range PCR reaction was developed to amplify the breakpoint and nucleotide sequencing allowed the precise mapping of the breakpoint (sample 3). The PCR products were separated on 1% agarose gel and sequenced [30] using primers encompassing each deletion junction (Supplementary Table S1).

RHD sequencing

The RHD gene was sequenced as previously described [27, 31]. The nucleotide sequences of all 10 exons as well as the adjacent intronic regions including the 5′ and 3′ untranslated regions (UTR) were determined. Zygosity testing for the RHD gene was done by restriction fragment length polymorphism (RFLP) [4] and quantitative fluorescence polymerase chain reaction [32].

RHD sequence analysis

Nucleotide sequences were aligned and compared with the NCBI RefSeq NG_007494.1. All variations are described according to current mutation nomenclature guidelines [33], ascribing the A of the first ATG translational initiation codon as nucleotide + 1 in the mRNA coding region of RHD (RefSeq NM_016124.4).

Bioinformatics analysis

To examine the potential mechanism for each deletion, 300 bp upstream and downstream from the deletion breakpoint were analyzed for distinct repetitive elements using the RepeatMasker track in the UCSC Genome Browser [34]. In cases where repetitive elements flanked both deletion breakpoints, MUSCLE was used to determine the percentage of sequence identity between the elements [35].
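The two computational steps described here, selecting a window around a breakpoint and scoring the identity of two aligned repeats, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the aligned sequences have already been produced by an aligner such as MUSCLE and are supplied as equal-length gapped strings, and it assumes percent identity is counted over all alignment columns (definitions vary between tools). The function names and the toy sequences are hypothetical.

```python
def window_around(breakpoint, flank=300):
    """Return 1-based inclusive coordinates spanning `flank` bp upstream and
    downstream of a breakpoint position (clamped at position 1)."""
    return max(1, breakpoint - flank), breakpoint + flank


def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned, gapped sequences.

    Assumption: identical columns are divided by all alignment columns,
    ignoring columns where both sequences carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Toy example with made-up sequences (not RHD data).
print(window_around(1000))
print(round(percent_identity("ACGT-A", "ACCTTA"), 1))
```

Dividing by the number of gap-free columns instead would give a higher value for gapped alignments, so the chosen denominator should be reported alongside any identity figure.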

Results

We analyzed two novel and one previously known partial deletion of the RHD gene in three blood donors (Table 1). Samples 1 and 2 had a DEL phenotype, whereas sample 3 was D-negative, as reproducibly confirmed by adsorption/elution testing [24]. All three samples were negative in serologic testing with 16 different monoclonal anti-Ds (Supplementary Table S2).

Table 1 Molecular basis of RHD alleles with partial deletions of the RHD gene longer than 655 nucleotides

RHD genetic variations

We were able to amplify all RHD exons, except exon 1 (sample 1) and exon 10 (samples 2 and 3), indicating alterations at either the 5′- or 3′-ends of the RHD gene. Amplification and sequencing of regions surrounding exon 1 and 10 using long-range PCR revealed large deletions in the RHD gene. The sequencing for amplified exons and adjacent intronic regions was identical with the RHD reference sequence (NG_007494.1).

The exact breakpoints could be determined within 5 and 3 nucleotides in overlapping sequences (GAATG and AGG [22]) at the breakpoint region for the RHDex1del type 1 (sample 1, Fig. 1a) and RHDex10del type 1 alleles (sample 2, Fig. 1b). The exact nucleotides involved at the breakpoint were identified for the RHDex10del type 2 allele (sample 3, Fig. 1c).

Fig. 1

Molecular structures of large deletions in the RHD gene observed in this study. The genomic structures of three partial gene deletions are shown (a–c). The RHD gene comprises 10 exons (black bars) flanked by the upstream (blue) and downstream Rhesus boxes (red). The nucleotide sequences of the breakpoint regions harbor microhomologies or non-templated nucleotides (bold letters, box) and are shown along with their electropherograms. The chromosomal positions flanking the three deletions are indicated for genomic DNA (NC_000001.11)

Sample 1 had an 18.4 kb deletion encompassing the upstream Rhesus box, 5′ UTR, exon 1, and part of intron 1 of the RHD gene [NC_000001.11(NG_007494.1):c.(1–15149_1–15153)_(148+3154_148+3158)del]. Sample 2 harbored the known 5.4 kb deletion encompassing intron 9, exon 10, and part of the 3′ UTR of the RHD gene [NC_000001.11(NG_007494.1):c.(1227+2872_1227+2874)_(1254+1315_1254+1317)del]. Sample 3 had a 7.6 kb deletion encompassing intron 9, exon 10, and part of the 3′ UTR of the RHD gene [NC_000001.11(NG_007494.1):c.1227+2108_1254+2785del].

Repetitive elements analysis

In sample 1, alignment of the 5′- and 3′-end regions bordering the deletion breakpoints showed 5-bp microhomology between the two regions (Fig. 1a). In sample 2, only one repetitive element, AluSx1, was detected upstream of the 5′ end. Alignment of the 5′- and 3′-end regions bordering the deletion breakpoints showed 3-bp microhomology (Fig. 1b). In sample 3, a FRAM element located at the 5′ breakpoint and an AluYh3 element located around the 3′ breakpoint were identified. Alignment of the 5′- and 3′-end regions bordering the deletion breakpoints showed an insertion of 2 non-templated nucleotides (TT) at the junction (Fig. 1c). Sample 1 also harbored an AluSg element at the 5′ end and an AluSx element at the 3′ end of the breakpoint (Fig. 2).
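Microhomology at a deletion junction means that the same short tract occurs at both breakpoints, so the exact breakpoint can be placed anywhere within that tract without changing the resulting sequence. A minimal sketch of how such a tract could be detected from a reference sequence and one called deletion interval is given below. The function name is hypothetical, and the toy reference is invented for illustration; only the GAATG motif is taken from the text.

```python
def junction_microhomology(ref, del_start, del_end):
    """Return the microhomology tract shared by both breakpoints of a deletion.

    `ref` is the undeleted reference sequence; the deleted segment is
    ref[del_start:del_end] (0-based, half-open). The deletion can be shifted
    left or right without changing the product as long as the base entering
    the deleted interval equals the base leaving it.
    """
    ref = ref.upper()
    # Bases by which the deletion can be shifted to the left.
    left = 0
    while left < del_start and ref[del_start - left - 1] == ref[del_end - left - 1]:
        left += 1
    # Bases by which the deletion can be shifted to the right.
    right = 0
    while del_end + right < len(ref) and ref[del_start + right] == ref[del_end + right]:
        right += 1
    return ref[del_start - left:del_start + right]


# Toy reference (invented): two GAATG copies separated by an 8-bp spacer.
toy_ref = "TTCA" + "GAATG" + "CCCCTTTT" + "GAATG" + "ACGT"

# Two equivalent ways of calling the same 13-bp deletion.
print(junction_microhomology(toy_ref, 4, 17))
print(junction_microhomology(toy_ref, 6, 19))
```

Both calls report the same 5-bp tract, which illustrates why breakpoints within a microhomology can only be narrowed down to a range of positions rather than a single nucleotide.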

Fig. 2

Proposed mechanism for the generation of the RHDex1del type 1 deletion. Schematic representation of the secondary structure formed by the two inverted Alu repeats (a). The stem-loop formation may have facilitated replication slippage, which resulted in the deletion of the upstream Rhesus box and RHD exon 1. Sequence comparison between AluSg in the intergenic region and AluSx in intron 1 (b). AluSx is shown in the reverse orientation to mimic its orientation in the putative secondary structure. Complementary nucleotides (underlined) represent ~80% of all positions

Effect on cDNA and protein structure

We determined the cDNAs of the two novel deletions (Fig. 3). For the RHDex1del type 1 allele, we detected an mRNA transcript of 478 nucleotides encompassing the 3′ end of exon 2 to the 5′ end of exon 5 (KX584098) using 5′ RACE analysis. Bioinformatic analysis indicated a potential translation start site in RHD exon 2 (Fig. 3a).

Fig. 3

Schematic representation of transcripts observed in three partial RHD gene deletions. The RHD genes (upper model in a–d) are depicted by exons (rectangles) and introns (lines). The resulting mRNA transcripts (lower model, a–d) are symbolized as a concatenation of exons. The start (blue) and stop codons (red) are underlined. RHDex1del type 1 mRNA starts at c.186 in exon 2 and extends into the 3′ UTR (a). RHDex10del type 2 short mRNA includes RHD exons 1 to 7, part of intron 7, exon 8, and part of intron 8 with a stop codon in exon 8 (b). RHDex10del type 2 long mRNA includes exons 1 to 9 of RHD, with a stop codon immediately after exon 9, followed by the 5′ UTR, all 6 exons, and the 3′ UTR of TMEM50A. The two inserted nucleotides (UU) are highlighted at the junction (pink box) (c). RHDex10del type 1 mRNA includes exons 1 to 9 followed by a sequence tract of RHD intron 9 (pseudo-exon 10) due to the activation of a cryptic splice acceptor site (d). As RHDex1del type 1 involves the 5′ end, splicing is not affected. The splice variants are shown for the other two partial RHD gene deletions

For the RHDex10del type 2 allele, we detected two different mRNA transcripts using 3′ RACE analysis: (1) a short mRNA of 421 nucleotides including at least exon 7, 170 nucleotides of intron 7, exon 8, and 107 nucleotides of intron 8 with a stop codon in exon 8 (KX619611) (Fig. 3b), and (2) a long hybrid RHD-TMEM50A mRNA of 1963 nucleotides encompassing exons 1 to 9 of the RHD gene with a stop codon just after RHD exon 9 and the 5′ UTR, coding sequence, and 3′ UTR of the TMEM50A gene (KX584096) (Fig. 3c). Neither cDNA was expected to express any RhD protein in the membrane.

For comparison, the previously reported transcript [22] for the RHDex10del type 1 allele caused the replacement of the last eight amino acids of the wild-type RhD protein by four different amino acids (Fig. 3d). We modeled the two proteins based on the observed cDNAs of the RHDex1del type 1 and RHDex10del type 1 alleles and the previously predicted RhD protein model [36] (Supplementary Fig. S1). Sample 1 with the RHDex1del type 1 allele tested negative for the G-antigen by the adsorption/elution technique [24].

Discussion

We reported two new deletions and confirmed a previously known deletion [22] affecting 5.4–18.4 kb of the RHD gene (Table 1). The two novel deletions represented the first observation of a deletion at the 5′ end of the RHD gene and the second observation of a deletion at its 3′ end (Fig. 1). They were the largest partial deletions known for the RHD gene and doubled the number of observed deletions longer than 655 nucleotides (Fig. 4).

Fig. 4

Molecular structures of large deletions in the RHD gene. The genomic structures of all 4 partial gene deletions are shown. The RHD gene comprises 10 exons (black bars) flanked by the upstream (blue) and downstream Rhesus boxes (red). The positions and sizes of four deletions are indicated (green arrow)

Within a gene sequence, a high proportion of Alu repeats may promote gross gene rearrangements including partial gene deletions [37]. The core 26-bp sequence within the Alu repeat has been reported to be particularly recombinogenic [38]. In the present study, at least one breakpoint of each deletion mapped within or very close to Alu repeats, which straddled both breakpoints in the deletions of sample 1 (Fig. 2) and sample 3 and one breakpoint in sample 2.

An AluSg element and an AluSx element occurred near the proximal and distal breakpoints, respectively, in the RHDex1del type 1 (sample 1). A stem-loop may form during DNA replication (Fig. 2a) because the Alu repeats have 80% complementarity (Fig. 2b), bringing the distant GAATG repeats into close proximity to each other. The stem-loop can also pause the progression of the replication fork, releasing the 3′ end of the nascent leading strand, which can align at the downstream GAATG repeat, causing the deletion (Fig. 2a). These molecular features were consistent with the fork stalling and template switching [39]/microhomology-mediated break-induced replication [39, 40] mechanism [41].

An AluSx1 element occurred near the proximal breakpoint region of RHDex10del type 1 (sample 2). Although no long stretches of continuous homology were detected around the distal breakpoint, such Alu sequences found in the vicinity of single breakpoints could still mediate the corresponding rearrangement by non-homologous recombination [38]. However, the presence of a short direct repeat, AGG, flanking both breakpoints implied classical replication slippage as the most likely mechanism for this deletion [42, 43].

A FRAM element was located at the proximal breakpoint of RHDex10del type 2 (sample 3) and also an AluYh3 element around the distal breakpoint featuring several recombination motifs GCG and GAS [44]. The presence of an insertion of 2 bp of non-template DNA (Fig. 1c) implied classical non-homologous end-joining [45] as the most likely mechanism responsible for this deletion [46, 47].
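The reasoning applied to the three deletions can be summarized as a small decision rule based on the junction features. The sketch below encodes only the three cases discussed in this study and is not a general classifier of deletion mechanisms; the function name, parameter names, and the order of the checks are assumptions made for illustration.

```python
def suggested_mechanism(microhomology_bp, inserted_bp, inverted_alu_pair=False):
    """Map junction features to the mechanism proposed in the text.

    Assumed rule order (illustrative only):
    1. non-templated inserted nucleotides -> non-homologous end-joining
    2. microhomology plus inverted Alu repeats near both breakpoints
       -> FoSTeS/MMBIR
    3. microhomology (short direct repeat) alone -> replication slippage
    """
    if inserted_bp > 0:
        return "non-homologous end-joining"
    if microhomology_bp > 0 and inverted_alu_pair:
        return "FoSTeS/MMBIR"
    if microhomology_bp > 0:
        return "replication slippage"
    return "undetermined"


# Junction features as reported for the three samples. Repeat orientation
# is not used for sample 3, because the inserted TT decides the outcome first.
samples = {
    "sample 1 (RHDex1del type 1)": dict(microhomology_bp=5, inserted_bp=0, inverted_alu_pair=True),
    "sample 2 (RHDex10del type 1)": dict(microhomology_bp=3, inserted_bp=0),
    "sample 3 (RHDex10del type 2)": dict(microhomology_bp=0, inserted_bp=2),
}

for name, features in samples.items():
    print(name, "->", suggested_mechanism(**features))
```

Real junctions can carry overlapping signatures, so a rule of this kind indicates only the most likely mechanism, in keeping with the hedged conclusions drawn for each deletion.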

The RHDex1del type 1 expressed a DEL phenotype in sample 1, which correlated well with previous observations of many RHD exon 1 variations being associated with DEL phenotypes [9, 24, 48,49,50,51]. Deletion of amino acids 1 to 82 at the N-terminal end of the RhD protein indicated the loss of first and second extracellular loops from the mature protein (Supplementary Fig. S1A). This RhD model predicted the absence of the G-antigen, because the residue Ser103 in the second extracellular loop is known to be involved in the conformation-dependent G-antigen formation [52, 53]. An adsorption-elution with anti-G was negative, thus confirming the predicted loss of G-antigen expression.

The RHDex10del type 1 allele has previously been characterized as a serologic weak D phenotype [22], but was observed as a DEL phenotype in sample 2. The deletion was confirmed using published primers [22]. Variable strength of D-antigen expression, such as DEL and weak D, has been observed before in other RHD variants and can be caused by different sensitivities of the serologic techniques.

Sample 3 (RHDex10del type 2), lacking all eight amino acids encoded by RHD exon 10, was found to be negative for the D-antigen by adsorption/elution tests. Amino acid mutations affecting the C-terminal cytoplasmic positions 391 to 417 [54] have often been associated with a reduced expression of the protein in the RBC membrane [55,56,57]. Aromatic and hydrophobic C-terminal amino acids are important in mediating efficient transport of membrane proteins by interacting with COPII coat components [58] and also affect the interaction of RhD protein with the RBC cytoskeleton, specifically ankyrin-R [59, 60]. However, as the segment of the RhD protein encoded by exon 9 retains the relevant amino acids V, F, and W at positions 406, 407, and 408, the complete lack of RhD expression in the RBC membrane was surprising. More sensitive techniques, such as mass spectrometry, may be used to exclude the expression of any minuscule amounts of RhD protein present in the RBC membrane.
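The assignment of amino acid positions to exons 9 and 10 follows from the coding positions given earlier (exon 10 spans c.1228 to c.1254, with the A of the ATG start codon as nucleotide 1). A minimal sketch of this mapping is shown below. It assumes that the final codon of the 1254-nucleotide coding sequence is the stop codon, which is consistent with the C-terminal position 417 mentioned above; the function names are illustrative.

```python
def codon_number(c_pos):
    """Codon number containing coding nucleotide c_pos (A of ATG = 1)."""
    if c_pos < 1:
        raise ValueError("coding positions start at 1")
    return (c_pos - 1) // 3 + 1


def codon_span(codon):
    """First and last coding nucleotide of a codon (inclusive)."""
    return 3 * codon - 2, 3 * codon


EXON10_START, EXON10_END = 1228, 1254  # coding positions from the allele name

first_codon = codon_number(EXON10_START)
last_codon = codon_number(EXON10_END)
n_codons = last_codon - first_codon + 1

# Assumption: the last codon is the stop codon, so exon 10 encodes one
# amino acid fewer than it has codons.
print("exon 10 codons:", first_codon, "to", last_codon)
print("amino acids encoded by exon 10:", n_codons - 1)

# Residues 406 to 408 lie upstream of c.1228 and are therefore in exon 9.
for residue in (406, 407, 408):
    print(residue, codon_span(residue), codon_span(residue)[1] < EXON10_START)
```

Under this assumption, exon 10 covers codons 410 to 418, that is, eight amino acids plus the stop codon, while the codons for positions 406 to 408 end at c.1224 and are retained when only exon 10 is deleted.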

Many RHD alleles are associated with frequent anti-D alloimmunization, especially in chronically transfused patients, such as those with hemoglobinopathies [61, 62], or in pregnant women. We defined the breakpoints of two novel deletions and suggested three possible mechanisms based on the sequences at and around the breakpoints for three distinct deletions. The identification of the deletion breakpoints allows the design of allele-specific PCR assays and targeted screening for these deletions. Knowledge of the exact sequence details will aid in identifying the clinically relevant RHD alleles occurring in patient samples, especially when applying high-throughput technologies, such as next-generation sequencing [63].