Introduction

This review is part of a series of reviews on DNA repair, and other reviews in this issue provide an overview of nonhomologous DNA end joining (NHEJ) and discuss the earliest steps of NHEJ (see Weterings and Chen, this issue; and Shrivastav et al., this issue). In this review, we focus on the nucleolytic, polymerization, and ligation steps of NHEJ.

Most DNA repair pathways require nucleolytic resection of the damaged DNA, DNA polymerization to provide new DNA to replace the resected DNA, and ligation to restore the integrity of the phosphodiester backbone. NHEJ includes nuclease, polymerase, and ligase activities that demonstrate distinctive enzymatic flexibilities that are ideally suited for the NHEJ process.

Our view of NHEJ is that Ku binds initially at the DSB site. Ku may bind at one side of the DSB or both sides, depending on the length of the overhang relative to the nearest nucleosomes. DSBs within the DNA that is wrapped around a single nucleosome appear to be able to diffuse off of the nucleosome surface sufficiently to permit Ku binding 1. Thereafter, Ku can recruit the nuclease (Artemis:DNA-PKcs), polymerase (pol μ and pol λ), or ligase (XLF:XRCC4:DNA ligase IV) in any order to work on the 'left' or 'right' end at the DSB. In addition, the nuclease and the ligase activities can work on the 'top' strand somewhat independently of their action on the 'bottom' strand of each of the two DNA ends of the DSB 2, 3. The polymerase activities also have substantial flexibility, as we will discuss.

The focus here is on vertebrate NHEJ (Figure 1). Yeast, plants, and invertebrates appear to differ in some aspects of NHEJ, because they do not have DNA-PKcs or Artemis; hence, the nuclease in some of these organisms may, in part, be provided by the RAD50:MRE11:XRS2 complex 4. In addition, homologous recombination is, by far, the preferred pathway for repair of DSBs in yeast. Because other aspects of NHEJ are discussed elsewhere in this series, the emphasis here will be on the flexibility of vertebrate NHEJ and how that flexibility is optimal for the roles of NHEJ.

Figure 1

Double-strand breaks and their repair. The causes of pathologic double-strand breaks (DSBs) include ionizing radiation (γ and X-rays) and free radicals. Physiologically programmed double-strand breaks are generated during the course of V(D)J recombination in pre-B (bone marrow) and pre-T cells (thymus) and during the course of class switch recombination in activated B cells (in the peripheral lymphoid tissues such as spleen, lymph nodes and Peyer's patches). RAG1 and 2 (along with HMGB1) are involved in generating the breaks in V(D)J recombination. Activation-induced deaminase (AID), uracil glycosylase (UNG2), and perhaps abasic endonuclease (apurinic/apyrimidinic endonuclease) are involved in generating the breaks in class switch recombination. Homologous recombination can repair DSBs in late S or G2 of the cell cycle. Nonhomologous DNA end joining (NHEJ) can repair DSBs anytime. A subset of the proteins involved in homologous recombination is listed. The known proteins involved in NHEJ are listed.

Nucleolytic resection by the Artemis:DNA-PKcs complex in NHEJ

The Ku:DNA complex at DSBs can recruit DNA-PKcs with or without Artemis, though we suspect that the Artemis:DNA-PKcs complex may be more commonly recruited in these cases 3. Ku can slide internally to permit DNA contact with the DNA-PKcs 5, and this activates the serine/threonine kinase activity of DNA-PKcs. DNA-PKcs can phosphorylate itself in cis, but if two DNA-PKcs molecules are at the DSB – one at each end – then this phosphorylation can also occur in trans 6. In addition, DNA-PKcs can phosphorylate Artemis 3. The Artemis:DNA-PKcs complex changes conformation as a result of these phosphorylation events, allowing Artemis to function as an endonuclease. Which precise phosphorylation sites in Artemis are most critical is not yet clear 7, 8; one group maintains that the most critical sites are within DNA-PKcs itself 9. However, a conformational change within Artemis as a result of either its binding to DNA-PKcs or its phosphorylation by DNA-PKcs seems likely, given that removal of the C-terminal half of Artemis, which is the site of most of the phosphorylation, permits Artemis to function as an endonuclease, even without DNA-PKcs for some DNA substrates 8. These issues of conformational change and of specific phosphorylation sites merit continued detailed analysis.

Once activated, the Artemis:DNA-PKcs complex demonstrates a degree of nucleolytic flexibility that is distinctive among structure-specific nucleases 10. The Artemis:DNA-PKcs complex can cleave 5′ overhangs to a blunt or nearly blunt configuration and can cleave 3′ overhangs back to a roughly 4 nt overhang. In the context of V(D)J recombination (Figure 2), the RAG complex creates hairpinned coding ends, and the Artemis:DNA-PKcs complex nicks these hairpins, thereby allowing the resulting physiologic DNA ends to be treated like double-strand breaks generated by any pathologic cause 3. Activated Artemis:DNA-PKcs can also remove 3′ phosphoglycolate and other 3′ termini from 3′ overhangs, an activity relevant to a wide range of breaks created by ionizing radiation 11, 12.

Figure 2

Model of V(D)J recombination. (A) The RAG cleavage steps. RAG-1, RAG-2, and HMGB1 form a complex (suspected stoichiometry [(RAG1)2(RAG2)2HMGB1]) that binds at a recombination signal sequence (abbreviated RSS in the text and labeled 12-signal or 23-signal on the figure). Each signal sequence (RSS) consists of a palindromic heptamer and an AT-rich nonamer separated by either 12 or 23 bp (V coding end-CACAGTG-(12/23 bp)-ACAAAAACC). Initially, the RAG complex creates a nick adjacent to the coding end side of the heptamer of each RSS. Then the two nicked RSSs are brought into synapsis. The RAG complex then uses the 3′ OH at each nick site as a nucleophile to attack the opposite strand of each duplex to create hairpins at each V, D, or J coding end. (B) Displacement of the RAG complex and the hairpin opening step. Ku binds to one or both of the coding ends. Ku recruits the Artemis:DNA-PKcs complex. The Artemis:DNA-PKcs complex opens the hairpins. It is not yet clear what displaces the RAG complex, thereby permitting the ligation of the two signal ends (see (C)). (C) Coding end processing and ligation of the ends. The two signal ends may be ligated to one another by XLF:XRCC4:DNA ligase IV as soon as the RAGs are displaced from cleaved RSSs. The two coding ends may be treated like any two DNA ends being processed by the NHEJ pathway, with the only exception being the participation of a template-independent polymerase called terminal transferase or TdT. Any nucleotide trimming is likely to be done by the Artemis:DNA-PKcs complex, and gap fill-in (template-dependent) synthesis may be done by polymerase μ and/or polymerase λ. The ligase for the coding ends is also the XLF:XRCC4:DNA ligase IV complex. The stoichiometry of the XLF:XRCC4:DNA ligase IV complex is not yet determined.

The Artemis:DNA-PKcs complex is able to nick a wider range of DNA structures than just overhangs. It can nick regions of mismatched bases (e.g., 5 bp bubbles). It can also nick the ssDNA within gaps of double-stranded DNA 10. The latter activity of Artemis:DNA-PKcs means that if the top strand at a double-strand break is ligated, but the bottom strand has a gap, then this nuclease complex can re-nick the top strand, thereby returning the material to the DSB configuration (possibly with transfer of some nucleotides that were originally part of the right DNA end to the left end). This activity by Artemis:DNA-PKcs would permit revision of junctions that are partially joined (via ligation of only one strand).

In addition to the endonucleolytic activity of Artemis when it is in complex with DNA-PKcs, Artemis alone has 5′ exonucleolytic activity 3. It is not yet clear what fraction of the time Artemis is free from DNA-PKcs. Hence, it is difficult to assess the contribution of the Artemis 5′ exonuclease relative to its endonuclease activities in vivo at this time.

Pol μ and pol λ in NHEJ

As mentioned, our view is that Ku can recruit the nuclease, the ligase, or the polymerase in any order and repeatedly in various revisions of the junction, during one joining process. Pol μ and pol λ both participate in NHEJ 13, 14, 15, 16, 17, and they have distinctive biochemical flexibilities for this purpose.

The genetic data for pol μ and pol λ in NHEJ are from yeast and mouse knockouts 13, 14, and the yeast data are informative regarding the corresponding enzymes (pol μ and pol λ) in vertebrates. In Saccharomyces cerevisiae, POL4 functions in NHEJ, although other polymerases also appear able to function when POL4 is absent 17. POL4 is the only Pol X family member in S. cerevisiae. In humans, the closest homologues are pol μ and pol λ. In mice lacking pol μ or pol λ, the junctions of V(D)J recombination have more nucleolytic resection 13, 14. Specifically, mice lacking pol μ have shorter Ig light-chain V to J junctions, whereas mice lacking pol λ have shorter Ig heavy-chain D to J and V to DJ junctions.

Because both pol μ and pol λ appear to be expressed widely in somatic cells, it is not yet clear how there might be a division of labor between these two polymerases. This division of labor may have its basis in how stable either of them is at pairs of DNA ends with 5′ overhangs versus 3′ overhangs 18. When human pol μ was introduced into S. cerevisiae lacking POL4, the fill-in synthesis was different from that when pol λ was introduced. Specifically, pol λ appeared better at fill-in of 5′ overhangs, whereas pol μ appeared more stable at fill-in of 3′ overhangs. Therefore, pol μ and pol λ may differ in their range of efficiency for fill-in synthesis at meta-stable regions, such as at an unligated junction annealed together via a few hydrogen bonds. Pol μ and pol λ both have lyase domains, but only the pol λ lyase domain appears to be active. The lyase activity of pol λ may be useful during repair at some sites 19.

Biochemical approaches have also implicated a role for pol μ and pol λ in NHEJ. First, antibodies have been used to deplete pol λ from crude extracts 20. This resulted in a reduction of junctional fill-in synthesis and joining. Second, there is evidence that pol μ is bound at increased efficiency at a DNA end that already has Ku and XRCC4:DNA ligase IV at that end 15, and the same observation was made in S. cerevisiae for POL4 16. Third, subsequent work showed that Ku alone is sufficient to recruit pol μ or pol λ to a DNA end without the ligase, and this recruitment requires the BRCT domain located in the N-terminal region of these two polymerases 2.

In vitro reconstitutions of NHEJ have demonstrated that pol μ and pol λ can participate with the other NHEJ components in a productive manner that generates joining sites that are indistinguishable from those seen at in vivo sites of NHEJ 2. In a system in which a subset of NHEJ components is present, specifically Ku, XRCC4:DNA ligase IV, and pol μ, it was proposed that pol μ could jump between two incompatible DNA ends with 3′ overhangs 21. In other words, it was proposed that pol μ could use a discontinuous DNA template. However, we have demonstrated template-independent addition by human pol μ under physiologically relevant conditions 22. This contributes to the flexibility of repair at DSB junctions. Template-independent addition by pol μ permits annealing of ends that do not initially share any terminal microhomology, but then acquire microhomology within the randomly added nucleotides. Hence, rather than the polymerase jumping from one DNA end to another 21, annealing of newly generated microhomology is likely to explain some or all of the addition and ligation at initially incompatible DNA ends 22. The data regarding template-independent addition just described are consistent with data from the Blanco laboratory 23 and data from the Kunkel laboratory 24. Specifically, template-independent addition by pol μ is substantial for wt pol μ and is reduced by the H329 mutation and loop 1 mutations in pol μ. Hence, template-independent addition by pol μ, rather than use of a discontinuous template, appears likely to be the basis for the joining of incompatible DNA ends.
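The idea that template-independent addition can create microhomology between initially incompatible ends can be sketched with a small toy model. This is only an illustration under stated assumptions: each 3′ overhang is written 5′→3′ as a plain string, pairing is scored only at the two 3′ termini, and the function names and example sequences are hypothetical and not taken from the cited work.

```python
# Toy model: each 3' overhang is written 5'->3' as a string of A/C/G/T.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def terminal_microhomology(left_overhang: str, right_overhang: str) -> int:
    """Length of the longest stretch at the two 3' termini that can base-pair.

    Both overhangs are written 5'->3'. The last k nt of one overhang can pair
    with the last k nt of the other when one is the reverse complement of
    the other (antiparallel annealing).
    """
    longest = min(len(left_overhang), len(right_overhang))
    for k in range(longest, 0, -1):
        if left_overhang[-k:] == reverse_complement(right_overhang[-k:]):
            return k
    return 0


# Two hypothetical incompatible ends: no terminal pairing is possible.
left, right = "GGGG", "GGGG"
before = terminal_microhomology(left, right)

# Hypothetical template-independent addition of a single C to the left end.
after = terminal_microhomology(left + "C", right)

print(before, after)
```

In this sketch the two G-rich overhangs share no terminal microhomology, but a single added C on one end creates 1 nt of pairing with the other end, which is the kind of newly acquired microhomology described above.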

The XLF:XRCC4:DNA ligase IV complex in NHEJ

DNA ligase IV is the ligase for NHEJ, and XRCC4 improves the stability of the ligase IV within cells and stimulates the adenylation or 'charging' step of this ligase 25, 26, 27, 28, 29, 30, 31. Ligase IV is distinctive because a majority of it is pre-charged when purified from mammalian cells 32. Based on gel filtration of the native ligase IV activity from human cells, its molecular weight is in the range of 160-180 kDa 32. This would be consistent with one 105 kDa ligase IV molecule plus one XLF (33 kDa) and one XRCC4 molecule (42 kDa) or with two XLF or two XRCC4.
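The stoichiometry reasoning above is simple addition, which can be laid out explicitly. The sketch below uses only the nominal masses quoted in the text; the dictionary and function names are hypothetical.

```python
# Nominal masses in kDa, as quoted in the text.
MASS_KDA = {"ligase IV": 105, "XLF": 33, "XRCC4": 42}


def complex_mass(composition: dict[str, int]) -> int:
    """Sum the nominal masses for a composition such as {"XLF": 1, ...}."""
    return sum(MASS_KDA[name] * count for name, count in composition.items())


# The three candidate compositions mentioned in the text.
candidates = {
    "1 ligase IV + 1 XLF + 1 XRCC4": {"ligase IV": 1, "XLF": 1, "XRCC4": 1},
    "1 ligase IV + 2 XLF": {"ligase IV": 1, "XLF": 2},
    "1 ligase IV + 2 XRCC4": {"ligase IV": 1, "XRCC4": 2},
}

for label, composition in candidates.items():
    print(f"{label}: {complex_mass(composition)} kDa")
```

The nominal sums are 180, 171, and 189 kDa. Gel filtration reports an apparent mass that also depends on molecular shape, so these sums are best read as a rough comparison with the 160-180 kDa estimate rather than an exact test of any one composition.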

XLF stimulates joining of DNA ends by XRCC4:DNA ligase IV, but we do not yet know whether XLF displaces XRCC4 from ligase IV during that stimulation 33, 34, 35. XLF stimulates compatible 36, 37 and incompatible DNA end joining 38 when added to reactions containing XRCC4:DNA ligase IV, but the stimulation is substantially greater for incompatible DNA end joining at concentrations of free Mg2+ that correspond to the level found within the cell (about 0.5 mM) 39. Based on protein interaction studies, it appears that ligase IV binds to XRCC4 more tightly than to XLF 37, 40, 41. It is not clear whether XLF and XRCC4 bind to one another without ligase IV.

The XRCC4:DNA ligase IV complex is able to ligate a subset of incompatible end configurations if Ku is present 22, 38, 39. If 1 nt of terminal microhomology is present, then Ku becomes dispensable, but is still stimulatory 22, 39. But if 4 nt of terminal microhomology is present, then the presence of Ku has little or no effect.

XRCC4:DNA ligase IV is able to ligate one strand regardless of the ligatability of the other strand 2, 22. Hence, at a DSB, the top strand can be ligated even when the bottom strand remains unligated or unligatable.

As mentioned above, XRCC4:DNA ligase IV is able to ligate incompatible DNA ends when Ku is present, and the extent of this ligation can be substantial when XLF is also present. These ligations are influenced by the sequence of the overhangs 39. The rules for these sequence effects suggest that purines on the top and bottom overhanging strands can sometimes be in steric conflict. In addition, poly T overhangs are distinctly more ligatable for reasons that are not yet clear. This predilection for T overhang ligation is interesting because pol μ prefers to add runs of T when synthesizing in its template-independent mode.

Flexibility in the order of the nuclease, polymerase and ligase steps in NHEJ

At a DSB, each DNA end would be bound by Ku, and each Ku:end complex serves as a hub into which the nuclease, the polymerases, and the ligase complex can enter or leave independently. Though the hub metaphor serves to reflect the activities that can enter or leave each of the two DNA ends, the toolbelt metaphor for Ku reflects its toroidal shape 42 and its ability to slide internally and still recruit the nuclease, polymerases, or ligase. These aspects contribute to the wide range of possible outcomes that can arise from one identical pair of starting ends, depending on whether the nuclease, polymerases, or ligase act first, second, etc., and depending on whether the top strand is ligated first or not. Hence, the hundreds of possible outcomes are not surprising.
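A rough count shows how quickly such flexibility multiplies. The sketch below is a toy calculation, not a model of the actual enzymology: it assumes each activity acts once per end, treats the two ends as independent, and uses hypothetical limits on trimming and nucleotide addition chosen only for illustration.

```python
from itertools import permutations, product

ACTIVITIES = ("nuclease", "polymerase", "ligase")

# Every order in which the three activities could act once at a single end.
orders_per_end = list(permutations(ACTIVITIES))

# The 'left' and 'right' ends are treated as independent hubs.
orders_both_ends = list(product(orders_per_end, repeat=2))


def count_junction_configurations(max_trim: int, max_add: int) -> int:
    """Count (left trim, right trim, added sequence) combinations.

    Each end loses 0..max_trim nt, and 0..max_add nt of any of the four
    bases are added at the junction. Different combinations may yield the
    same final sequence, so this is an upper bound on distinct junctions.
    """
    trims = (max_trim + 1) ** 2
    additions = sum(4 ** k for k in range(max_add + 1))
    return trims * additions


print(len(orders_per_end), len(orders_both_ends))
print(count_junction_configurations(max_trim=3, max_add=2))
```

Even with trimming limited to 3 nt per end and at most 2 added nucleotides (both hypothetical limits), the count reaches 336 combinations, before considering repeated rounds of revision or independent ligation of the top and bottom strands.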

Role of NHEJ in V(D)J recombination

V(D)J recombination is the process by which the antigen receptor loci undergo assembly of the variable domain exons of immunoglobulin and T cell receptor molecules. The process involves a specialized DSB-generating complex at two sites, followed by NHEJ to join the four DNA ends together in a new configuration (rearrangement). The RAG1, RAG2, and HMGB1 proteins form the complex that nicks each of the two signal sequences adjacent to a V and J segment (or a D and J segment). Then the RAG complex brings the two nicked species together into synapsis (Figure 2). Next the 3′ OH at the nick is used as a nucleophile by the RAG complex to attack the phosphodiester backbone, thereby resulting in DNA hairpins at the two V and J coding ends and blunt signal ends 43. The Artemis:DNA-PKcs complex is necessary to open the two DNA hairpins 3. Once this occurs, the two coding ends are joined to form a VJ exon, and the two signal ends are joined to form a signal joint. In addition to Artemis:DNA-PKcs, coding joint formation relies on Ku and XLF:XRCC4:DNA ligase IV 44. Pol μ and pol λ also influence the range of coding end possibilities 13, 14.
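The signal sequence layout given in the Figure 2 legend (heptamer CACAGTG, a 12 or 23 bp spacer, nonamer ACAAAAACC) can be expressed as a simple pattern search. This is a minimal sketch that matches only the exact consensus on one strand; real signal sequences tolerate deviations from the consensus, and the function name and test sequences are hypothetical.

```python
import re

HEPTAMER = "CACAGTG"
NONAMER = "ACAAAAACC"

# Heptamer, then a 12 or 23 bp spacer of any sequence, then the nonamer.
RSS_PATTERN = re.compile(
    rf"{HEPTAMER}(?P<spacer>[ACGT]{{12}}|[ACGT]{{23}}){NONAMER}"
)


def find_consensus_rss(sequence: str) -> list[tuple[int, int]]:
    """Return (start index, spacer length) for each exact consensus match."""
    return [
        (match.start(), len(match.group("spacer")))
        for match in RSS_PATTERN.finditer(sequence.upper())
    ]


# Hypothetical test sequences built from the consensus elements.
coding_end = "GATTACA"
rss_12 = coding_end + HEPTAMER + "T" * 12 + NONAMER
rss_23 = coding_end + HEPTAMER + "T" * 23 + NONAMER

print(find_consensus_rss(rss_12))
print(find_consensus_rss(rss_23))
```

The reported start index marks the first base of the heptamer, so the coding end lies immediately upstream of it, which is where the RAG complex is described as placing the initial nick.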

Recently, it was shown that a mutant form of RAG2 permitted coding joint formation even when XRCC4, DNA-PKcs, or Ku was individually absent 45. The authors suggested that this may be because of an alternative form of NHEJ (see below). Another possibility, however, is that this mutant form of RAG2 persists past the G1/S boundary, thereby permitting some of the enzymes of homologous recombination (HR) to participate. For example, in HR, a nuclease (possibly the RAD50:Mre11:Nbs1 complex, even though Mre11 is a 3′ exonuclease) carries out 5′ exonucleolytic resection, so as to leave 3′ overhangs. This might permit microhomology searches much more deeply into the hairpin coding ends than is normal, and the sequences of V(D)J recombination junctions show precisely this feature of longer than usual microhomology usage deep into the coding ends. In fact, even the signal joints formed in cells with this mutant RAG2 show microhomology usage, which is very atypical of V(D)J recombination. Yet another possibility is that the mutant RAG complex nicks inappropriately at substantial distances internal to the coding end hairpin tip. This would also permit microhomology searches at internal positions.

Role of NHEJ in class switch recombination

Class switch recombination (CSR) is the process in which immunoglobulins are changed from IgM to IgG, IgA, or IgE 46, 47, 48. This is a second lymphoid process involving gene rearrangement using developmentally programmed DSBs. However, this process only occurs at the Ig heavy chain locus and occurs within (or near) switch regions. Switch regions are 1-12 kb in length and consist of repeats with a unit length of 25-80 bp. The repeats are rich in AGCT and GGGG motifs. The GGGG sequences promote R-loop formation when RNA polymerase transits through these switch regions 49, 50. The AGCT sites are preferred sites of action by a cytidine deaminase called activation-induced deaminase or AID 51. AID converts C to U in regions of single-strandedness 52. The single-strandedness at CSR regions is created by the R-loops 49, although the transit of RNA polymerase appears sufficient to provide a less efficient form of transient single-strandedness 51. After AID creates U's, uracil glycosylase converts these to abasic sites 53, and apurinic/apyrimidinic endonuclease 1 (APE1) nicks the phosphodiester backbone. The resulting nicks are not necessarily across from one another, and hence these DSBs may have long overhangs. In fact, it is possible that Exo1 is necessary to resect at the overhangs to permit the two DNA ends to separate 54.
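The motif content of a switch-like repeat can be tallied with a short scan. The sketch below is purely illustrative: the 25 nt repeat unit is invented for this example and is not a real switch region sequence, and the function name is hypothetical.

```python
def count_motif(sequence: str, motif: str) -> int:
    """Count occurrences of motif in sequence, including overlapping ones."""
    sequence, motif = sequence.upper(), motif.upper()
    return sum(
        1
        for start in range(len(sequence) - len(motif) + 1)
        if sequence[start:start + len(motif)] == motif
    )


# Hypothetical 25 nt repeat unit, invented for illustration only.
repeat_unit = "GGGGAGCTGAGCTGGGGTGAGCTCA"
switch_like = repeat_unit * 3

for motif in ("AGCT", "GGGG"):
    print(motif, count_motif(switch_like, motif))
```

Overlapping matches are counted deliberately, because a run such as GGGGG contains two GGGG windows; a non-overlapping count would understate the motif density in G-rich stretches.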

It has been presumed, based on limited data, that NHEJ is the mechanism of rejoining the DNA ends in CSR. The de Villartay laboratory and the Alt laboratory independently examined the efficiency of CSR in B cells lacking XRCC4 55, 56. They found that CSR was reduced significantly (even a 2-fold reduction in CSR is substantial). However, the residual joining appears similar to NHEJ, but with more frequent use of terminal microhomology at longer lengths (several nucleotides usually). These features might be consistent with the use of all of the NHEJ components, except that, in the absence of an XRCC4:DNA ligase IV complex, ligase I or III may substitute (see discussion below of alternative pathways of NHEJ). This is particularly likely at class switch regions because they have a high density of AGCT and GGGG repeats. In addition, the DSBs at CSR regions, in many instances, may involve staggered nicks, resulting in long overhangs. The long overhangs provide an opportunity for the AGCT and GGGG repeats to align. Then, when polymerases, such as pol μ and pol λ, extend from the regions of microhomology, the remaining nicks would be so far apart that they are effectively isolated nicks, making them ligatable by any of the three mammalian ligases.

Are there alternative NHEJ pathways?

In purified biochemical systems, the nuclease, polymerases, and ligase of NHEJ can function independently of one another. Each can even function somewhat independently of Ku, although their efficiency of loading at a DNA end is not as high as that at a Ku:DNA end 22. Hence, when one NHEJ component is missing in vivo, the other NHEJ components remain, and may be able to carry out their function. In mice lacking components of the XLF:XRCC4:DNA ligase IV complex, the joining events that are observed (at chromosomal translocation sites or CSR sites, for example) show a higher frequency and greater length of terminal microhomology usage 57. This may be because the less efficient joining involves more resection of the DNA ends and more opportunity for microhomology searches. One could describe such joining events as ligase IV-independent NHEJ rather than using other terms such as microhomology-mediated end joining (MMEJ), alternative NHEJ (A-NHEJ or Alt-NHEJ), or backup NHEJ (B-NHEJ). As described in the previous section, corresponding enzymes (in this case, ligase I or ligase III) may participate when the primary NHEJ enzyme is missing.

Roles of NHEJ in cancer and aging

Because NHEJ is imprecise, it restores the structural integrity of the chromosome at any site where it acts, but at the cost of a few nucleotides at that site. This loss of information at the repair site might be thought of as an 'information scar', and these scars will accumulate randomly throughout each cell's genome over time. These somatic mutations would be expected to contribute to the dysfunction of the cell, which might represent one potential contributor to biological aging. A subset of mutations might contribute to excessive cell proliferation (neoplasia). Interestingly, about 10% of p53 mutations in human cancers consist of deletions that might represent sites of NHEJ 58.

NHEJ also contributes to cancer during breakage–fusion–bridge cycles 59. When chromosomal translocations occur, some derivative chromosomes have two centromeres. Such dicentric chromosomes may break anywhere between the two centromeres during mitosis. The broken ends are rejoined by NHEJ, but there will be information loss at the rejoining site with each round of breakage and rejoining.

The derivative chromosomes in the large majority of chromosomal translocations (both acquired by somatic cells, as in cancer, as well as inherited or constitutional ones) are rejoined by NHEJ. However, the rejoining event itself does not necessarily contribute to cancer formation; rather the initial breakage events are the cause of the instability, and NHEJ merely serves to rejoin the ends of the chromosomal segments. Therefore, NHEJ is the mechanism of rejoining in chromosomal translocations, but other events account for the breakage process.

Concluding comments

Future aspects of the NHEJ field include continued identification of components, continued biochemical determination of the function of those components, integration with the signaling and damage response pathways, alteration of chromatin structure at break sites, definition of how break sites are brought together, and determination of how many unrepaired DSBs are lethal to the cell and under what circumstances. NHEJ is among the most recently defined repair pathways and remains a topic of substantial importance in cancer, aging, immune system development, and basic nuclear metabolism.