Main

Modified cytidines can enter deoxynucleotide pools, because salvage and nutrient uptake pathways can recover nucleosides, rather than simpler degradation products such as uric acid in the salvage of purines12. Previous biochemical work has suggested that 5mdC is not incorporated in the DNA, but is salvaged as thymidine4,5,6. Salvage of oxidized 5-methylcytosine variants has not been previously characterized. We rationalized that, if nucleosides are recovered in unphosphorylated forms (through import) or monophosphate forms (through intracellular hydrolysis), the barrier restricting their incorporation into the DNA may lie in the nucleotide salvage enzymes or DNA polymerases. Providing cells with a final substrate for DNA polymerases, in the form of deoxynucleoside triphosphate, would allow decoupling of DNA synthesis from salvage enzyme activity. Therefore, we transfected two human cancer cell lines—MDA-MB-231 and H1299—with 5-hydroxymethyl-2′deoxycytidine triphosphate (5hmdCTP), isolated DNA and analysed the base composition by a high-performance liquid chromatography–ultraviolet (HPLC–UV) method, using a set of nucleoside standards for calibration (Fig. 1a). After 5hmdCTP transfection, two additional nucleosides were observed in the hydrolysed DNA that correspond to 5hmdC and 5hmdU (Fig. 1b, c and Extended Data Fig. 1b). This indicates that DNA polymerases can incorporate 5hmdC into DNA, and also demonstrates strong deaminase activity acting on either the nucleotide or the incorporated base, resulting in the presence of 5-hydroxymethyluracil (5hmUra) in the DNA. The capacity for DNA polymerases to use 5hmdCTP was also evident in an in vitro replication assay13 (Fig. 1d), demonstrating that human DNA polymerases are not selective against the incorporation of 5hmdC into DNA. 
Therefore, if salvage pathways can convert pre-existing sources of 5hmdC into their nucleotide triphosphate forms, this could result in their incorporation into cellular DNA and potentially lead to deleterious effects on the epigenome.

Figure 1: DNA polymerase and nucleoside kinase activities on modified nucleosides.

a, HPLC–UV (260 nm) chromatogram of nucleosides mixed in equimolar amounts. b, HPLC–UV examination of nucleosides derived from DNA extracted from MDA-MB-231 cells nucleoporated with 5hmdCTP. c, The abundance of 5hmdC relative to dG in nucleoporated MDA-MB-231 cells (n = 3). d, 5hmdCTP incorporation assessed by in vitro replication assay (IVRA) performed in HeLa cells, measured by HPLC–UV (n = 5). Error bars in c and d denote s.d.; ND, not detected. e, Thin-layer chromatography (TLC) separation of reaction products of DCK and CMPK1 kinases, which were supplied with different modified cytidine substrates. xDP, cytidine diphosphates; xMP, cytidine monophosphates.

The final triphosphate form of cytidine in a cell is produced by sequential phosphorylation by three classes of cytidine kinases. First, deoxycytidine kinase (DCK) produces a monophosphate, which is then converted into a diphosphate by cytidine monophosphate kinases (CMPK1 and CMPK2), and subsequently converted into a triphosphate by the family of nucleoside diphosphate kinases14. Because nucleoside diphosphate kinases phosphorylate both purine and pyrimidine nucleosides15, and CMPK2 is found in the mitochondria16, we directed our efforts towards examining the substrate selectivity of DCK and CMPK1. Recombinant DCK was able to transfer the phosphate from ATP[γ-32P] to 5mdC, 5hmdC and 5fdC, but not to 5cadC (Fig. 1e and Extended Data Fig. 1d), while CMPK1 phosphorylated only unmodified cytidine monophosphate (Fig. 1e). In agreement with previous work on 5mdC (ref. 4), we can conclude that the inability of CMPK1 to create diphosphates of modified nucleotides provides the main barrier to the formation of respective dCTPs, limiting their availability for DNA polymerases, which can instead accept modified dCTPs.

Given this inherent selectivity of the nucleotide salvage pathway kinase CMPK1 for unmodified cytidine, we proposed that the introduction of abundant biologically modified cytidine variants would have little adverse effect on the physiology of a cell, unless they significantly impaired nucleotide metabolism. First, we determined that biological cytidine variants retain 70–100% of their original form after incubation in water and cell culture media for 10 days at 37 °C, while 80% of the synthetic variant 5-aza-2′deoxycytidine (5azadC) decomposed by day 2 in agreement with previous observations17 (Extended Data Fig. 2a–c). Next, a panel of 19 human cell lines was selected, sampling various tissue origins and p53 mutation statuses18 (Extended Data Fig. 2d). When cell growth media was supplemented with 10 μM 5hmdC or dC, most of the cell lines continued to proliferate at a normal rate. However, two cell lines (HOP-92 and MDA-MB-231) unexpectedly ceased to proliferate in the presence of 5hmdC (Fig. 2a). We found that 10 μM 5hmdC was lethal and 1 μM 5hmdC caused mild growth inhibition (Fig. 2b). Interestingly, 5fdC was more potent at 1 and 10 μM doses in the MDA-MB-231 cell line, but showed the same cell line selectivity as 5hmdC (Fig. 2b).

Figure 2: Identification of CDA overexpression as a primary determinant for cytotoxic action of natural cytidine variants.

a, Ratio of live cells after 10 days of treatment with 5hmdC versus dC (n = 3). b, Growth curves of MDA-MB-231 and H1299 cell lines treated with three different concentrations of dC, 5hmdC and 5fdC over a period of 10 days (n = 3). c, Volcano plot illustrating analysis of gene expression of MDA-MB-231 and HOP-92 versus MCF-7 and MDA-MB-435 cell lines. Shown in orange are the genes that are significantly different between the groups (P < 0.01; >2-fold change). Dots in black show nucleoside transporters (SLC29A1, SLC29A2, SLC29A3, SLC29A4, SLC28A1, SLC28A2 and SLC28A3) and kinases (DCK and CMPK1). d, CDA expression levels from NCI60 and CCLE global gene expression data sets for different cancer cell lines. e, Western blot confirming CDA overexpression in SN12C and Capan-2 cell lines. Growth curves of Capan-2 and SN12C after treatment with 10 μM 5hmdC, 5fdC and dC (n = 3). Error bars denote s.d.

Genetic alterations or gene expression differences could modify the response of a cell line to biologically modified cytidine variants. By comparing the existing gene expression profiles of the cell lines (NCI60 and CCLE projects19,20) that we established as sensitive to modified cytidine variants to two randomly chosen resistant ones, we identified 1,380 differentially expressed genes (P < 0.01, >2-fold change). Notably, by focusing on differentially expressed genes known to be involved in nucleoside metabolism, we identified cytidine deaminase (CDA) overexpression in the 5hmdC-sensitive cell types, which had the ninth lowest P value of all the genes (Fig. 2c and Supplementary Table 1). None of the other known genes involved, either in nucleoside transport or cytidine recycling, were differentially expressed (Fig. 2c). To identify other cell lines with CDA overexpression, we ranked the 21 available cell lines according to their CDA messenger RNA levels (Fig. 2d). SN12C and Capan-2 cell lines had the highest expression levels of CDA, and this was confirmed at the protein level by western blot (Fig. 2e). Examination of 5hmdC and 5fdC tolerance revealed that a 10 μM dose substantially inhibited the growth of both cell lines, suggesting that the expression level of CDA is predictive of cytotoxicity for these epigenetic cytidine variants (Fig. 2e).

To determine whether CDA overexpression is necessary for selective cytotoxicity, we manipulated CDA levels in the identified cell lines. Cell lines (MDA-MB-231 and SN12C) with stable short hairpin RNA (shRNA) knockdown of CDA were able to survive 10 μM 5hmdC (Fig. 3a and Extended Data Fig. 2e). Furthermore, stable overexpression of CDA in normally 5hmdC-resistant cell lines (H1299 and MCF-7) induced substantial growth inhibition (Fig. 3b and Extended Data Fig. 2f). These experiments clearly established that CDA overexpression is predictive, necessary and sufficient for cytotoxic activity. In vitro measurements of recombinant CDA protein activity were performed with various cytidine variants. First, we determined that CDA deaminates 5mdC, 5hmdC and 5fdC, but not 5cadC, creating thymidine and respective variants of uridine (Extended Data Fig. 2g, h). Second, reaction kinetic data fitted well with a pseudo-zero-order kinetics model (R² > 0.9), revealing that, after deoxycytidine, the second best substrate (that is, with the second highest turnover number (kcat)) for CDA is 5fdC (Fig. 3c, d and Extended Data Fig. 2h). This was unexpected, because the catalytic activity does not follow a simple relationship with the dimensions of the modification at the 5 position of the base, as it does in the case of AID and APOBEC enzymes21. Molecular docking of cytidine variants to the CDA structure22 suggested that 5fdC docks to the catalytic site with nearly 180° rotation when compared to unmodified cytidine, retaining the amino group position close to the active site containing Zn2+ (Extended Data Fig. 3a). By contrast, 5hmdC docks in the active site by displacing the amino group, which provides a potential explanation for the lower catalytic turnover observed (Extended Data Fig. 3a).

Figure 3: Molecular mechanism of CDA-dependent cytotoxicity of cytidine variants.

a, Western blot showing knockdown of CDA by shRNA (using sh-CDA) in the MDA-MB-231 cell line. Right panel illustrates growth curves of derived stable cell lines after treatment with 10 μM 5hmdC (n = 3). (0) and (8) indicate two different shRNA constructs used for the experiments, and sh-luc denotes a non-targeting control shRNA that targets luciferase. WT, wild type. b, Western blot showing overexpression of CDA after lentiviral transduction of H1299 cells with a construct overexpressing CDA (CDA_dsRed). Right panel shows the growth curve after treatment with 10 μM 5hmdC (n = 3). c, CDA activity fitted to the Michaelis–Menten model. Right panel shows a zoomed-in curve, when 5hmdC was used as a substrate. d, kcat values of CDA supplied with cytidine variants. e, TLC separation of reaction products of thymidine kinase 1 (TK1) and thymidylate kinase (DTYMK), which were exposed to different modified uridine substrates. xDP indicates diphosphates; xMP, monophosphates. f, HPLC–UV chromatogram of nucleosides from DNA of MDA-MB-231 cells treated with 10 μM 5hmdC or dC for 3 days. Right panel shows the abundance of 5hmdU relative to T (n = 3, t-test, **P = 0.0057). g, γH2AX immunofluorescence in MDA-MB-231 and H1299 cell lines at day 3 after treatment with 10 μM 5hmdC or dC. Scale bars, 50 μm. Below are quantifications of cells showing positive signals (n = 3, t-test, P = 0.0017). DAPI, 4′,6-diamidino-2-phenylindole. All error bars denote s.d.

The deamination of dC and 5mdC results in dU and T, which are the normal precursors for thymidine triphosphate synthesis. Conversely, deamination of 5hmdC and 5fdC produces 5hmdU and 5fdU, respectively, which are not canonical nucleosides. When phosphorylated and incorporated into DNA, 5hmdU and 5fdU are toxic to the cells (Extended Data Fig. 3b) as they are recognized as damaged bases and trigger extensive uracil glycosylase activity resulting in DNA breaks23. Therefore, we asked whether the uptake of 5hmdC in CDA-overexpressing cells leads to its conversion into 5hmdU and to its incorporation into DNA, potentially explaining cell-type-specific lethality. First, we determined activities of thymidine kinase and thymidylate kinase on 5hmdU and 5fdU. In contrast to the inability of CMPK1 to act on equivalent cytidine variants, thymidine kinase and thymidylate kinase phosphorylated both uridine variants (Fig. 3e). Notably, the corresponding triphosphates are neither substrates nor potent inhibitors of dUTPase, a robust enzyme that removes dUTP from cells (Extended Data Fig. 3c). Finally, analysis of the genomic DNA composition of 5hmdC- and 5fdC-treated MDA-MB-231 cells identified 5hmUra and 5fUra, but no detectable change in 5-hydroxymethylcytosine (5hmCyt) or 5-formylcytosine (5fCyt) levels in the DNA (Fig. 3f, Extended Data Figs 3d, e and 4a–c). Overall, in all the cell lines examined, a linear correlation was observed between CDA expression and the amount of 5hmUra in the DNA after treatment with 5hmdC (Extended Data Fig. 4d). Signs of extensive DNA damage were detected by phosphorylated H2AX (γH2AX) staining in 5hmdC-treated CDA-overexpressing cells (MDA-MB-231). By contrast, a cell line expressing low CDA levels (H1299) had no obvious γH2AX staining (Fig. 3g and Extended Data Fig. 5c, d). 
Also, increased numbers of cells in S and G2 phases of the cell cycle were observed in CDA-overexpressing cell lines, consistent with cell cycle arrest triggered by a DNA damage response (Extended Data Fig. 5a). We did not observe deviations in the dNTP pools of treated cells, indicating that the cell death is likely to be caused by extensive base excision by SMUG1 DNA glycosylase, which recognizes 5hmUra and 5fUra triggering repair and DNA double-stranded breaks (Extended Data Fig. 6). Together, these observations demonstrate that CDA deaminates 5hmdC and 5fdC, creating 5hmdU and 5fdU, respectively, which are incorporated into the DNA, leading to cell cycle arrest and eventually death.

CDA overexpression has been linked to resistance to cytidine analogues (such as gemcitabine, cytosine arabinoside or 5-azacytidine) that are currently used in cancer treatment, presenting a major obstacle to their use11,24,25,26. Our observations about biological nucleoside variants demonstrate an opposite effect: CDA overexpression sensitizes cells to otherwise non-toxic 5hmdC and 5fdC. Because cancers originating in the pancreas27, stomach, testis and vagina have upregulated CDA expression28 (Extended Data Fig. 7a, b), we postulated that the administration of 5hmdC and 5fdC could have a selective activity against these tumour cells. We first tested whether cytotoxic activity is cell autonomous for CDA-overexpressing H1299 cells in the presence of wild-type (CDA-low) H1299 cells. Both 5hmdC and 5fdC were able to eliminate CDA-overexpressing cells selectively, suggesting that secreted CDA or 5hmdU is insufficient for cytotoxicity (Fig. 4a). Tolerance to, and the stability of, 5hmdC and 5fdC in vivo were determined in immunocompromised BALB/cOlaHsd-Foxn1nu/nu mice after they received a range of doses (12.5 to 100 mg kg−1) of 5hmdC and 5fdC by intraperitoneal injection. Half an hour after injection, we were able to detect 5hmdC and 5fdC in the bloodstream, and to quantify 5hmdC using RapidFire mass spectrometry (Extended Data Fig. 7c, d). We observed no adverse effects on behaviour, injection site, weight or histology in the panel of tissues studied, even though some tissues (kidney and intestine) express CDA (Extended Data Fig. 7e–h, data not shown). To determine whether cytidine variants have an effect on tumour growth in proliferating cells, we subcutaneously injected H1299 wild-type and CDA-overexpressing cells into each side of an animal, which was later treated with 5hmdC or 5fdC (Fig. 4b). 
Xenografts with CDA overexpression grew slightly slower (reaching 64% of wild-type tumour volume), and the volume of the tumour was further reduced twofold in animals treated with 5hmdC or 5fdC (Fig. 4c). CDA-overexpressing tumours showed a twofold decrease in the number of proliferating cells and a threefold increase in the number of cells with DNA damage in 5fdC-injected animals, but smaller differences in animals that were injected with 5hmdC (Fig. 4d). Similar CDA-dependent effects on tumour volume and proliferation were observed when SN12C wild-type and SN12C CDA knockdown cells were used in the xenograft assay (Extended Data Fig. 8).

Figure 4: In vivo evaluation of cytidine variants and the proposed model of epigenetic nucleoside variants in the nucleoside recycling pathway.

a, Wild-type and CDA-overexpressing H1299 cells were mixed at equal ratios and exposed to the indicated variants of cytidine. Representative histogram (left) and quantification of the results (right) are shown (n = 3, 10,000 events recorded). Lower concentrations of 5fdC were used to demonstrate higher cytotoxic potency. b, Schematic illustration of xenograft establishment and treatment with nucleoside variants. D, days. c, Volume of tumours, calculated by assuming that tumours were spheres with their diameters measured using Vernier calipers (n = 8 in 5fdC and n = 7 in 5hmdC experiments, two-way analysis of variance (ANOVA) with repeated measures Holm–Sidak correction, P < 0.0001). Dissected tumours are illustrated below. d, Evaluation of proliferation (immunofluorescence, H3PS10) and DNA damage (immunofluorescence, γH2AX) in dissected tumour samples. Scale bar, 50 μm (n = 4, one-way ANOVA, H3PS10: **P = 0.0057, γH2AX: *P = 0.0491 (5hmdC versus PBS), ***P = 0.0001 (5fdC versus PBS)). Error bars denote s.d. e, Model of metabolism of epigenetic nucleoside variants.
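
The sphere assumption used for panel c corresponds to V = πd³/6 for a measured diameter d. A minimal Python sketch of that arithmetic follows; the function name, the millimetre units and the example reading are illustrative assumptions, not values from the study.

```python
import math


def sphere_volume_from_diameter(diameter_mm: float) -> float:
    """Return the volume (mm^3) of a sphere with the given diameter (mm)."""
    if diameter_mm < 0:
        raise ValueError("diameter must be non-negative")
    # V = (4/3) * pi * r^3 with r = d / 2, which simplifies to pi * d^3 / 6.
    return math.pi * diameter_mm ** 3 / 6.0


# Hypothetical caliper reading, used only to show the call.
example_volume = sphere_volume_from_diameter(10.0)
```

Because volume scales with the cube of the diameter, a small caliper error is amplified roughly threefold in relative terms in the reported volume.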

Here, we have characterized the metabolism of newly discovered biologically modified nucleosides, leading to a model in which the selectivity of CMPK1 prevents random incorporation of modified cytosines (Fig. 4e). Notably, we have discovered that 5hmdC and 5fdC, but not 5cadC, are deaminated by CDA at different rates, resulting in the formation of cytotoxic 5hmdU and 5fdU. Our data on oxidized epigenetic bases are similar to the proposed mechanism of 5mdC salvage, in which CMPK1 is rate-limiting in the production of the diphosphate, whereas 5mdC deamination produces a normal T (refs 4, 5, 6). We did not observe any adverse effects during the administration of 5hmdC and 5fdC in mice, presumably because the cytotoxic threshold is only reached in highly proliferating and CDA-overexpressing cells, in which there is substantial incorporation of nucleoside variants in the DNA, reflected by the CDA-dependent regression of xenografts. Together with recent publications demonstrating the importance of and therapeutic opportunities targeting MTH1, which surveys damaged nucleosides29,30, our data extend the current understanding of the metabolism of biological cytidine variants and provide a novel avenue for cancer therapy.

Methods

Purification of DCK, CMPK1, CDA, TMPK and DUT

Human DCK with a carboxy-terminal 6×His tag was cloned in pET28a(+) and expressed in Escherichia coli BL21 RIPL (Life Technologies) for 4 h at 37 °C following induction with 1 mM isopropyl β-d-1-thiogalactopyranoside (IPTG) in LB. The bacterial pellet was resuspended in 50 mM sodium phosphate, pH 8, 300 mM NaCl and protease inhibitors (Complete EDTA-free, Roche). The protein was bound to a Hi-TRAP HP 5 ml column (GE Healthcare) and eluted with a linear gradient of 0–500 mM imidazole in the lysis buffer, supplemented with 10% glycerol. The fractions were assessed by electrophoresis and ones containing the protein were pooled, concentrated with Amicon 3-kDa centrifugal filter units (Millipore) and separated on a HiPrep 16/60 Sephacryl S-200 gel filtration column (GE Healthcare). The protein was again concentrated using Amicon columns, supplemented with dithiothreitol (DTT) to a final concentration of 10 mM and with 40% glycerol, snap-frozen and stored in aliquots at −80 °C. Human CMPK1 was tagged at the C terminus with 6×His and purified using a similar workflow to DCK with the following exceptions: the lysis buffer was 50 mM Tris, pH 7.5, 10 mM NaCl and protease inhibitors (Complete Mini, Roche); after the gel-filtration step the protein was bound to an anion exchange column HiTrap Q HP 5 ml (GE Healthcare) and eluted with a 20-column-volume linear gradient of 0–1 M NaCl. The salt was removed by dialysis in 50 mM Tris, pH 8, the protein was concentrated and 10 mM DTT was added to the final preparation, before storage in 40% glycerol at −80 °C. C-terminal 6×His-tagged CDA was purified in a similar workflow to CMPK1 with the following modifications: the protein was expressed for 19 h at 37 °C; the lysis buffer was 50 mM Tris, pH 7.5, 1 mM DTT, 1 mM EDTA and protease inhibitors (Complete Mini, Roche); following HiTRAP purification, the 6×His tag was cleaved by thrombin (Sigma); the cleaved tag and uncleaved protein were removed by separation using a HiTRAP HP 5 ml column and collection of the flow-through. 
Subsequently, the protein was purified using gel filtration as indicated above and stored in aliquots at −80 °C. Protein purity was assessed by electrophoresis and CDA was additionally identified by mass spectrometry. Human TMPK1 with a C-terminal 6×His tag was cloned in pET28a(+) and expressed in E. coli BL21 RIPL (Life Technologies) for 4 h at 37 °C following induction with 1 mM IPTG in LB. The bacterial pellet was resuspended in 50 mM sodium phosphate, pH 7, 300 mM NaCl and protease inhibitors (Complete EDTA-free, Roche) and lysed with a French press (EmulsiFlex C5, Avestin) at 100 MPa equipped with a recirculating cooler (F250, Julabo) set at 4 °C. The protein was bound to a Hi-TRAP HP 5 ml column (GE Healthcare) and eluted with a linear gradient of 0–500 mM imidazole in the lysis buffer, supplemented with 10% glycerol. The fractions were assessed by electrophoresis and ones containing the protein were pooled, concentrated with Amicon 3 kDa centrifugal filter units (Millipore), supplemented with 40% of glycerol, snap frozen in aliquots and stored at −80 °C. Human DUT with a C-terminal 6×His tag was cloned in pET28a(+) and expressed in E. coli BL21 RIPL (Life Technologies) for 4 h at 37 °C following induction with 0.2 mM IPTG in LB. The bacterial pellet was resuspended in 20 mM sodium phosphate pH 7.3, 150 mM NaCl, 1% Triton X-100 and protease inhibitors (Complete EDTA-free, Roche) and lysed with a French press (EmulsiFlex C5, Avestin) at 100 MPa equipped with a recirculating cooler (F250, Julabo) set at 4 °C. The protein was bound to a Hi-TRAP HP 5 ml column (GE Healthcare) and eluted with a linear gradient of 0–500 mM imidazole in the lysis buffer, supplemented with 10% glycerol. The fractions were assessed by electrophoresis and ones containing the protein were pooled, concentrated with Amicon 3 kDa centrifugal filter units (Millipore), supplemented with 40% of glycerol, snap frozen and stored in aliquots at −80 °C. 
Thymidine kinase was purchased and the purity assessed by SDS–PAGE (8180-TK-050, R&D Systems).

Nucleoside stability

Nucleosides were obtained from the following sources: 5hmdC (PY-7588, Berry & Associates), 5fdC (PY-7589, Berry & Associates), 5cadC (PY-7593, Berry & Associates), 5azadC (A3656, Sigma Aldrich), ATP solution (Thermo Fisher), [γ-32P]ATP (Perkin Elmer), dC (Sigma Aldrich, D3897), dCMP (Sigma Aldrich, D7625), 5hmdCTP (Bioline, BIO-39046). 100 μM solutions of 5hmdC, 5fdC and 5azadC were prepared in HPLC-grade water (Thermo Fisher) or in DMEM (Lonza). The solutions were incubated at 37 °C for 10 days. A sample was taken every 24 h and subjected to HPLC–UV analysis.

Enzyme assays

The substrate selectivity of DCK and CMPK1 kinases was measured by 32P transfer and detection using 1D or 2D TLC. 1 μg of DCK was incubated in 100 mM Tris, pH 7.5, 100 mM KCl, 10 mM MgCl2, 1 mM [γ-32P]ATP and 200 μM of the respective nucleoside in a 50 μl reaction volume at 37 °C for 2 h. 1 μl of products was separated via 2D TLC on glass-backed AVICEL cellulose plates (Analtech) as described31. CMPK1 was assayed through a coupled assay with DCK following the conditions described previously32 with 1 μg DCK, 1 μg CMPK1 and 1 mM substrate. Thymidine kinase (8180-TK-050, R&D Systems) and TMPK1 were assayed through a coupled assay with 1 μg thymidine kinase, 1 μg TMPK1 and 1 mM substrate in 50 mM Tris, pH 7.4, 50 mM KCl, 5 mM MgCl2, 1 mM ATP and 2.5 μCi [γ-32P]ATP at 37 °C. 1D TLC was performed using glass-backed TLC sheets (PEI cellulose F, Millipore) as described previously32. The plates were exposed to a storage phosphor screen (GE Healthcare), which was scanned using a Phosphoimager (Bio-Rad), and images were analysed with ImageLab software (Bio-Rad). CDA kinetic data were collected as described33 by monitoring the absorbance at 260 nm with a spectrophotometer (SpectraMax M2, Molecular Devices) using 45 ng of enzyme (500 ng for 5hmdC), and the data were fitted to a pseudo-zero-order Michaelis–Menten enzyme kinetic model using Prism software (GraphPad). 1 μg DUT was assayed in 50 mM Tris, pH 7.5, 4 mM MgCl2, 1 mM DTT, 0.1 mg ml−1 BSA with 5 μM of substrate in a 40 μl reaction volume for 10 min at 37 °C. The generated pyrophosphate was detected with a bioluminescent coupled assay (PPiLight inorganic pyrophosphate assay LT07-500, Lonza). The plate was then read in a GloMax instrument (Promega).
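
Under the pseudo-zero-order condition (substrate well above Km), the Michaelis–Menten rate approaches Vmax = kcat × [E], so product accumulates linearly with time. The Python sketch below illustrates how a turnover number could be estimated from such a linear phase. It is not the Prism procedure used here: it assumes that absorbance readings have already been converted to product concentration, and the function names and numbers are hypothetical.

```python
def linear_fit(times, values):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need two or more paired observations")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    ss_tot = sum((v - mean_v) ** 2 for v in values)
    ss_res = sum((v - (intercept + slope * t)) ** 2
                 for t, v in zip(times, values))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r_squared


def kcat_from_zero_order(times_s, product_uM, enzyme_uM):
    """Estimate kcat (s^-1) as the zero-order rate divided by [E]."""
    rate_uM_per_s, _, r_squared = linear_fit(times_s, product_uM)
    return rate_uM_per_s / enzyme_uM, r_squared


# Hypothetical progress curve: 0.5 uM product per second with 0.01 uM enzyme.
kcat, r2 = kcat_from_zero_order([0, 10, 20, 30], [0.0, 5.0, 10.0, 15.0], 0.01)
```

The R² returned by the fit plays the same role as the goodness-of-fit threshold quoted for the kinetic data: a low value would indicate that the reaction has left the zero-order regime.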

Molecular docking

A tetramer was generated with CDA structure 1MQ0 (ref. 22) and subjected to DockPrep in Chimera 1.8 (http://www.cgl.ucsf.edu/chimera). Substrates were dC (ZINC18286013)34, 5hmdC (ZINC77300654)34 and 5fdC (CSID:10291642) (http://www.chemspider.com), downloaded as .mol files and converted to .mol2 files in Chimera. Docking was subsequently performed with SwissDock (http://www.swissdock.ch/docking)35. The model with the lowest ligand ΔG was then visualized and analysed with Chimera.

In vitro replication assay

The assay was carried out following protocols for nuclear extract and cytoplasmic fraction preparation and for the replication assay13,36. The reaction contained 0.3 mM of each canonical nucleotide, except dCTP which was substituted by 5hmdCTP. The reaction was stopped with the addition of 0.1 M EDTA final. DNA was extracted with phenol and chloroform, treated with RNase A/T1 (Thermo Fisher) and free nucleotides removed with a Mini Quick Spin DNA column (Roche) before HPLC assay.

DNA glycosylase assay

The single stranded DNA oligonucleotide substrates (5′-FAM CATAAAGTGXAAAGCCTGGA, in which X denotes uracil, 5hmUra or 5fUra) were purchased from AtdBio and their complementary strand from IDT (all HPLC purified). Recombinant human SMUG1 (NEB) was incubated with annealed oligonucleotides as described before37. The reaction products were resolved on a 15% denaturing polyacrylamide TBE–urea gel (Invitrogen) and quantified using ChemiDoc (BioRad) with blot detection protocol for Alexa 488.

Quantification of nucleosides by HPLC

Genomic DNA was extracted with Gene Jet Genomic DNA extraction Kit (Thermo Fisher) or TRI Reagent (Sigma Aldrich), incubated with RNase A/T1 (Thermo Fisher) in buffer 2 (NEB), phenol/chloroform extracted and precipitated with ethanol. 1–10 μg of DNA was hydrolysed as described before7. Nucleosides were resolved with an Agilent UHPLC 1290 instrument fitted with Eclipse Plus C18 RRHD 1.8 μm, 2.1 × 150 mm column and detected with Agilent 1290 DAD fitted with a Max-Light 60 mm cell. Buffer A was 100 mM ammonium acetate, pH 6.5; buffer B was 40% acetonitrile, and the flow rate was 0.4 ml min−1. The gradient spanned 1.8–100% buffer B (40% acetonitrile) with the following steps: 1–2 min, 100% A; 2–16 min 98.2% A, 1.8% B; 16–18 min 70% A, 30% B; 18–20 min 50% A, 50% B; 20–21.5 min 25% A, 75% B; 21.5–24.5 min 100% B.

Quantification of nucleotides by HPLC

MDA-MB-231 and H1299 cells were treated with 10 μM dC, 10 μM 5hmdC and 1 μM 5fdC. Metabolites were extracted at day 3 as described before38. In brief, cells were washed in PBS and scraped on ice. The pellet was washed again in cold PBS and extraction was done with 50 μl of ice-cold 50% acetonitrile (ACN) per mg of pellet. The samples were vortexed and incubated on ice for 10 min. Insoluble material was pelleted at 20,000g for 10 min and supernatants were dried using a SpeedVac (Thermo Scientific). Metabolites were dissolved in 30 μl of buffer A and 20 μl was used for chromatography. HPLC was performed as described38 with some minor modifications as listed below. Nucleotides were resolved with an Agilent UHPLC 1290 instrument fitted with Eclipse Plus C18 RRHD 1.8 μm, 2.1 × 150 mm column and detected with Agilent 1290 DAD fitted with a Max-Light 60 mm cell at 254, 260 and 280 nm. Buffer A consisted of 100 mM KH2PO4 (60221, Sigma) with 8 mM tetrabutylammonium bisulfate (98479, Sigma) set at pH 5.5. Buffer B consisted of buffer A with 25% methanol. After 8 min at 0% buffer B, the gradient started with a linear increase of buffer B to 35% in 19 min, followed by a linear increase from 35% to 38% buffer B in 5 min and from 38% to 100% buffer B in 22 min. After an 8-min hold at 100% buffer B, the gradient was reversed from 100% to 0% buffer B in 2 min, followed by a hold at 0% buffer B for 2 min. The column temperature was set at 30 °C and the flow rate was 0.4 ml min−1. The compounds were identified by comparing their retention times and their UV spectra with those of known standards, which were purchased from Sigma-Aldrich. The integrated area was used to quantify the relative abundance of nucleotides by normalizing each peak area to the ADP area as an indication of loaded amount.
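
The gradient programme and the ADP normalization described above can be written out explicitly. The Python sketch below is illustrative only: it assumes that each "in N min" is a segment duration (giving a 66-min run), and the names GRADIENT, percent_b and normalize_to_adp, as well as the peak areas in the usage line, are hypothetical.

```python
# (time in min, % buffer B) breakpoints, reading each "in N min" as a duration.
GRADIENT = [
    (0.0, 0.0),     # start of the 8-min hold at 0% B
    (8.0, 0.0),
    (27.0, 35.0),   # linear increase to 35% B over 19 min
    (32.0, 38.0),   # 35% to 38% B over 5 min
    (54.0, 100.0),  # 38% to 100% B over 22 min
    (62.0, 100.0),  # 8-min hold at 100% B
    (64.0, 0.0),    # return to 0% B over 2 min
    (66.0, 0.0),    # final 2-min hold at 0% B
]


def percent_b(t_min):
    """Percentage of buffer B at time t_min, by linear interpolation."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the gradient programme")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def normalize_to_adp(peak_areas):
    """Express every integrated peak area relative to the ADP peak area."""
    adp_area = peak_areas["ADP"]
    if adp_area <= 0:
        raise ValueError("ADP peak area must be positive")
    return {name: area / adp_area for name, area in peak_areas.items()}


# Hypothetical integrated areas, used only to show the call.
relative = normalize_to_adp({"ADP": 200.0, "dCTP": 50.0})
```

Normalizing to ADP makes runs with different injected amounts comparable, on the assumption that the ADP pool itself is not altered by the treatment.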

Cell culture and transfections

Cell lines were routinely tested for mycoplasma contamination using the Lonza MycoAlert kit. Cell proliferation assays were done by seeding cells in p60 plates or in a T25 flask with appropriate concentrations of 5hmdC, 5fdC or dC in the growth media. Every 2 days, the cells were passaged and counted, and the media was replaced. Before counting, 1 volume of Trypan blue solution (Lonza) was added to an aliquot of single cell suspension. The live cells were counted with a TC-20 Cell Counter (Bio-Rad). NTPs were introduced by nucleofection. One million MDA-MB-231 cells were nucleofected with 50 mM 5hmdCTP in a 100-µl volume using an Amaxa nucleofector kit (Lonza), following the manufacturer’s instructions. After transfection, cells were seeded in a 6-well plate, washed twice with PBS 24 h later, and DNA was extracted 48 h later for HPLC analysis.

Production of stable cell lines

Stable cell lines were generated via lentiviral infection using a standard protocol39 with second generation packaging plasmids (pCMV-VSVG, pCMV-dR8.9, a gift from B. Amati). CDA knockdown was achieved by infecting MDA-MB-231 and SN12C cell lines with pLKO.1 vectors containing five different shRNA constructs (SHCLND-NM_001785, Sigma-Aldrich) and a control pLKO.1 containing shRNA silencing luciferase (a gift from X. Lu). Infected cells were selected by incubation with 1.5 μg ml−1 puromycin (Sigma) for 60 h. Two cell lines with the lowest CDA mRNA levels (shRNA TRCN0000051290 and TRCN0000051288, designated (0) and (8), respectively) were further assessed by immunoblotting and used for experiments. Lentivirus for CDA overexpression was generated with pLenti-puro (39481, Addgene, I.-M. Shih laboratory) expressing dsRed-IRES-CDA. H1299 and MCF-7 were infected as above. Infected cells were selected with puromycin at 2 μg ml−1 for 60 h.

Immunoblotting, FACS and immunofluorescence

For western blot analysis, 10^6 cells were lysed with RIPA buffer (20 mM HEPES at pH 7.5, 300 mM NaCl, 5 mM EDTA, 10% glycerol, 1% Triton X-100, supplemented with protease inhibitors (Complete EDTA-free, Roche)) and sonicated. Cleared lysates were electrophoresed and immunoblotted with the following primary antibodies: anti-CDA (Sigma, SAB1300717 1:250), anti-actin (Abcam, ab185058 1:75000). Chemiluminescent detection, after incubation of the membranes with appropriate secondary antibodies, was done through a CCD camera using the ChemiDoc System (Bio-Rad) with Image Lab software (Bio-Rad, version 4.0). For FACS analysis, 5 × 10^5 cells were trypsinized, washed in PBS and fixed in 70% ethanol for 1 h on ice. The pelleted cells were resuspended in 250 μl of staining solution (50 μg ml−1 propidium iodide (P4864, Sigma), 0.1 mg ml−1 RNaseA and 0.05% Triton X-100) and incubated at 37 °C for 40 min. Controls were used for G1 (serum starvation overnight) and G2 (0.1 μg μl−1 nocodazole overnight). Fluorescence of 10,000 cells was recorded with a FACS Canto flow cytometer (BD Biosciences) and analysed using FlowJo software (Version 7.6.5, TreeStar). For immunofluorescence, cells were grown on coverslips and fixed with 4% paraformaldehyde for 20 min at room temperature. Cells were washed twice in PBS and permeabilized for 10 min in 0.2% Triton X-100. After two washes in PBS, cells were blocked for 1 h in 3% BSA (Sigma Aldrich), dissolved in PBS and incubated with γH2A.X antibody (Millipore, 05-636, 1:500) overnight at 4 °C in a humidified chamber. Cells were then washed three times in PBS and incubated with anti-mouse secondary antibody conjugated with Alexa546 (1:400, Life Technologies) and DAPI (Sigma Aldrich). Coverslips were then washed three times in PBS and mounted with mounting media (Vectashield). Tiled pictures were automatically taken with a Zeiss 710 microscope with a 20× lens. The amount of nuclear fluorescence was quantified using ImageJ.

Gene expression analysis and public data sets

Data sets used in the study: Gene Expression Omnibus (GEO) accessions GSE36139 (GPL15308)20 and GSE32474 (GPL570)40. Comparative gene expression analysis was done on data from the NCI-60 panel19 as follows. Affymetrix HG-U133 Plus 2.0 microarray data were downloaded from the CellMiner database (http://discover.nci.nih.gov/cellminer/loadDownload.do) and CEL files were extracted for triplicate experiments done on the BR:MCF7, ME:MDA_MB_435, BR:MDA_MB_231 and LC:HOP_92 cell lines. The data were then imported into ArrayStar v11 (DNAStar), and signal normalization and intensity correction were done using the RMA quantile method. The experiment was designed by grouping the BR:MCF7 and ME:MDA_MB_435 cell lines into a ‘resistant’ group and the BR:MDA_MB_231 and LC:HOP_92 cell lines into a ‘sensitive’ group. Differential expression between the groups was determined using the Student’s t-test with Benjamini–Hochberg multiple testing correction. Genes were called as differentially expressed when P < 0.01 and fold change >2. The full data set is included in Supplementary Table 1. To derive CDA expression values in tumours, GPL15308 (ref. 20) and GPL570 (ref. 40) were analysed directly on the NCBI portal with GEO2R. P values were adjusted with Benjamini–Hochberg correction.
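As an illustration of the gene-calling step, the following is a minimal sketch of a Benjamini–Hochberg adjustment followed by the P < 0.01 and fold change >2 filter. It is not the ArrayStar implementation. The function names, the choice to apply the threshold to the adjusted P value, and the symmetric treatment of up- and down-regulation are assumptions made for the example, and the gene labels in the usage note are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_differential(genes, p_values, fold_changes, p_cut=0.01, fc_cut=2.0):
    """Select genes with adjusted P < p_cut and fold change > fc_cut.

    Assumption: fold_changes are positive ratios (sensitive/resistant), and a
    ratio below 1 is treated as its reciprocal so both directions are caught.
    """
    adjusted = benjamini_hochberg(p_values)
    return [g for g, p, fc in zip(genes, adjusted, fold_changes)
            if p < p_cut and max(fc, 1.0 / fc) > fc_cut]
```

For example, `call_differential(["geneA", "geneB", "geneC"], [0.0001, 0.0002, 0.5], [4.0, 1.5, 3.0])` keeps only `geneA`: `geneB` fails the fold-change threshold and `geneC` fails the P value threshold.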

Toxicology and dose determination in animal experiments

Animal work was done after approval by the UK Home Office and University of Oxford Local Ethical review. Three 5–7-week-old BALB/cOlaHsd-Foxn1nu/nu (Harlan) mice per dose were injected (intraperitoneally) with 25, 50 and 100 mg kg−1 of 5hmdC and 12.5, 25, 50 and 100 mg kg−1 of 5fdC. Animals were monitored for any deviations from normal behaviour. At 30 min post-injection, a few drops of blood were collected through tail vein bleeding using Microvette CB300 (Sarstedt) to assess the amounts of the compounds in the bloodstream.

RapidFire mass spectrometry analysis of serum samples

Serum was isolated by centrifugation of Microvettes according to the recommendations of the manufacturer (Sarstedt). The samples were brought up to 200 μl with water and three volumes of methanol, and 150 μl of chloroform was added. After intense vortexing, 450 μl of water was added, samples were vortexed again and centrifuged at 14,000g for 1 min. The aqueous phase containing the soluble molecules was collected and dried in a Speedvac (Thermo Scientific). The dried pellets were then resuspended in 10 μl of water, and 3 μl was diluted further into 50 μl of water for loading on a RapidFire 360 high-throughput sample delivery system coupled to a 6530 quadrupole time-of-flight (QTOF) mass spectrometer (Agilent). The samples were aspirated by vacuum at −40 bar for 400 ms into a 10-μl sample loop and loaded onto a graphitized carbon solid phase extraction (SPE) cartridge with 5 mM ammonium formate as running buffer at a flow of 1.5 ml min−1. The matrix components not retained on the cartridge were diverted to waste for 4,500 ms, and the retained components were eluted with 95% acetonitrile, 5 mM ammonium formate for 4,500 ms at a flow of 1 ml min−1. The SPE cartridge was then re-equilibrated for 4,500 ms with 5 mM ammonium formate. Data were collected in positive electrospray ionisation (ESI) mode using a 2 Gb data configuration, gas temperature 300 °C, drying gas 8 l min−1, nebuliser gas 240 MPa, Vcap 3,500 V and fragmentor voltage 175 V. The amount of nucleoside was measured against a standard curve produced by dissolving known amounts of 5hmdC and 5fdC in serum and processing them as indicated above. Data were analysed using Agilent MassHunter Qualitative (vB.06) and Quantitative (vB.05) analysis software. Standard curves were fitted with a quadratic curve-fit algorithm for each nucleoside, with R^2 > 0.98 in all instances.
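The quadratic standard-curve fit and its R^2 check can be sketched as follows. This is an ordinary least-squares fit written out for illustration, not the routine used by the MassHunter Quantitative software; the function names are hypothetical and no weighting of the calibration points is assumed.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    # Power sums that make up the 3x3 normal equations.
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    # Augmented matrix with the unknowns ordered (c, b, a).
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    c, b, a = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def r_squared(x, y, coeffs):
    """Coefficient of determination of the fitted curve on the standards."""
    a, b, c = coeffs
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (a * xi ** 2 + b * xi + c)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

Here `x` would hold the known amounts of a nucleoside spiked into serum and `y` the corresponding detector responses; a curve would be accepted under the criterion above when `r_squared(x, y, fit_quadratic(x, y))` exceeds 0.98.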

Nucleoside analysis by mass spectrometry (HPLC–QTOF)

Samples were dried in a speed-vac and re-suspended in 10 µl of water. For the analysis by HPLC–QTOF mass spectrometry, a 1290 Infinity UHPLC was fitted with a BEH C18 XP column (130 Å, 1.7 µm, 2.1 mm × 150 mm; Waters) and coupled to a 6560 Ion mobility QTOF LC/MS mass spectrometer (Agilent Technologies) equipped with a Jetstream ESI-AJS source. The data were acquired in QTOF mode using positive electrospray ionisation (ESI+). Two reference ions, m/z 121.0508 and 922.0097, were used as internal standards. The Dual AJS ESI settings were as follows: gas temperature 150 °C, drying gas 5 l min−1, nebulizer 240 MPa, sheath gas temperature 360 °C, sheath gas flow 12 l min−1, Vcap 4,000 V and nozzle voltage 300 V. The fragmentor of the mass spectrometer TOF was set to 275 V.

The gradient used to elute the nucleosides started with a 1-min isocratic step of 99.5% buffer A (10 mM ammonium acetate, pH 6) and 0.5% buffer B (composed of 40% CH3CN) at a flow rate of 0.350 ml min−1, and was followed by the subsequent steps: 1–2 min, 98.2% A; 2–16 min 80% A; 16–18 min 50% A; 18–20 min 25% A; 20.20–21.5 min 0% A; 21.5–22.5 min 100% B; 22.5–24.5 min 99.5% B. The gradient was followed by a 5 min post time to re-equilibrate the column.

The raw mass spectrometry data were analysed using the MassHunter Qual Software package (Agilent Technologies, version B7.0), and the masses/retention times used for the characterization of nucleosides and their adducts are summarized in Supplementary Table 2. For the identification of compounds, raw mass spectrometry data were processed using the molecular feature extraction function in the MassHunter software, followed by metabolite searching through mass/isotope matching using the PCDL software (version B.07.00 build 7024.0) and the METLIN database (https://metlin.scripps.edu/index.php). For each nucleoside, precursor ions corresponding to the M+H, M+Na, M+K, 2M and base-only species were extracted, and the most intense ion species observed for each nucleoside was used for quantification. Identities of peaks eluting at 4.5 and 5.1 min (Figs 1b and 3f) are shown in Extended Data Figs 9 and 10.
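To show how the precursor ion m/z values for such species follow from the elemental composition, here is a short sketch for singly charged positive ions. The values actually used in the study are those in Supplementary Table 2; this example only illustrates the arithmetic. The composition C10H15N3O5 for 5hmdC, the reading of '2M' as a protonated dimer, and all names in the code are assumptions of the example, and the base-only fragment is left out because it depends on the base composition.

```python
# Monoisotopic masses (Da) of the elements and adduct atoms involved.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
ELECTRON = 0.00054858

# Mass added to neutral M for each singly charged cation adduct
# (adduct atom minus one electron).
ADDUCT_SHIFT = {
    "M+H": 1.00782503 - ELECTRON,
    "M+Na": 22.98976928 - ELECTRON,
    "M+K": 38.96370668 - ELECTRON,
}


def neutral_mass(composition):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONOISOTOPIC[element] * n for element, n in composition.items())


def adduct_mz(composition):
    """m/z of the [M+H]+, [M+Na]+, [M+K]+ and dimer ions (charge +1)."""
    m = neutral_mass(composition)
    mz = {name: m + shift for name, shift in ADDUCT_SHIFT.items()}
    # Assumption: '2M' is taken to mean the protonated dimer [2M+H]+.
    mz["2M+H"] = 2.0 * m + ADDUCT_SHIFT["M+H"]
    return mz


# Assumed composition of 5hmdC (C10H15N3O5), used only for illustration.
FIVE_HMDC = {"C": 10, "H": 15, "N": 3, "O": 5}
```

Calling `adduct_mz(FIVE_HMDC)` returns a dictionary keyed by ion species, from which the most intense observed species could then be chosen for quantification as described above.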

Subcutaneous xenografts

Animal work was done after approval by the UK Home Office and University of Oxford Local Ethical review. Power calculations suggested 6–9 animals per group to detect a significant 50% difference in tumour size with a power of 90% and an s.d. between 30 and 40%. One million cells in a 50% suspension of Matrigel (200 μl) were injected into each flank of 5–7-week-old BALB/cOlaHsd-Foxn1nu/nu (Harlan) mice (8 animals per group), following the scheme: SN12C/H1299 left, SN12C shCDA8/H1299 dsRedCDA right. When the tumours reached palpable size, 8 mice were assigned randomly to each treatment group: PBS, 100 mg kg−1 of 5hmdC and 100 mg kg−1 of 5fdC. The compounds were administered every 72 h (four doses in total). Tumour size was measured every 3 days by Vernier caliper and the animal cohort was euthanized when the cumulative tumour diameter in the first animal reached 12 mm. The experimenter was unaware of the cell line genotypes during the measurements. Tumour volume was calculated assuming that the tumours were spheres, using the formula V = (4/3)π(D/2)^3, in which D represents the diameter of the tumour.
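The spherical volume formula can be written as a one-line helper. This is a minimal sketch; the function name is hypothetical, and the diameter is assumed to be in millimetres so that the result is in cubic millimetres.

```python
import math


def tumour_volume(diameter_mm):
    """Volume (mm^3) of a tumour modelled as a sphere of the given diameter."""
    return (4.0 / 3.0) * math.pi * (diameter_mm / 2.0) ** 3
```

Because the volume scales with the cube of the diameter, a tumour measured at half the diameter of another has one eighth of its volume.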

Histology

Organs and tumours were collected and immediately fixed in 10% formalin for 48 h. They were then embedded in wax and cut into 4-μm thick sections. All sections were stained with H&E. Tumours were additionally stained with a Masson’s Trichrome Stain Kit (Sigma Aldrich) according to the manufacturer’s instructions.

Immunofluorescence of tissues and tumours

The 4-μm thick sections were subjected to antigen retrieval with a pressure cooker in Tris buffer, pH 9 (10 mM Tris base, 0.05% Tween 20). They were then blocked in 3% BSA in PBS for 30 min and incubated overnight in a humidified chamber at 4 °C with the following antibodies: γH2A.X (Millipore, 05-636, 1:200) and PH3 (Millipore, 06-570, 1:200) or β-catenin (BD Transduction Laboratories, 610153, 1:250) and CDA (Sigma Aldrich, SAB1300717, 1:100). The slides were then washed vigorously three times in PBS and incubated for 1 h at room temperature with appropriate secondary antibodies conjugated with Alexa546 or Alexa488 (1:400, Life Technologies) and DAPI (Sigma Aldrich). Coverslips were then washed three times in PBS and mounted with mounting media (Vectashield). Images were acquired with a Zeiss 710 confocal microscope with a ×20 objective. For quantification of DNA damage and proliferation in tumours, tiled images with Z stacks were acquired to cover the entire central section of the tumour. ImageJ was used to quantify the immunofluorescence signal.