Abstract
DNA immunoprecipitation followed by sequencing (DIP-seq) is a common enrichment method for profiling DNA modifications in mammalian genomes. However, the results of independent DIP-seq studies often show considerable variation between profiles of the same genome and between profiles obtained by alternative methods. Here we show that these differences are primarily due to the intrinsic affinity of IgG for short unmodified DNA repeats. This pervasive experimental error accounts for 50–99% of regions identified as ‘enriched’ for DNA modifications in DIP-seq data. Correction of this error profoundly altered DNA-modification profiles for numerous cell types, including mouse embryonic stem cells, and subsequently revealed novel associations among DNA modifications, chromatin modifications and biological processes. We conclude that both matched input and IgG controls are essential in order for the results of DIP-based assays to be interpreted correctly, and that complementary, non-antibody-based techniques should be used to validate DIP-based findings to avoid further misinterpretation of genome-wide profiling data.
Acknowledgements
This work was supported by the Swedish Research Council (2015-03495 to C.E.N.; 2015-02575 to M.B.), LiU-Cancer (2016-007 to C.E.N.), the Swedish Cancer Society (CAN 2017/625 to C.E.N.; CAN 2016/602 to H.G.) and the Medical Research Council, UK (MC_PC_U127574433 to R.R.M. and H.K.M.).
Author information
Contributions
C.L., S.V., K.D. and H.K.M. performed experiments; A.L., C.E.N. and S.V. analyzed data; A.L., R.R.M. and C.E.N. wrote the manuscript; and H.V., H.G., R.R.M., M.B. and C.E.N. supervised the work.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
Supplementary Figure 1 Reproducibility of off-target binding in DIP-seq between studies.
Signal track for multiple marks and tissues in mice over repetitive regions. STRs, short tandem repeats. All tracks are automatically scaled.
Supplementary Figure 2 Extended identification and validation of off-target binding in DIP-seq.
(a, b) Immuno dot blot (a; n = 1) and ELISA (b; n = 3 biologically independent experiments) of 5mC, 5hmC, 5fC and 5caC antibodies in synthetic 426 bp oligos containing the different marks. Boxplots represent median and first and third quartiles with whiskers extending 1.5 × the interquartile range. (c) Enrichment of IgG or Input reads over the intersection of DIP-seq 5modC (5mC+5hmC+5fC+5caC) n = 592 enriched regions or non-intersecting (5mC/5hmC/5fC/5caC) n = 259002 enriched regions. Data represented as in b. P-values calculated using two-tailed t-test. (d) Correlation matrix of enriched DIP-seq regions per Mbp of mm9. Correlation was calculated as pairwise two-tailed Pearson correlation r² for each n = 1 biologically independent experiment. (e) Venn diagram of overlapping enriched regions for 5hmC, 5mC and IgG (left). Dinucleotide frequencies for overlapping IgG+5mC+5hmC n = 23317 regions, 5mC+5hmC n = 6683 regions and mm9 n = 23317 randomly sampled regions. Data represented as in b. (f) Number of methylated CpH from WGBS data per IgG n = 137557 enriched region or 5mC n = 19091 enriched region. P-values calculated using two-tailed Mann-Whitney U-test. (g) Enrichment profile of IgG and 5hmC in DnmtTKO (left) or TetTKO (right) and WT mESCs over IgG n = 137557 enriched regions. Data shown as mean for WT and DnmtTKO n = 1 biologically independent sample (left) and mean and 95% confidence intervals for WT n = 2 and TetTKO n = 3 biologically independent samples (right). (h) DIP using a 5hmC antibody in wild-type (WT) (left) and DnmtTKO (right) mESCs for DIP-qPCR n = 3 and DIP-seq n = 1 biologically independent samples. Data shown as mean ± s.d. Correlation between mean DIP-qPCR and DIP-seq values calculated using two-tailed Spearman correlation. (i) CG content of enriched fragments for DIP and Seal profiling for 5hmC (left) and 5fC (right). Theoretical normal distribution modelled based on mean and s.d. for each mark (Norm). P-values calculated by two-tailed Kolmogorov-Smirnov test using the mean of n = 2 biologically independent experiments. (j) Estimation of PCR duplication for sequencing libraries at a depth of 10 million reads, shown as the non-redundant fraction (i.e., the fraction of reads that are not duplicates) for n = 2 biologically independent samples. Data represented as in b.
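The non-redundant fraction in panel j can be illustrated with a minimal sketch. It assumes that a read counts as a duplicate when it shares chromosome, 5′ position and strand with another read, and it computes only the observed fraction for a given set of reads; how the authors extrapolated to a depth of 10 million reads is not described in this caption. The function name `non_redundant_fraction` and the example reads are hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical read representation: (chromosome, 5' position, strand).
Alignment = Tuple[str, int, str]


def non_redundant_fraction(alignments: Iterable[Alignment]) -> float:
    """Return distinct alignment positions divided by total aligned reads."""
    reads = list(alignments)
    if not reads:
        raise ValueError("at least one aligned read is required")
    # Reads with identical chromosome, position and strand collapse in the set.
    return len(set(reads)) / len(reads)


# Made-up reads: two of the four share the same position and strand.
example_reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 100, "-"),
    ("chr2", 5, "+"),
]
nrf = non_redundant_fraction(example_reads)
```

A value close to 1 indicates that few reads are PCR duplicates; lower values indicate a less complex library.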
Supplementary Figure 3 Extended analysis of 6mA DIP-seq in multiple species.
(a) Scatterplot showing correlation between IgG motif similarity for DIP-seq and percentage CA-repeats in the respective genomes for M. musculus n = 11, D. rerio n = 2, X. laevis n = 8, C. elegans n = 1 and E. coli n = 2 biologically independent samples. Correlation calculated as two-tailed Spearman's rho for all samples together (n = 24). Line represents linear correlation and 95% confidence interval. (b) Number and overlap of 6mA enriched regions in X. laevis testes identified using Input or IgG controls shown as Venn diagrams (left) and bar plots (right). (c) Number of reads mapping to Mycoplasma species. Kidney n = 3, mESC n = 2, Brain n = 6 biologically independent samples. Boxplots represent median and first and third quartiles with whiskers extending 1.5 × the interquartile range.
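The boxplot convention used in these captions (median, first and third quartiles, whiskers at 1.5 × the interquartile range) can be sketched as follows. This is an illustration under assumptions rather than the authors' plotting code: it uses linear interpolation between data points for the quartiles (other quantile definitions give slightly different values) and draws each whisker to the most extreme observation within 1.5 × IQR of the box. The function name `boxplot_summary` and the read counts are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def boxplot_summary(values: Sequence[float]) -> Dict[str, float]:
    """Median, quartiles and whisker ends under the 1.5 x IQR convention."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two observations")
    # Assumption: quartiles by linear interpolation ("inclusive" method).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data point inside each fence.
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(v for v in data if v >= lower_fence),
        "whisker_high": max(v for v in data if v <= upper_fence),
    }


# Made-up read counts; 100 lies beyond the upper fence and is an outlier.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

In this example the upper whisker ends at 8, and the value 100 would be drawn as an individual outlier point.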
Supplementary Figure 4 Effect of IgG correction in DIP-seq data.
(a) Schematic visualization of false positive rate for enriched regions. Briefly, false positive rate (FPR) was estimated based on the inverse fraction of regions identified by both Input and IgG versus total regions. (b) Estimated false positive rate of enriched regions using IgG or Input as control for Tdg knockdown mESCs for n = 2 biologically independent samples. Data shown as mean. (c) Estimated false positive rate for individual mESC or MEF datasets. *Estimated based on controls from mESCs. (d) Venn diagram of enriched 5hmC regions in mESCs with different techniques and controls (n = 1 biologically independent sample each). (e, f) Fraction of enriched 5modC regions identified using IgG or Input overlapping repetitive elements (e) and dinucleotide repeats (f) for 5caC n = 2, 5fC n = 2, 5hmC n = 7 and 5mC n = 6 biologically independent samples. Presented as mean ± s.d. P-values calculated using two-tailed t-test. (g) Venn diagram of 5mC and 5hmC overlap using IgG or Input controls (top) and paired line plot of 5mC and 5hmC overlap using IgG or Input controls for multiple studies (indicated by symbols, bottom). Data shown as mean and individual data points of n = 6 biologically independent samples. P-values calculated using two-tailed paired t-test. ▲ = ERP000570, ● = GSE31343, ■ = GSE24841, ▼ = GSE42250. (h) GO term enrichment for top genes (n = 500) enriched for 5hmC in mouse embryonic fibroblasts (MEFs) using DIP-seq with either IgG or Input controls. P-values calculated using PANTHER overrepresentation test GO biological processes. (i) Signal track in mESCs of ChIP-seq controls over IgG DIP-seq enriched regions.
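The FPR estimate in panel a can be sketched in a few lines. The sketch rests on one reading of the caption, namely FPR = 1 − (regions identified with both controls) / (total regions), and on the assumption that a region counts as identified with both controls when the two calls overlap by at least one base. The names `Region`, `overlaps` and `estimate_fpr` and the coordinates are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

# Hypothetical region representation: (chromosome, start, end), half-open.
Region = Tuple[str, int, int]


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def estimate_fpr(called: List[Region], other_control: List[Region]) -> float:
    """1 - (regions also found with the other control) / (total regions)."""
    if not called:
        raise ValueError("at least one enriched region is required")
    shared = sum(
        1 for region in called
        if any(overlaps(region, other) for other in other_control)
    )
    return 1.0 - shared / len(called)


# Made-up example: four regions called against Input, one of which is
# also called when IgG is used as the control.
input_called = [
    ("chr1", 100, 200),
    ("chr1", 500, 600),
    ("chr2", 50, 150),
    ("chr2", 900, 1000),
]
igg_called = [("chr1", 150, 250)]
fpr = estimate_fpr(input_called, igg_called)
```

The pairwise overlap check is quadratic in the number of regions, which is acceptable for illustration; for genome-scale peak sets an interval-based tool would be the practical choice.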
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–4, Supplementary Discussion and Supplementary Table 5
Supplementary Table 1
Summary of analyzed datasets and their relationship to figures
Supplementary Table 2
Motif-enrichment analysis of 21 published 5modC DIP-seq datasets
Supplementary Table 3
Motif-enrichment analysis of 23 published 6mA DIP-seq datasets
Supplementary Table 4
Analysis of cell-culture contamination in 36 DIP-seq datasets
About this article
Cite this article
Lentini, A., Lagerwall, C., Vikingsson, S. et al. A reassessment of DNA-immunoprecipitation-based genomic profiling. Nat Methods 15, 499–504 (2018). https://doi.org/10.1038/s41592-018-0038-7
DOI: https://doi.org/10.1038/s41592-018-0038-7