CRISPR–Cas encoding of a digital movie into the genomes of a population of living bacteria

Shipman, Seth L.; Nivala, Jeff; Macklis, Jeffrey D.; Church, George M.

doi:10.1038/nature23017

Letter
Published: 12 July 2017


Nature volume 547, pages 345–349 (2017)


Abstract

DNA is an excellent medium for archiving data. Recent efforts have illustrated the potential for information storage in DNA using synthesized oligonucleotides assembled in vitro^1–6. A relatively unexplored avenue of information storage in DNA is the ability to write information into the genome of a living cell by the addition of nucleotides over time. Using the Cas1–Cas2 integrase, the CRISPR–Cas microbial immune system stores the nucleotide content of invading viruses to confer adaptive immunity^7. When harnessed, this system has the potential to write arbitrary information into the genome^8. Here we use the CRISPR–Cas system to encode the pixel values of black-and-white images and a short movie into the genomes of a population of living bacteria. In doing so, we push the technical limits of this information storage system and optimize strategies to minimize those limitations. We also uncover underlying principles of the CRISPR–Cas adaptation system, including sequence determinants of spacer acquisition that are relevant for understanding both the basic biology of bacterial adaptation and its technological applications. This work demonstrates that this system can capture and stably store practical amounts of real data within the genomes of populations of living cells.


**Figure 2: Sequence determinants of acquisition.**

**Figure 3: Encoding a GIF in bacteria.**


References

1. Church, G. M., Gao, Y. & Kosuri, S. Next-generation digital information storage in DNA. Science 337, 1628 (2012)
2. Goldman, N. et al. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature 494, 77–80 (2013)
3. Gibson, D. G. et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329, 52–56 (2010)
4. Clelland, C. T., Risca, V. & Bancroft, C. Hiding messages in DNA microdots. Nature 399, 533–534 (1999)
5. Adleman, L. M. Molecular computation of solutions to combinatorial problems. Science 266, 1021–1024 (1994)
6. Davis, J. Microvenus. Art J. 55, 70–74 (1996)
7. Barrangou, R. et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science 315, 1709–1712 (2007)
8. Shipman, S. L., Nivala, J., Macklis, J. D. & Church, G. M. Molecular recordings by directed CRISPR spacer acquisition. Science 353, aaf1175 (2016)
9. Amitai, G. & Sorek, R. CRISPR–Cas adaptation: insights into the mechanism of action. Nat. Rev. Microbiol. 14, 67–76 (2016)
10. Sternberg, S. H., Richter, H., Charpentier, E. & Qimron, U. Adaptation in CRISPR–Cas Systems. Mol. Cell 61, 797–808 (2016)
11. van der Oost, J., Jore, M. M., Westra, E. R., Lundgren, M. & Brouns, S. J. CRISPR-based adaptive and heritable immunity in prokaryotes. Trends Biochem. Sci. 34, 401–407 (2009)
12. Deveau, H. et al. Phage response to CRISPR-encoded resistance in Streptococcus thermophilus. J. Bacteriol. 190, 1390–1400 (2008)
13. Paez-Espino, D. et al. Strong bias in the bacterial CRISPR elements that confer immunity to phage. Nat. Commun. 4, 1430 (2013)
14. Westra, E. R. et al. Type I-E CRISPR–Cas systems discriminate target from non-target DNA through base pairing-independent PAM recognition. PLoS Genet. 9, e1003742 (2013)
15. Shmakov, S. et al. Pervasive generation of oppositely oriented spacers during CRISPR adaptation. Nucleic Acids Res. 42, 5907–5916 (2014)
16. Nuñez, J. K., Harrington, L. B., Kranzusch, P. J., Engelman, A. N. & Doudna, J. A. Foreign DNA capture during CRISPR–Cas adaptive immunity. Nature 527, 535–538 (2015)
17. Wang, J. et al. Structural and mechanistic basis of PAM-dependent spacer acquisition in CRISPR–Cas systems. Cell 163, 840–853 (2015)
18. Yosef, I., Goren, M. G. & Qimron, U. Proteins and DNA elements essential for the CRISPR adaptation process in Escherichia coli. Nucleic Acids Res. 40, 5569–5576 (2012)
19. Diez-Villasenor, C., Almendros, C., Garcia-Martinez, J. & Mojica, F. J. Diversity of CRISPR loci in Escherichia coli. Microbiology 156, 1351–1361 (2010)
20. Weinberger, A. D. et al. Persisting viral sequences shape microbial CRISPR-based immunity. PLOS Comput. Biol. 8, e1002475 (2012)
21. Held, N. L., Herrera, A., Cadillo-Quiroz, H. & Whitaker, R. J. CRISPR associated diversity within a population of Sulfolobus islandicus. PLoS One 5, e12988 (2010)
22. Yosef, I. et al. DNA motifs determining the efficiency of adaptation into the Escherichia coli CRISPR array. Proc. Natl Acad. Sci. USA 110, 14396–14401 (2013)
23. Westra, E. R. et al. CRISPR immunity relies on the consecutive binding and degradation of negatively supercoiled invader DNA by cascade and Cas3. Mol. Cell 46, 595–605 (2012)
24. Semenova, E. et al. Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence. Proc. Natl Acad. Sci. USA 108, 10098–10103 (2011)
25. Zhirnov, V., Zadegan, R. M., Sandhu, G. S., Church, G. M. & Hughes, W. L. Nucleic acid memory. Nat. Mater. 15, 366–370 (2016)
26. Hsiao, V., Hori, Y., Rothemund, P. W. & Murray, R. M. A population-based temporal logic gate for timing and recording chemical events. Mol. Syst. Biol. 12, 869 (2016)
27. McKenna, A. et al. Whole-organism lineage tracing by combinatorial and cumulative genome editing. Science 353, aaf7907 (2016)
28. Frieda, K. L. et al. Synthetic recording and in situ readout of lineage information in single cells. Nature 541, 107–111 (2017)
29. Erlich, Y. & Zielinski, D. DNA Fountain enables a robust and efficient storage architecture. Science 355, 950–954 (2017)
30. O’Shea, J. P. et al. pLogo: a probabilistic approach to visualizing sequence motifs. Nat. Methods 10, 1211–1212 (2013)


Acknowledgements

S.L.S. is a Shurl and Kay Curci Foundation Fellow of the Life Sciences Research Foundation. The project was supported by grants from the National Institute of Mental Health (5R01MH103910), National Human Genome Research Institute (5RM1HG008525), and Simons Foundation Autism Research Initiative (368485) to G.M.C., the National Institute of Neurological Disorders and Stroke (5R01NS045523) to J.D.M. and an Allen Distinguished Investigator Award from the Paul G. Allen Frontiers Group to J.D.M. We thank G. Kuznetsov for comments on the manuscript.

Author information

Authors and Affiliations

Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, 02115, Massachusetts, USA
Seth L. Shipman, Jeff Nivala & George M. Church
Department of Stem Cell and Regenerative Biology, Center for Brain Science, and Harvard Stem Cell Institute, Harvard University, Bauer Laboratory 103, Cambridge, 02138, Massachusetts, USA
Seth L. Shipman & Jeffrey D. Macklis
Wyss Institute for Biologically Inspired Engineering, Harvard University, Cambridge, 02138, Massachusetts, USA
Seth L. Shipman, Jeff Nivala & George M. Church


Contributions

S.L.S. and J.N. conceived the study. S.L.S. designed the work, performed experiments, analysed data, wrote custom Python analysis software, and wrote the manuscript with input from J.N., J.D.M. and G.M.C. S.L.S., J.N., J.D.M. and G.M.C. discussed results and commented on the manuscript.

Corresponding author

Correspondence to George M. Church.

Ethics declarations

Competing interests

S.L.S., J.N., J.D.M. and G.M.C. are inventors on a provisional patent (62/296,812) filed by the President and Fellows of Harvard College that covers the work in this manuscript. A complete accounting of the financial interests of G.M.C. is listed at: http://arep.med.harvard.edu/gmc/tech.html.

Additional information

Reviewer information: Nature thanks R. Barrangou and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Figure 1 Recording images into the genome.

a, Pixel values are encoded across many protospacers, which are electroporated into a population of bacteria that overexpress Cas1 and Cas2 to store the image data. These bacteria can be archived, propagated, and eventually sequenced to recall the image. b, Initial image to be encoded. c, Nucleotide-to-colour encoding scheme. d, Example of the encoding scheme. The sequence at top shows a linear view of the protospacer: the pixet code (specifying a pixel set) followed by pixel values, which are distributed across the image. The pixet number is shown under the pixet nucleotides, with the binary-converted pixet and the binary-to-nucleotide conversion reference below that. Small numbers (in colour) below the protospacer indicate individual pixels, identified by boxes on the image. The protospacer in minimal hairpin format for electroporation is shown on the right. e, Results of one replicate at a depth of 655,360 reads. A pixel is shown in white if no information about its value was recovered (because the pixet protospacer carrying it was not recovered by sequencing). f, Percentage of accurately recalled pixets as a function of read depth. Unfilled circles indicate points derived from 3 biological replicates; the black line is the mean of the replicates. g, Examples of the images that result from down-sampling the sequencing reads. h, Recall accuracy as a function of reads sampled when smaller pools of oligonucleotides are supplied and recalled. Individual points show 3 biological replicates; lines are the means of the replicates. i, Number of reads required to reach 50%, 60%, 70% and 80% accuracy on a given oligonucleotide set as a function of the number of oligonucleotides supplied (n = 3; linear regression of the 80% curve, R² = 0.9466; runs test of the 80% curve, P > 0.99). Additional statistical details in Supplementary Table 2.
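The pixet-code conversion in panel d (pixet number, then binary, then nucleotides) can be sketched in Python. This is a minimal sketch under stated assumptions: the 2-bit mapping (A = 00, C = 01, G = 10, T = 11), the 4-nt code length and all function names are hypothetical, since the actual conversion reference appears only in the figure.

```python
# Hypothetical 2-bit mapping; the paper's own conversion reference is shown
# only in Extended Data Fig. 1d and is not reproduced here.
BITS_TO_NT = {"00": "A", "01": "C", "10": "G", "11": "T"}
NT_TO_BITS = {nt: bits for bits, nt in BITS_TO_NT.items()}


def pixet_to_nucleotides(pixet: int, length: int = 4) -> str:
    """Encode a pixet number as `length` nucleotides, two bits per base."""
    if not 0 <= pixet < 4 ** length:
        raise ValueError(f"pixet {pixet} does not fit in {length} nt")
    binary = format(pixet, f"0{2 * length}b")  # zero-padded binary string
    return "".join(BITS_TO_NT[binary[i:i + 2]] for i in range(0, len(binary), 2))


def nucleotides_to_pixet(code: str) -> int:
    """Decode a pixet code back to its integer index."""
    return int("".join(NT_TO_BITS[nt] for nt in code.upper()), 2)


def split_protospacer(protospacer: str, code_length: int = 4):
    """Split a protospacer into (pixet number, pixel-value nucleotides).

    Assumes, hypothetically, that the pixet code occupies the first
    `code_length` bases and pixel values fill the remainder.
    """
    return nucleotides_to_pixet(protospacer[:code_length]), protospacer[code_length:]
```

For example, pixet 27 is 00011011 in binary and encodes as ACGT under this mapping. A 4-nt code addresses 4^4 = 256 pixets; a longer code would be needed for larger images.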

Extended Data Figure 2 Testing a minimal hairpin protospacer.

a, Percentage of arrays expanded with oligonucleotide-supplied spacers following electroporation of the sequences indicated below, aimed at testing PAM inclusion on the top and bottom strands. Unfilled circles indicate biological replicates; bars are mean ± s.e.m. (n = 3; one-way ANOVA: P < 0.0001; follow-up Dunnett’s multiple comparison (corrected), no PAM versus full PAM: P = 0.0001, no PAM versus bottom PAM: P = 0.0002). *P < 0.05. Oligonucleotides supplied at 3.125 μM each. b, Percentage of arrays expanded with oligonucleotide-supplied spacers following electroporation of the sequences indicated to the left, right and below, aimed at finding a minimal functional hairpin protospacer. Unfilled circles indicate individual biological replicates; bars are mean ± s.e.m. (n = 4; one-way ANOVA effect of protospacer: P > 0.05). Oligonucleotides supplied at 3.125 μM each. c, Percentage of arrays expanded following electroporation of different concentrations of the minimal hairpin oligonucleotide protospacer (n = 1). Additional statistical details in Supplementary Table 2.
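Panels a and b summarize the percentage of arrays expanded with oligonucleotide-supplied spacers as mean ± s.e.m. across biological replicates. The sketch below shows one way to compute that summary, assuming a hypothetical input in which each sequenced array has already been reduced to its newly acquired spacers. It also assumes that a spacer counts as oligonucleotide-supplied when either it or its reverse complement was electroporated, because spacers may be integrated in either orientation.

```python
import math
import statistics

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def percent_expanded(arrays, supplied) -> float:
    """Percentage of arrays carrying at least one oligonucleotide-supplied spacer.

    `arrays` is a list of arrays, each given as a list of newly acquired spacers
    (a hypothetical input format); `supplied` holds the electroporated protospacers.
    """
    supplied = {s.upper() for s in supplied}
    if not arrays:
        return 0.0
    hits = 0
    for new_spacers in arrays:
        # Accept either orientation of integration.
        if any(s.upper() in supplied or reverse_complement(s) in supplied
               for s in new_spacers):
            hits += 1
    return 100.0 * hits / len(arrays)


def mean_sem(values):
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    mean = statistics.fmean(values)
    if len(values) < 2:
        return mean, 0.0
    return mean, statistics.stdev(values) / math.sqrt(len(values))
```

Each biological replicate yields one `percent_expanded` value, and `mean_sem` over the replicate values gives the bar height and error bar.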

Extended Data Figure 3 Cells surviving electroporation.

Colony-forming units per millilitre of starting culture before electroporation, after pre-electroporation washes, immediately after electroporation, and after 1 h of recovery. Cells in red were electroporated with a minimal hairpin oligonucleotide; those in blue were electroporated in water alone. Unfilled circles represent individual biological replicates (n = 3); filled circles are mean ± s.e.m.
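Colony-forming units per millilitre follow from plate counts by the standard dilution calculation, CFU/mL = colonies / (dilution × volume plated in mL). The sketch below uses hypothetical function names. It does not model the volume changes during washes and electroporation, so referring each step back to the starting culture, as the figure does, would also need that step's concentration factor.

```python
def cfu_per_ml(colonies: int, dilution: float, volume_plated_ml: float) -> float:
    """CFU per millilitre of the undiluted sample.

    `dilution` is the fraction of the original sample present in the plated
    liquid, for example 1e-6 for a 10^6-fold dilution.
    """
    if dilution <= 0 or volume_plated_ml <= 0:
        raise ValueError("dilution and plated volume must be positive")
    return colonies / (dilution * volume_plated_ml)


def percent_surviving(cfu_after: float, cfu_before: float) -> float:
    """Survival at one step relative to an earlier step, in percent."""
    return 100.0 * cfu_after / cfu_before
```

For example, 150 colonies from 0.1 mL of a 10^-6 dilution correspond to 1.5 × 10^9 CFU/mL.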

Extended Data Figure 4 Optimization of protospacer sequence parameters.

a, Comparison of the percentage of arrays that were expanded after encoding the hand^R and hand^F images (n = 3). b, Percentage of arrays expanded per oligonucleotide (single pool) or per subpool (subpooled) across a range of GC percentages. Unfilled black circles to the left represent individual oligonucleotide protospacer sequences (three biological replicates each), and the black line shows mean ± s.e.m. Unfilled red circles to the right represent individual biological replicates. Bars are mean ± s.e.m. (n = 3; one-way ANOVA on effect of GC percentage, single pool: P < 0.0001, subpooled: P = 0.0011; follow-up testing with Tukey’s multiple comparison (corrected), see Supplementary Table 2). c, Percentage of arrays expanded per oligonucleotide electroporated individually across a range of GC percentages. Unfilled red circles are individual biological replicates. Bars show mean ± s.e.m. (n = 3; one-way ANOVA on effect of GC percentage: P = 0.0001; follow-up testing with Tukey’s multiple comparison (corrected), see Supplementary Table 2). d, Gibbs free energy of minimal hairpin protospacer structures for each of the images, with protospacers ranked by overall acquisition frequency (n = 3; linear regression, hand^R: P = 0.0089, hand^F: P = 0.0004). e, Percentage of arrays expanded per oligonucleotide (single pool) or per subpool (subpooled) with different numbers of mononucleotide repeats (n = 3; one-way ANOVA on effect of mononucleotide repeats, single pool: P = 0.3843, subpooled: P = 0.0015; follow-up testing with Tukey’s multiple comparison (corrected), see Supplementary Table 2). Panel attributes as in b. f, Percentage of arrays expanded per oligonucleotide (single pool) or per subpool (subpooled) with different numbers of internal PAMs (n = 3; one-way ANOVA on effect of internal PAMs, single pool: P = 0.0565, subpooled: P = 0.0052; follow-up testing with Tukey’s multiple comparison (corrected), see Supplementary Table 2). Panel attributes as in b. *P < 0.05. Additional statistical details in Supplementary Table 2.
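Panels b, e and f bin protospacers by GC percentage, number of mononucleotide repeats and number of internal PAMs. The sketch below computes those three features for a single sequence. It assumes the PAM is AAG (the motif Extended Data Fig. 8 treats as unacceptable), counts it on both strands by default, and defines a mononucleotide repeat as a run of at least four identical bases. The threshold and the strand convention are hypothetical choices, not the paper's definitions.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def mononucleotide_repeats(seq: str, min_run: int = 4) -> int:
    """Count runs of one base at least `min_run` long (threshold is hypothetical)."""
    runs = (m.group(0) for m in re.finditer(r"(.)\1*", seq.upper()))
    return sum(1 for run in runs if len(run) >= min_run)


def internal_pams(seq: str, pam: str = "AAG", both_strands: bool = True) -> int:
    """Count PAM occurrences inside the protospacer, overlapping matches included."""
    seq = seq.upper()
    motifs = {pam.upper()}
    if both_strands:
        # A PAM on the bottom strand reads as its reverse complement on the top strand.
        motifs.add(pam.upper().translate(COMPLEMENT)[::-1])
    # A zero-width lookahead lets re.findall report overlapping occurrences.
    return sum(len(re.findall(f"(?={m})", seq)) for m in motifs)
```

With these definitions, AAGCTTAAG contains two AAG sites on the top strand plus one CTT, which is an AAG on the bottom strand.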

Extended Data Figure 5 Effect of the 3′ motif on protospacer acquisition when supplied as two complementary oligonucleotides.

Individual sequences designed to test directly the motif identified in Fig. 2b are shown to the left. To the right, percentage of arrays expanded following electroporation of the indicated sequences as two complementary oligonucleotides (dark red) rather than as a minimal oligonucleotide hairpin (pink, shown for comparison). Unfilled circles indicate individual biological replicates. Bars show mean ± s.e.m. (n = 3; one-way ANOVA on effect of oligonucleotide: P = 0.0041; follow-up testing with Sidak’s multiple comparison (corrected), seq^over versus seq^over-CCT: P = 0.0103, seq^under versus seq^under-TGA: P = 0.0081). *P < 0.05. Additional statistical details in Supplementary Table 2.

Extended Data Figure 6 Recall of frame order over time based on position in the CRISPR array.

a, Initial set of rules to test the order of spacers within a pixet. Every time two spacers from the same pixet are found in a single array, their relative physical location (with respect to the leader) is extracted, as is the location of each spacer relative to spacers drawn from the genome or plasmid (G/P). The actual sequence of electroporated protospacers should occupy arrays in a predictable physical arrangement, as described by these ordering rules. Every possible permutation of spacers within a pixet is tested against each of these rules, and if a permutation satisfies all the rules, its spacers are assigned to frames accordingly. b, Second set of tests to compare between pixets. If no permutation satisfies all of the tests in a, spacers are compared pairwise with previously assigned spacers from other pixets whenever both are found in the same array. A larger set of rules will hold true for the actual sequence of electroporated protospacers when compared against previously assigned spacers. Again, all possible order permutations are tested, and order is assigned based on the best overall satisfaction of these ordering rules.
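The permutation-scoring idea in panel b can be sketched as an exhaustive search. This is a simplified, hypothetical version: it uses only the spacer-to-spacer order observed within arrays (not the G/P anchors or the full rule set). It relies on new spacers being integrated at the leader-proximal end, so a spacer nearer the leader was acquired later, and it scores each candidate order by how many observed pairs it satisfies. Because it enumerates n! orders, it is practical only for the handful of spacers in one pixet.

```python
from itertools import combinations, permutations


def pairwise_observations(arrays, spacers):
    """Collect (earlier, later) pairs from arrays holding two or more pixet spacers.

    Each array is listed leader-proximal first (a hypothetical input format).
    """
    wanted = set(spacers)
    observations = []
    for array in arrays:
        present = [s for s in array if s in wanted]  # keeps leader-proximal first
        for later, earlier in combinations(present, 2):
            # `later` sits nearer the leader, so it was integrated after `earlier`.
            observations.append((earlier, later))
    return observations


def best_order(arrays, spacers):
    """Return the acquisition order (earliest first) satisfying the most observations."""
    observations = pairwise_observations(arrays, spacers)

    def score(order):
        rank = {s: i for i, s in enumerate(order)}
        return sum(rank[earlier] < rank[later] for earlier, later in observations)

    return list(max(permutations(spacers), key=score))
```

Ties between equally good permutations are broken by iteration order here; the published procedure instead escalates to the between-pixet comparisons described above.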

Extended Data Figure 7 Quantification of errors by source.

Errors include any instance of a called spacer that does not match the supplied protospacer.

Extended Data Figure 8 Methods of image encoding for error-correction.

a–d, Method used in Fig. 1. a, Triplet code to flexibly specify 21 colours. b, Example of a pixet to be encoded into nucleotide space with pixel values marked. c, Rules specifying how the protospacer will be built. d, Example of the build of the protospacer. The AAG introduced by the addition of pixel 4 is unacceptable and invokes the flexible switch to another triplet. In a test of the extendibility of this encoding scheme, we ran three random sets of 100 million different nine-colour orderings through the sequence build and found that 99.86 ± 0.07% of colour orders were able to satisfy the requirements we set out without optimization by hand. e–i, Method of alternating clusters for error correction. e, Triplet assignment to clusters A, B, and X. f, Example of a pixet to be encoded into nucleotide space with pixel values marked. g, Rules for adding new triplets in this scheme. h, Example of the build of the protospacer. The AAG introduced by the addition of pixel 4 is unacceptable and invokes the flexible switch to cluster X. i, Example of an error signal. j–l, Method of checksum error correction. j, Annotation of protospacer with the addition of a checksum. k, Annotation of the checksum itself. l, Full protospacer with checksum implemented.
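The flexible triplet switch in panels a–d can be sketched as a depth-first build. For each colour, the builder tries its synonymous triplets in turn and rejects any choice that creates AAG at the junction with the preceding sequence. The three-colour triplet table below is hypothetical (the paper's 21-colour table is in panel a), backtracking is an addition of this sketch, and the other build rules of panel c are not modelled.

```python
FORBIDDEN = ("AAG",)  # motif the legend flags as unacceptable
LONGEST_MOTIF = max(len(m) for m in FORBIDDEN)

# Hypothetical colour-to-triplet table; all triplets are unique so decoding is unambiguous.
TRIPLETS = {
    0: ["TAA", "CAT", "GCA"],
    1: ["GAC", "TGC", "ACT"],
    2: ["AGT", "CTG", "TCA"],
}


def build_protospacer(colours, prefix=""):
    """Return a sequence encoding `colours` with no forbidden motif, or None."""
    if not colours:
        return prefix
    for triplet in TRIPLETS[colours[0]]:
        candidate = prefix + triplet
        # Only the new triplet and the bases just before it can create a new motif.
        window = candidate[-(len(triplet) + LONGEST_MOTIF - 1):]
        if any(motif in window for motif in FORBIDDEN):
            continue  # flexible switch: try the next synonymous triplet
        result = build_protospacer(colours[1:], candidate)
        if result is not None:
            return result
    return None  # no acceptable triplet choice for this colour order


def decode(protospacer):
    """Map each triplet back to its colour."""
    lookup = {t: colour for colour, options in TRIPLETS.items() for t in options}
    return [lookup[protospacer[i:i + 3]] for i in range(0, len(protospacer), 3)]
```

For example, `build_protospacer([0, 2])` rejects TAA followed by AGT (which reads TAAAGT and contains AAG) and returns TAACTG, which `decode` maps back to [0, 2].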

Supplementary information

Supplementary Information

This file contains Supplementary Notes and Supplementary Tables 1-2. (PDF 324 kb)

Supplementary Data

This file contains Supplementary Table 3. (XLSX 73 kb)



About this article

Cite this article

Shipman, S., Nivala, J., Macklis, J. et al. CRISPR–Cas encoding of a digital movie into the genomes of a population of living bacteria. Nature 547, 345–349 (2017). https://doi.org/10.1038/nature23017


Received: 22 August 2016
Accepted: 02 June 2017
Published: 12 July 2017
Issue Date: 20 July 2017
DOI: https://doi.org/10.1038/nature23017

