TruSPAdes: barcode assembly of TruSeq synthetic long reads

Bankevich, Anton; Pevzner, Pavel A

doi:10.1038/nmeth.3737

Brief Communication
Published: 01 February 2016

Anton Bankevich¹ & Pavel A Pevzner¹,²

Nature Methods volume 13, pages 248–250 (2016)

Abstract

The recently introduced TruSeq synthetic long read (TSLR) technology generates long and accurate virtual reads from an assembly of barcoded pools of short reads. The TSLR method provides an attractive alternative to existing sequencing platforms that generate long but inaccurate reads. We describe the truSPAdes algorithm (http://bioinf.spbau.ru/spades) for TSLR assembly and show that it results in a dramatic improvement in the quality of metagenomics assemblies.

Accession codes

Primary accessions

European Nucleotide Archive

References

Chin, C.S. et al. Nat. Methods 10, 563–569 (2013).
Lam, K.K., Khalak, A. & Tse, D. BMC Bioinformatics 15, S4 (2014).
Koren, S. et al. Genome Biol. 14, R101 (2013).
Huddleston, J. et al. Genome Res. 24, 688–696 (2014).
Salmela, L. & Rivals, E. Bioinformatics 30, 3506–3514 (2014).
Ummat, A. & Bashir, A. Bioinformatics 30, 3491–3498 (2014).
Lam, K.-K., LaButti, K., Khalak, A. & Tse, D. Bioinformatics 31, 3207–3209 (2015).
Berlin, K. et al. Nat. Biotechnol. 33, 623–630 (2015).
McCoy, R.C. et al. PLoS ONE 9, e106689 (2014).
Tilgner, H. et al. Nat. Biotechnol. 33, 736–742 (2015).
Li, R. et al. Sci. Rep. 5, 10814 (2015).
Sharon, I. et al. Genome Res. 25, 534–543 (2015).
Kuleshov, V. et al. Nat. Biotechnol. 34, 64–69 (2015).
Chitsaz, H. et al. Nat. Biotechnol. 29, 915–921 (2011).
Bankevich, A. et al. J. Comput. Biol. 19, 455–477 (2012).
Peng, Y., Leung, H.C.M., Yiu, S.M. & Chin, F.Y.L. Bioinformatics 28, 1420–1428 (2012).
Compeau, P.E., Pevzner, P.A. & Tesler, G. Nat. Biotechnol. 29, 987–991 (2011).
Kuleshov, V. et al. Nat. Biotechnol. 32, 261–266 (2014).
Simpson, J.T. & Durbin, R. Genome Res. 22, 549–556 (2012).
Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. Bioinformatics 29, 1072–1075 (2013).
Peng, Y., Leung, H.C.M., Yiu, S.M. & Chin, F.Y.L. Bioinformatics 27, i94–i101 (2011).
Boisvert, S., Raymond, F., Godzaridis, E., Laviolette, F. & Corbeil, J. Genome Biol. 13, R122 (2012).
Haider, B. et al. Bioinformatics 30, 2717–2722 (2014).
Howe, A.C. et al. Proc. Natl. Acad. Sci. USA 111, 4904–4909 (2014).
Marcy, Y. et al. Proc. Natl. Acad. Sci. USA 104, 11889–11894 (2007).
McLean, J.S. et al. Genome Res. 23, 867–877 (2013).
Nurk, S. et al. J. Comput. Biol. 20, 714–737 (2013).
Myers, E.W. et al. Science 287, 2196–2204 (2000).
Treangen, T.J. et al. Genome Biol. 14, R2 (2013).
Peters, B.A., Liu, J. & Drmanac, R. Front. Genet. 5, 466 (2015).
Dean, F.B., Nelson, J.R., Giesler, T.L. & Lasken, R.S. Genome Res. 11, 1095–1099 (2001).
Lasken, R.S. & Stockwell, T.B. BMC Biotechnol. 7, 19 (2007).
Zerbino, D.R. & Birney, E. Genome Res. 18, 821–829 (2008).
Simpson, J.T. et al. Genome Res. 19, 1117–1123 (2009).
Prjibelski, A. et al. Bioinformatics 30, 293–301 (2014).
Zimin, A.V., Smith, D.R., Sutton, G. & Yorke, J.A. Bioinformatics 24, 42–45 (2008).
Vasilinetc, I., Prjibelski, A.D., Gurevich, A., Korobeynikov, A. & Pevzner, P.A. Bioinformatics 30, 293–301 (2015).
Antipov, D., Korobeynikov, A., McLean, J.S. & Pevzner, P.A. Bioinformatics doi:10.1093/bioinformatics/btv688 (2015).
Ashton, P.M. et al. Nat. Biotechnol. 33, 296–300 (2015).

Acknowledgements

We are indebted to V. Montel, J. Stuzka and O. Schulz-Trieglaff at Illumina for many helpful discussions, sample preparation and TSLR data. We thank J. Banfield and I. Sharon for providing their metagenomics TSLR data. This study was supported by the Russian Science Foundation (grant 14-50-00069 to A.B. and P.A.P.).

Author information

Authors and Affiliations

Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
Anton Bankevich & Pavel A Pevzner
Department of Computer Science and Engineering, University of California at San Diego, La Jolla, California, USA
Pavel A Pevzner

Authors

Anton Bankevich
Pavel A Pevzner

Contributions

A.B. developed and implemented the truSPAdes algorithm and performed benchmarking. A.B. and P.A.P. conceived the study, designed the computational experiments and wrote the manuscript.

Corresponding author

Correspondence to Anton Bankevich.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 k-mer coverage histograms.

Histograms of k-mer coverage (k = 55) for the E. coli standard isolate dataset from Bankevich et al.¹⁶ (a), the E. coli MDA-amplified single-cell dataset from Bankevich et al.¹⁶ (b), one of the barcodes of TSLR data (c) and a single 10-kb-long fragment of a barcode (d). Conventional assemblers select a coverage threshold to separate correct from erroneous k-mers. The histogram for data from the standard isolate features a smaller peak on the left (formed by largely erroneous k-mers with low coverage) and a larger peak on the right (formed by largely correct k-mers with high coverage). Thus, one can choose a proper threshold that separates correct from erroneous k-mers³⁸. However, for both MDA and TSLR data, there is no threshold separating correct and erroneous k-mers.
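
The thresholding idea in this caption can be illustrated with a minimal Python sketch. It is not the truSPAdes implementation: the function names, the toy reads and the simple "first valley" rule are assumptions made for illustration only. The sketch counts k-mers, builds the coverage histogram and looks for a valley between a low-coverage peak and a high-coverage peak; when no such valley exists, as the caption reports for MDA and TSLR data, it returns None.

```python
from collections import Counter


def coverage_histogram(reads, k):
    """Map each coverage value to the number of distinct k-mers seen that many times."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def valley_threshold(histogram):
    """Return the first coverage value lying in a valley between two peaks, or None.

    Illustrative rule (an assumption, not the published method): the histogram
    must drop, stop dropping, and rise again later for a valley to exist.
    """
    if not histogram:
        return None
    values = [histogram.get(c, 0) for c in range(1, max(histogram) + 1)]
    for i in range(1, len(values) - 1):
        dropped = values[i] < values[i - 1]
        stopped_dropping = values[i] <= values[i + 1]
        rises_later = max(values[i + 1:]) > values[i]
        if dropped and stopped_dropping and rises_later:
            return i + 1  # index i holds coverage i + 1
    return None


# Toy data: five error-free copies of one read plus one copy with a substitution.
reads = ["ACGTTGCA"] * 5 + ["ACGTAGCA"]
histogram = coverage_histogram(reads, k=4)
threshold = valley_threshold(histogram)
print(sorted(histogram.items()), threshold)
```

With a bimodal histogram such as the toy one, k-mers whose coverage falls below the returned threshold would be treated as erroneous; a histogram that only decreases yields no threshold at all.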

Supplementary Figure 2 Barcode span.

Construction of the barcode span: red regions have rather uniform read coverage and a length close to 10 kb. Black reads, which do not belong to the selected barcode spans, represent read mapping artifacts and are ignored.
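
One way to picture the span construction is the following Python sketch. It is only an assumption about how such spans could be derived from read alignments: the function name, the half-open interval representation and the length bounds are hypothetical, and the uniform-coverage check mentioned in the caption is omitted. Overlapping alignments are merged into candidate spans, and only spans whose length is close to 10 kb are kept, so isolated stray reads drop out.

```python
def candidate_spans(alignments, min_len=5_000, max_len=15_000):
    """Merge overlapping (start, end) alignments and keep spans of roughly 10 kb.

    alignments: iterable of half-open (start, end) reference intervals for the
    reads of one barcode. min_len and max_len are hypothetical bounds.
    """
    merged = []
    for start, end in sorted(alignments):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the current span: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged if min_len <= e - s <= max_len]


# Three overlapping reads tile a 10-kb region; one stray read maps far away.
alignments = [(0, 3_000), (2_500, 6_000), (5_500, 10_000), (50_000, 50_100)]
spans = candidate_spans(alignments)
print(spans)
```

In this toy input the stray read at position 50,000 forms a span that is far too short and is discarded, which mirrors how the black reads in the figure are ignored.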

Supplementary Figure 3 Typical misassemblies.

Two common types of misassemblies: false (a,b,c) and chimeric (d,e,f) connections. (a) Two unrelated instances of the blue repeat are located in red (left) and yellow (right) genome fragments. These instances are flanked by short dotted segments. (b) These short dotted segments correspond to short dotted edges (tips) in the de Bruijn graph. (c) Tip trimming results in a single (misassembled) edge in the de Bruijn graph representing a false connection. (d) A region of the genome formed by consecutive yellow and green segments. (e) Since the yellow fragment has been erroneously amplified from the opposite strand, its reverse-complementary copy is added to the end of this region, resulting in a chimeric fragment. (f) In the de Bruijn graph, the corresponding yellow solid edge has two outgoing edges: one for each connection between the yellow and green parts of the genome fragment. One of these connections represents an erroneous chimeric connection (transition from solid yellow to dashed green). We note that our explanation for the experimental cause of the chimeric connection is just a hypothesis that accurately reflects the computational artifacts we observe.
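
The false connection in panels (a) to (c) can be reproduced on toy data with the Python sketch below. It is a simplified illustration, not the truSPAdes code: the sequences, the k value, the function names and the tip-length cutoff are all assumptions, reverse-complement strands are ignored, and a real assembler would not remove both branches of a fork. K-mers act as edges between (k-1)-mer nodes; short dead-end chains hanging off a branching node are removed in both directions.

```python
from collections import defaultdict


def kmers_of(sequence, k):
    """Return the set of k-mers (de Bruijn graph edges) of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def trim_dead_ends(kmers, max_tip_edges):
    """Drop dead-end chains of at most max_tip_edges edges that hang off a fork."""
    succ, pred = defaultdict(set), defaultdict(set)
    for kmer in kmers:
        succ[kmer[:-1]].add(kmer[1:])
        pred[kmer[1:]].add(kmer[:-1])
    removed = set()
    for end in set(pred) - set(succ):  # nodes without outgoing edges
        chain, node = [], end
        while len(chain) < max_tip_edges and len(pred.get(node, ())) == 1:
            (parent,) = pred[node]
            chain.append(parent + node[-1])  # the k-mer for edge parent -> node
            if len(succ[parent]) > 1:  # reached a fork: the chain is a tip
                removed.update(chain)
                break
            node = parent
    return set(kmers) - removed


def trim_tips(kmers, max_tip_edges):
    """Trim dead-end tips, then source tips by reversing every k-mer string."""
    kept = trim_dead_ends(kmers, max_tip_edges)
    flipped = trim_dead_ends({kmer[::-1] for kmer in kept}, max_tip_edges)
    return {kmer[::-1] for kmer in flipped}


K = 4
REPEAT = "CGTGCA"                        # the shared "blue" repeat
red_fragment = "AATTC" + REPEAT + "AG"   # red flank, repeat, short dotted flank
yellow_fragment = "TA" + REPEAT + "TCGGA"  # short dotted flank, repeat, yellow flank
graph = kmers_of(red_fragment, K) | kmers_of(yellow_fragment, K)
trimmed = trim_tips(graph, max_tip_edges=2)
false_connection = "AATTC" + REPEAT + "TCGGA"
print(trimmed == kmers_of(false_connection, K))
```

After trimming, the remaining k-mers spell a single path from the red flank through the repeat into the yellow flank, a sequence present in neither toy fragment, which is the false connection the caption describes.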

Supplementary Figure 4 Iterative assembly.

A fragment of a genome along with four reads (1st panel) and de Bruijn graphs of these reads constructed for k = 3 (2nd panel), k = 4 (3rd panel), and k = 5 (4th panel). The parameter k = 4 represents the “sweet spot” in the iterative assembly since the de Bruijn graph for k = 3 is over-tangled while the de Bruijn graph for k = 5 is over-fragmented.
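
The trade-off between over-tangled and over-fragmented graphs can be checked numerically with a small Python sketch. The three reads below are hypothetical and are not the reads shown in the figure, and the two summary statistics (branching nodes and weakly connected components) are an assumed way of quantifying "tangled" and "fragmented". The sketch only compares the graphs at each k; it does not carry information from one k to the next as an iterative assembler would.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Return the edges (prefix, suffix) of the de Bruijn graph of the reads."""
    edges = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges.add((kmer[:-1], kmer[1:]))
    return edges


def graph_summary(reads, k):
    """Return (number of branching nodes, number of weakly connected components)."""
    succ, pred, neighbours = defaultdict(set), defaultdict(set), defaultdict(set)
    for u, v in de_bruijn_edges(reads, k):
        succ[u].add(v)
        pred[v].add(u)
        neighbours[u].add(v)
        neighbours[v].add(u)
    nodes = set(neighbours)
    branching = sum(
        1 for n in nodes
        if len(succ.get(n, ())) > 1 or len(pred.get(n, ())) > 1
    )
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:  # depth-first search ignoring edge direction
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(neighbours[node] - seen)
    return branching, components


# Hypothetical reads of length 6 from "ACGTCGAATGCC", overlapping by 3 bases.
reads = ["ACGTCG", "TCGAAT", "AATGCC"]
for k in (3, 4, 5):
    print(k, graph_summary(reads, k))
```

For these toy reads the 2-mer CG occurs twice, so the k = 3 graph contains a branching node; at k = 4 the graph is a single unbranched path; at k = 5 the 3-base overlaps no longer link the reads and the graph splits into three components.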

Supplementary Figure 5 TruSPAdes pseudocode.

Outline of the truSPAdes pipeline. TruSPAdes-specific modifications are highlighted in blue.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5 (PDF 598 kb)

About this article

Cite this article

Bankevich, A., Pevzner, P. TruSPAdes: barcode assembly of TruSeq synthetic long reads. Nat Methods 13, 248–250 (2016). https://doi.org/10.1038/nmeth.3737

Received: 06 August 2015
Accepted: 08 December 2015
Published: 01 February 2016
Issue Date: March 2016
DOI: https://doi.org/10.1038/nmeth.3737
