Putting it all together

Gunter, Chris

doi:10.1038/nrg2262

Milestones
Published: 15 October 2007

Milestone 20

DNA assembly programs

Chris Gunter¹

Nature Reviews Genetics volume 8, page S18 (2007)

You might remember this problem from your childhood: when you lose the top to your puzzle box, you are confronted with lots of pieces and no idea what they are supposed to look like when assembled. Genome sequencers faced the same dilemma when beginning large-scale DNA sequencing. They did the same thing that you might: they started at known landmarks and systematically built up the larger picture.

To assemble the short stretch of DNA sequence from each read into a larger whole, particularly on a large scale, bioinformaticists developed algorithms that could take input directly from fluorescent sequencing machines. The earliest programs to achieve wide use were Phred, Phrap and Consed, developed by Phil Green and colleagues. Phred first went through the chromatogram output from the machine and assigned a 'base call', with an estimated error probability, to each position in a read. Phrap then assembled the base calls from multiple reads into the most likely single path through the sequence. Users could then view and edit the output with Consed to generate higher-quality sequence as required. These programs were developed for, and used on, the public Human Genome Project.
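As a rough illustration of the base-calling step, the sketch below shows the relationship between an estimated error probability P and a Phred-style quality score, Q = -10 log10(P), the scale introduced in the Ewing and Green paper listed in the references. The function names and the `pick_consensus_base` rule are hypothetical and chosen only for illustration; keeping the highest-quality call at a position is a deliberate simplification, not a description of how Phrap actually builds its consensus.

```python
import math


def phred_quality(error_probability: float) -> float:
    """Quality score Q = -10 * log10(P) for an estimated error probability P."""
    return -10.0 * math.log10(error_probability)


def error_probability(quality: float) -> float:
    """Inverse mapping: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)


def pick_consensus_base(calls):
    """Toy rule (an assumption, not Phrap's method): among the (base, quality)
    calls that several reads make at one aligned position, keep the call
    with the highest quality score."""
    return max(calls, key=lambda call: call[1])[0]


if __name__ == "__main__":
    # A 1-in-1,000 chance of a wrong call corresponds to a quality of about 30.
    print(round(phred_quality(0.001), 1))
    # Three reads cover one position; the single high-quality call is kept.
    print(pick_consensus_base([("A", 12), ("G", 40), ("A", 15)]))
```

On this scale every 10 points of quality corresponds to a tenfold drop in the estimated chance of a wrong call, which is why a few high-quality reads can outweigh several poor ones.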

Gene Myers and colleagues later developed an algorithm that used the end-pair information from sequencing subclones (the two reads taken from opposite ends of the same cloned fragment) and could assemble larger sequences. They postulated that a whole genome could be cut into pieces, sequenced randomly and reconstructed, given sufficient computational power. They demonstrated this approach on the genome of Drosophila melanogaster and famously went on to 'race' the publicly funded Human Genome Project using, in the end, a combination of their whole-genome assembly methods and data from the public project. So-called whole-genome shotgun assemblies are now the method of choice for large genome projects, and the field has moved on to next-generation programs such as Arachne, Atlas and PCAP, each using different algorithms.
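The idea of reconstructing a sequence from randomly sampled pieces can be sketched with a toy greedy overlap merge. This is a minimal illustration under strong assumptions: reads are error-free, all come from the same strand, and the sequence contains no repeats. It is not the algorithm used by Myers and colleagues, and it ignores the end-pair information that their assembler relied on to order pieces across repeats. The function names, the example reads and the `min_overlap` parameter are hypothetical.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`,
    or 0 if no overlap of at least `min_overlap` bases exists."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap: int = 3):
    """Repeatedly merge the pair of sequences with the largest overlap.

    Returns the list of contigs that remain when no overlaps are left.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Three overlapping reads sampled from the made-up sequence ATGCGTACGTTAGC.
    reads = ["GTACGTTA", "ATGCGTAC", "GTTAGC"]
    print(greedy_assemble(reads))
```

A purely greedy merge like this one can join the wrong pieces as soon as a sequence contains repeats longer than a read, which is one reason why knowing that two reads came from opposite ends of the same subclone is so useful in a whole-genome assembly.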

References

ORIGINAL RESEARCH PAPERS

Ewing, B. & Green, P. Basecalling of automated sequencer traces using Phred. II. Error probabilities. Genome Res. 8, 186–194 (1998)
Ewing, B., Hillier, L., Wendl, M. & Green, P. Basecalling of automated sequencer traces using Phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998)
Gordon, D., Abajian, C. & Green, P. Consed: a graphical tool for sequence finishing. Genome Res. 8, 195–202 (1998)
Myers, E. W. et al. A whole-genome assembly of Drosophila. Science 287, 868–877 (2000)

WEB SITES

National Center for Biotechnology Information assembly information: http://www.ncbi.nlm.nih.gov/genome/guide/Assembly/Assembly.shtml
Phrap: http://www.phrap.org

Author information

Authors and Affiliations

Chris Gunter, Senior Editor, Nature

About this article

Cite this article

Gunter, C. Putting it all together. Nat Rev Genet 8 (Suppl 1), S18 (2007). https://doi.org/10.1038/nrg2262

Issue Date: October 2007
