
Since the Human Genome Project began 20 years ago, sequencing technologies have become vastly cheaper and quicker. But an analysis of the sequences of two cancer genomes shows the obstacles that must still be overcome to make these technologies even more useful.

The time and money saved by next-generation sequencing technologies — compared with the first-generation approaches used for the Human Genome Project — make it possible to sequence the genome of an individual's cancer cells to identify all the mutations contributing to the cancer. But a drawback of the new technologies is that DNA sequences are obtained in snippets of only about 100 base pairs at a time, as opposed to the 500 or more that the first-generation methods can read in one go. So when a group of researchers from nine institutions set out to sequence the genome of cancer cells from a patient with smoking-related lung cancer (see page 184) and from a patient with the skin cancer melanoma, related to ultraviolet-light exposure (see page 191), they had more puzzle pieces to put together to get the entire genome sequence.


As a result, they had to sequence each genome 30 times, rather than the 10 times needed for a high-quality sequence with first-generation technology, to make sure they had enough overlap among the pieces to 'map' their locations on chromosomes. “The smaller the fragment you have, the more ambiguity you have and the more difficult it is to map,” says Michael Stratton, a genomicist at the Wellcome Trust Sanger Institute in Hinxton, UK, who led the sequencing efforts and focused on analysing the melanoma genome. The group persevered, says Stratton, by repeatedly tweaking the bioinformatics algorithms that identified, or 'called', each piece of the puzzle, analysing about 200 billion data points in total.

However, getting the sequence data was only the first step. The authors then had to identify the mutations that contributed to each cancer. To do so, they also sequenced the genomes of healthy cells from each individual and compared them with those of the tumour cells to pin down the differences. Next, they set aside the inherited variants, which probably did not contribute to the cancer, to arrive at the final set of mutations likely to be involved in the disease. That step was also complicated, because they had to look at several kinds of mutation, and each type required its own algorithm to identify. Moreover, says Stratton, “each of those has its own issues in terms of calling the mutations successfully”.

But the effort was worth it. The team identified more than 33,000 mutations in the melanoma and more than 23,000 in the lung cancer. The complete set reveals a timeline of how the DNA changed, with 'letters' dropping out, being added or being transposed, from long before the cancer would have developed up to the point at which symptoms appeared. Most surprising, says Stratton, is that the data provided evidence of the cells' DNA-repair mechanisms fixing some of these mutations. “The results illustrate the power of a cancer-genome sequence as an archaeological excavation, revealing traces of the DNA damage, repair, mutation and selection processes that were operative years before the cancer became symptomatic,” he says. “We saw things in those two cancers that we'd never seen before: imprints of the DNA-repair machinery.”

Sequencing individual cancer genomes is still too expensive and time-consuming to be done routinely, says Peter Campbell, the Sanger genomicist who led the lung-cancer sequencing project. But he argues that the effort is worthwhile because it should eventually improve cancer diagnosis, allowing clinicians to see the mutations that lead to cancer before the disease takes hold. It should also let them monitor the progress of treatment by looking for signs that cancer-causing mutations have stopped occurring. “Defining recurrent mutations gives us leads into diagnostics and treatments as well as the underlying molecular mechanisms,” Campbell explains.

He also believes that sequencing technologies will continue to improve. When next-generation sequencing emerged, machines could read only 25 base pairs at a time. That has risen to 100 and, Campbell says, will continue to increase as new machines appear. As this generation of sequencing improves and new approaches, such as single-molecule sequencing, come on board, he expects costs to fall and accuracy to rise. “There's a real possibility that single-molecule sequencing will become a reality. That will be a quantum leap in the amount of sequence we can generate,” he says.