RNA sequencing (RNA-seq) is actually a misnomer for the increasingly popular technique to determine the sequence of transcripts. It is not the RNA that is being sequenced but its reverse-transcribed cDNA derivative.

Presently available second-generation sequencing platforms all require many copies of the molecule that is to be sequenced and thus include an amplification step in their protocols. As RNA cannot be amplified, the detour via cDNA is necessary, and though the protocols for cDNA generation are well worked out, they are not immune to errors and bias, which can make data interpretation difficult.

Helicos BioSciences has recently introduced a third-generation, single-molecule DNA sequencer, the HeliScope. Now a team at the company led by Fatih Ozsolak and Patrice Milos has adapted the protocol to allow direct RNA sequencing, thus avoiding the cDNA detour and allowing a straight look at the transcriptome.

It is not a given that what works with robust DNA will also work with more fickle RNA. Although the scientists did not have to change the principle of Helicos' sequencing by synthesis, Milos says that the main challenge was modifying all the components of the system (buffer, polymerase and nucleotide chemistry) so that they would work in the context of RNA. The exact nature of these modifications is not being disclosed.

The team started with synthetic 40-mer oligoribonucleotides that they poly(A)-tagged to capture them on the poly(T) surface of the sequencer's flowcell. Their prototype flowcell was small, allowing only thousands of reads, as opposed to the 600–800 million reads in the HeliScope, but the average number of reads per unit area on the flowcell was very similar, indicating that the prototype can be scaled up. The average read length was around 20 nucleotides, with an error rate of approximately 4%.
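To put the read length and error rate quoted above in perspective, the following is a rough back-of-the-envelope sketch. It assumes that the 4% figure is a per-base rate and that errors occur independently at each position; neither assumption is stated in the report, so the numbers are illustrative only.

```python
# Illustrative arithmetic only. Assumptions (not from the report):
# the ~4% error rate is per base, and errors are independent per position.
read_length = 20     # average read length in nucleotides, as reported
error_rate = 0.04    # approximate error rate, as reported

# Expected number of erroneous bases in an average read.
expected_errors = read_length * error_rate

# Probability that an average read contains no errors at all.
p_error_free = (1 - error_rate) ** read_length

print(f"expected errors per read: {expected_errors:.1f}")
print(f"fraction of error-free reads: {p_error_free:.2f}")
```

Under these assumptions an average read would carry a little under one error, and somewhat fewer than half of the reads would be error-free, which is one reason deeper coverage helps with error correction.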

Moving to a biological sample, the researchers then sequenced poly(A)-containing RNA from yeast, starting with 2 nanograms of material, about 100-fold less than other next-generation sequencing platforms require for RNA-seq. A three-day run yielded just over 41,000 reads, of which 48% aligned to the yeast genome. Milos says that the team is now working on scaling the prototype methods up for the HeliScope.

Higher sequencing depth will be beneficial for error correction and quantitative transcript analysis, but another challenge, especially for the discovery of new transcripts and isoforms, is the short read length. Other next-generation sequencing platforms have used paired-end reads, short reads from either end of a longer molecule, to improve isoform discovery. Scientists at Helicos are currently developing a similar but distinct approach for the single-molecule sequencer.

Their strategy involves the capture of long molecules in the flowcell followed by sequencing of the initial 30 nucleotides. Then they turn off the laser, which is used to read the signal of the incorporated fluorophore-labeled nucleotides, and add unlabeled nucleotides to extend the strand for a defined length, after which they turn the laser back on and sequence the next 30 bases. The end result is intermittent sequence information on a long molecule that will make its characterization much easier than if it had to be assembled from short reads. Milos predicts that this strategy will, for example, be invaluable for finding long intergenic noncoding RNAs, transcripts that span the interval between exonic regions, and she adds: “I think people are very interested to learn if these are indeed true cellular RNAs.”
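The read, extend, read cycle described above can be pictured with a small simulation. This is only a sketch of the idea, not Helicos' method: the function name `intermittent_reads`, the 100-base unlabeled gap and the toy 200-nucleotide molecule are all hypothetical, since the report specifies only the 30-base sequenced windows and a "defined length" in between.

```python
def intermittent_reads(molecule, read_len=30, gap_len=100, n_windows=2):
    """Simulate intermittent sequencing of one long molecule.

    Returns a list of (start_position, bases) tuples: read_len bases are
    "sequenced", then gap_len bases are skipped (the unlabeled extension),
    and the cycle repeats for up to n_windows windows.
    gap_len and n_windows are hypothetical parameters for illustration.
    """
    if read_len <= 0 or gap_len < 0 or n_windows <= 0:
        raise ValueError("read_len and n_windows must be > 0, gap_len >= 0")
    windows = []
    pos = 0
    while pos < len(molecule) and len(windows) < n_windows:
        # Laser on: record the next read_len bases (fewer at the 3' end).
        windows.append((pos, molecule[pos:pos + read_len]))
        # Laser off: unlabeled extension moves ahead without recording bases.
        pos += read_len + gap_len
    return windows


# Toy 200-nt molecule; real transcripts would of course not be periodic.
toy_molecule = "ACGU" * 50
reads = intermittent_reads(toy_molecule)
for start, bases in reads:
    print(start, len(bases), bases)
```

Because both windows come from the same physical molecule and their spacing is known, they can be placed on a transcript together, which is what would make the approach useful for isoform discovery.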

This is just one application for single RNA molecule sequencing; it is likely that in 2010, when Helicos makes this technology available to customers, many more will become apparent.