Munich

In sequence: cloning complementary DNAs can expedite studies of gene function. Credit: HANK MORGAN/SPL

Hot on the heels of the publication of the draft human genome sequence, several efforts are under way to provide all scientists with free, synthetic versions of human genes. Their availability will speed up attempts to understand the function of human genes and their associated proteins.

Each of these synthetic genes, called complementary DNAs or cDNAs, can be used to produce the protein encoded by the real gene from which it was derived.

A German team of researchers, half-way through a three-year pilot study, announces in this month's Genome Research (11, 422–435) that it has generated 500 previously unknown full-length human cDNAs and an additional 1,000 known cDNAs. All the cDNAs have been inserted into transportable packages known as vector clones, and are freely available to academic and industrial scientists.

Since these results were submitted, the group, led by Stefan Wiemann of the German Cancer Research Centre in Heidelberg, has made available over a hundred additional cDNA clones. “Requests for the clones from scientists around the world have increased dramatically this year,” says Wiemann.

A typical request, he explains, might come from a cancer researcher who sees very low expression of a particular gene in one type of tumour, and wants to know what it does. If the cDNA for that gene is available as a clone, the researcher can get to work on its function straight away.

The cDNAs are synthesized from messenger RNAs (mRNAs). Each mRNA is a transcript of a gene, and its job is to direct the stringing together of the correct amino acids that make up the protein encoded by that gene. The enzyme reverse transcriptase carries out the synthesis, or reverse transcription, of cDNAs from mRNAs.
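The base-pairing logic of reverse transcription can be sketched in a few lines of Python. This is a toy model only, assuming perfect copying and ignoring priming, the poly(A) tail and enzyme errors; the function names are hypothetical. It shows why the finished double-stranded cDNA carries the same sequence as the mRNA, with T in place of U.

```python
# Base laid down in the new DNA strand opposite each RNA template base
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
# Base laid down opposite each DNA template base
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(mrna: str) -> str:
    """Return the first-strand cDNA (written 5'->3') for an mRNA (written 5'->3').

    The new strand is antiparallel to its template, so it is the reverse
    complement of the mRNA, using DNA bases (T instead of U).
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


def second_strand(first_strand: str) -> str:
    """Return the complementary DNA strand (written 5'->3') of a first-strand cDNA."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(first_strand))


mrna = "AUGGCCUAA"  # hypothetical short transcript: start codon, one codon, stop codon
first = reverse_transcribe(mrna)
print(first, second_strand(first))
```

Applying both steps to a transcript gives back the mRNA sequence written in DNA letters, which is why a cloned cDNA can stand in for the gene's coding information.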

Creating full-length cDNAs, which include the entire protein-coding region as well as regulatory regions, is no mean achievement. Reverse transcriptase is inefficient in vitro: it tends to 'fall off' its template before reaching the end, leaving incomplete cDNAs.
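Why 'falling off' penalizes long cDNAs can be illustrated with a deliberately simple model, assuming (hypothetically) that the enzyme dissociates independently with the same small probability at every nucleotide. The fall-off rate below is invented for illustration and is not a measured value.

```python
import math


def full_length_fraction(length_nt: int, falloff_per_nt: float) -> float:
    """Fraction of copying attempts that reach the end of a template.

    Simplifying assumption: at each of `length_nt` nucleotides the enzyme
    independently falls off with probability `falloff_per_nt`, so a copy
    survives with probability (1 - p) ** L. log1p keeps this accurate for small p.
    """
    if not 0.0 <= falloff_per_nt < 1.0:
        raise ValueError("falloff_per_nt must be in [0, 1)")
    return math.exp(length_nt * math.log1p(-falloff_per_nt))


# Hypothetical fall-off rate, chosen only for illustration
P_FALLOFF = 1 / 2000

for length in (1000, 2500, 4000):
    frac = full_length_fraction(length, P_FALLOFF)
    print(f"{length:>5} nt: {frac:.1%} of copies full-length")
```

Under this model the surviving fraction decays exponentially with length: doubling the template length squares the fraction of full-length copies, so long transcripts yield far fewer complete cDNAs.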

Cloning is also inefficient for long cDNA inserts, so it is easier to make short full-length cDNAs than long ones. The average size of the German cDNAs was about 2.5 kilobases (kb), equivalent to a maximum of 500 amino-acid residues.

Systematic production of cDNAs was pioneered by the Kazusa cDNA project in Japan, which began in the mid-1990s. This project selects only long cDNAs to synthesize and clone: they are at least 4 kb, equivalent to proteins larger than about 1,000 amino-acid residues. Of the 2,000 cDNAs produced so far, only about half are full-length. “We see large proteins as being the most interesting biologically,” says Osamu Ohara, head of the project, which is based at the Kazusa DNA Research Institute in Chiba. The Japanese cDNA clones are freely available to researchers in academia, but not to those in industry.

In the United States, the National Institutes of Health (NIH) started a large cDNA initiative last year, as part of its Cancer Genome Anatomy Project. The NIH project will provide clones freely to all researchers and, like the German project, is creating its library of cDNAs randomly, rather than through size selection. It will be larger than the German project, and has already registered 2,800 full-length sequences with GenBank — although some of these may not be unique.

The ultimate goal of all these projects is to produce a complete set of full-length human cDNAs, corresponding to all the genes in the human genome. As there are now thought to be only about 30,000 genes, and technologies needed to deal with long or rare cDNAs are improving, this may not take very long — “perhaps only a couple of years”, suggests an optimistic Robert Strausberg, head of the NIH project.

With this rapid progress, the need to coordinate these efforts has become more pressing. “It would be nice to have a system whereby different sets of cDNAs were allocated to different groups, so there would not be too much duplication of effort,” says Wiemann. Ohara agrees: “A couple of years ago we were the only group in the game, but now we really need to think of an allocation system,” he says.

There could be complications in organizing such a system, Strausberg points out, because not all groups make their results freely available to everyone. The various projects plan to discuss the issue in the next few months. In the meantime, a joint website is planned to help each project keep up with what the others are doing.

http://www.rzpd.de

http://cgap.nci.nih.gov

http://www.kazusa.or.jp/huge