Shapes of the future: the proposed database may uncover new uses even for familiar aspirin. Credit: ALFRED PASIEKA/SPL

A powerful open-access database that would link proteins to their role in cells and pinpoint their effects on whole organisms is being planned by the US National Cancer Institute (NCI).

Under the plans, the NCI would ask scientists around the world to deposit data on the effects of small chemical molecules on proteins, cell pathways and tissue formation. Another important element would be data on how such molecules affect an organism's phenotype, the observable characteristics that result from the expression of its genes.

The database has been dubbed 'ChemBank' by some of its supporters, who see it as a chemical version of GenBank, the online repository for genetic data.

A search of ChemBank for all known effects of a given molecule could reveal connections between previously unrelated biological functions. If a particular molecule that is known to bind a protein, for example, is also found to disrupt a metabolic pathway, this could show that the protein is directly involved in that pathway. The repository would effectively hold all the available data in the emerging field of 'chemical genetics' (see Nature 407, 282–284; 2000).
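The kind of inference described here can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a description of how ChemBank would be built: the record layout, the effect labels and every molecule, protein and pathway name below are invented for the example.

```python
from collections import defaultdict

# Hypothetical deposited records: (molecule, kind of effect, target).
# All names and labels are invented for illustration only.
RECORDS = [
    ("molecule_A", "binds_protein", "protein_X"),
    ("molecule_A", "disrupts_pathway", "pathway_P"),
    ("molecule_B", "binds_protein", "protein_Y"),
    ("molecule_C", "disrupts_pathway", "pathway_Q"),
]


def candidate_links(records):
    """Return (protein, pathway, molecule) triples in which one molecule
    both binds the protein and disrupts the pathway."""
    proteins = defaultdict(set)  # molecule -> proteins it binds
    pathways = defaultdict(set)  # molecule -> pathways it disrupts
    for molecule, effect, target in records:
        if effect == "binds_protein":
            proteins[molecule].add(target)
        elif effect == "disrupts_pathway":
            pathways[molecule].add(target)

    links = set()
    # Only molecules with both kinds of record can suggest a link.
    for molecule in proteins.keys() & pathways.keys():
        for protein in proteins[molecule]:
            for pathway in pathways[molecule]:
                links.add((protein, pathway, molecule))
    return sorted(links)


print(candidate_links(RECORDS))
```

With these made-up records, only molecule_A has both a binding record and a pathway record, so it is the only one that suggests a protein-pathway link. Such a match would be a hypothesis to test at the bench rather than proof of involvement.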

Plans for the database are at an early stage. But the NCI will take the first steps towards it later this year by establishing dedicated 'molecular targeting laboratories'. These labs will synthesize tens of thousands of small molecules and screen them for their biological effects. The information produced will be combined with similar existing data to form the backbone of the database.

“This struck me as something that we really needed,” says NCI director Richard Klausner. “If we're ever going to have an annotated database of molecular probes — chemical entities that recognize and may or may not do things to gene products — then it's going to require a pretty systematic effort to characterize and develop those probes.”

Although Klausner believes that such probes will ultimately be used in cancer treatment, he insists that the new scheme is “not a drug-discovery programme”. Rather, it aims to compile a complete catalogue of the effects of molecular probes on gene products, pathways and phenotypes, he says.

The NCI is currently considering proposals from potential hosts and will announce its support for one or more molecular targeting labs within a few months. Funding for the first year will be around $10 million, and that will subsequently increase, says Klausner.

One candidate to host such a laboratory is Harvard University's Institute of Chemistry and Cell Biology. “It's a big, big undertaking and pretty speculative at the moment,” says Stuart Schreiber, the institute's co-director. “But it could help us realize the full potential of chemical genetics.” Existing collections of data on small molecules generally consist of information on protein binding and rarely include phenotypic effects, he notes. “Binding data by themselves don't illuminate biology.”

Craig Crews, a cell biologist who works in chemical genetics at Yale University, says the database would be “extremely valuable”. As well as revealing new aspects of cell biology, he adds, it will streamline efforts to relate research on new molecules to existing data. “Right now you have to be something of a detective to make those links,” Crews says.