London

An international team of chemists is working on something that chemistry sorely lacks — a consistent and comprehensive way of labelling all chemical compounds.

The new technique will apply computer algorithms to molecular structures, generating a unique digital signature for any chemical compound. The new labels are not intended to replace common chemical names, but to allow easier linking to compounds in online chemical databases and journals.

“The hope is that all organizations that handle information on chemicals will be able to use a single format to say what the chemical is,” says Alan McNaught, general manager of the production division at the Royal Society of Chemistry in Cambridge, who coordinates the project for the International Union of Pure and Applied Chemistry (IUPAC).

Right now there is no single international standard for identifying chemicals. IUPAC and the American Chemical Society use different rules. Some drug companies, as well as different branches of chemistry, have their own chemical-naming systems. Even simple structures can cause confusion. For example, acetic acid, the acid that gives vinegar its sour taste, has the systematic name ethanoic acid.

IUPAC believes that its new system, which would be freely available to all, could unify the different approaches. The system is tentatively known as IChI, for IUPAC chemical identifier, and its development is being led by a team at the US National Institute of Standards and Technology (NIST) in Gaithersburg, Maryland.

A preliminary version of the software covering well-defined, covalently bonded organic molecules was released this year to let other chemists test the idea. It labels each atom in a compound in a way that does not depend on how the structure is drawn, and converts the labels to a string of characters. The format has not been finalized, but at present ethane is 'C3C3, 2-1', for example, and acetone is 'C3C3OC, 4-1-2-3'. The process is reversible, so molecular structures can be generated from the identifiers.
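The idea of a label that does not depend on how a structure is drawn can be illustrated with a small sketch. This is not the IChI algorithm, whose details are not given here. It is a toy version that assumes hydrogens are left out, bond orders are ignored and the molecule is small enough to try every possible atom numbering. The function name `canonical_label` and the output format are hypothetical, chosen only for illustration.

```python
from itertools import permutations


def canonical_label(elements, bonds):
    """Return a string that is identical for every numbering of the same graph.

    elements: list of element symbols, one per atom (list index = atom number)
    bonds:    list of (i, j) pairs of 0-based atom indices
    Toy assumptions: no hydrogens, no bond orders, brute force over numberings.
    """
    n = len(elements)
    best = None
    for perm in permutations(range(n)):
        # perm[old] is the new number given to atom `old` in this renumbering
        new_elements = [None] * n
        for old, new in enumerate(perm):
            new_elements[new] = elements[old]
        new_bonds = sorted(tuple(sorted((perm[i], perm[j]))) for i, j in bonds)
        candidate = (new_elements, new_bonds)
        # Keep the lexicographically smallest description seen so far
        if best is None or candidate < best:
            best = candidate
    atoms, edges = best
    bond_text = ";".join(f"{a + 1}-{b + 1}" for a, b in edges)
    return "".join(atoms) + ", " + bond_text


# Two "drawings" of acetone's heavy atoms, numbered in different orders
drawing_a = (["C", "C", "O", "C"], [(0, 1), (1, 2), (1, 3)])
drawing_b = (["O", "C", "C", "C"], [(0, 2), (2, 1), (2, 3)])

print(canonical_label(*drawing_a))
print(canonical_label(*drawing_b))
```

Both calls print the same string, because the smallest description is chosen over all numberings rather than taken from the order in which the atoms happened to be entered. Trying every numbering becomes impractical beyond a handful of atoms, so a production system would need a far more efficient labelling scheme.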

The next step is to extend the system to include more complex organic compounds, such as polymers, and ultimately to tackle inorganic compounds. The NIST team hopes that adding IChI to the software packages commonly used to draw chemical structures will bring it into widespread use.

In effect, the IChI number will provide each chemical molecule with a digital object identifier (DOI) — a concept increasingly being applied to everything from scientific papers to individual genes. Jonathan Goodman, a chemist at the University of Cambridge, says chemistry suits this approach well. “Molecules are a wonderful unit of information to treat in this way,” he says. “They are complex enough to have lots of interesting features and difficulties but simple enough to represent quite a small subset.”

http://www.iupac.org/projects/2000/2000-025-1-800.html