Amanda C Schierz, Larisa N Soldatova & Ross D King respond:

The first point we would like to make is that we are very pleased to learn of the project to clean up the data in the PDB and to see on the website (http://www.wwpdb.org/) 'Announcement: Release of Remediated PDB Data' (16 April, 2007). This is a welcome development for the structural biology community.

Even so, we are disappointed with the reply from the wwPDB group. We had hoped to read a plan for structural biology to regain its lead in scientific data standards. Instead, the letter consists of a series of red herrings, excuses for past problems, a complacent description of the current situation and a vague promise of jam tomorrow. Our main claims, that mmCIF is a poor ontology and that the RCSB is a poor relational database, are not seriously disputed.

Considering the red herrings: the wwPDB authors object to our use of “PDB” and of the term “Brookhaven Protein Data Bank” for post-1998 data. Yet, the title 'Overhauling the PDB' was Nature Biotechnology's editorial suggestion, not ours, indicating that PDB is a generally accepted term for their organization. And although we should have deleted the word “Brookhaven” (which was erroneously introduced by editors at the proof stage), one must ask, 'What's in a name?' Would your data smell any sweeter with the correct name?

The wwPDB group also claims that we argue “that data organization in our data dictionary or any domain dictionary for that matter, should dictate the logical and physical organization of our database systems.” We don't claim this. We are well aware of the differences between a logical and a physical database model, which is why we were surprised that the RCSB PDB logical and physical models are exactly the same! The question of how an ontology can contribute to database design is an active area of research with high potential. Our worry is that, given the poor example of the RCSB PDB, database developers may draw the wrong conclusion about the usefulness of ontologies.

Considering the excuses: the wwPDB group argues that it has a lot of complex data to deal with. We, of course, accept this. But the problem is not new and PDB/wwPDB have had over 35 years to get things right. We have examined the wwPDB remediated chemical component dictionary and note that the obsolete component codes are now clearly labeled as such. Even so, we believe that some, perhaps many, of the mmCIF files on the remediated wwPDB FTP (file transfer protocol) site still contain incorrect data. For example, the nuclear magnetic resonance–obtained protein structures 1AXJ and 2FN2 are both supposedly remediated, yet both contain the mmCIF CELL category (which is defined as 'Data items in the CELL category record details about the crystallographic cell parameters').
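The kind of consistency check implied here can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure we used: it assumes the mmCIF file is available as plain text, that the experimental method is recorded in the simple key-value form of the `_exptl.method` item, and that any item beginning `_cell.` belongs to the CELL category. The function names and the sample fragment are hypothetical.

```python
import re
from typing import Optional


def has_category(mmcif_text: str, category: str) -> bool:
    """Return True if any data item of the given mmCIF category appears."""
    pattern = re.compile(
        r"^[ \t]*_" + re.escape(category) + r"\.",
        re.IGNORECASE | re.MULTILINE,
    )
    return bool(pattern.search(mmcif_text))


def exptl_method(mmcif_text: str) -> Optional[str]:
    """Return the value of _exptl.method, or None if it is not found.

    Assumption: only the simple key-value form is handled, e.g.
        _exptl.method   'SOLUTION NMR'
    A loop_ construct would need a real mmCIF parser.
    """
    match = re.search(
        r"^[ \t]*_exptl\.method[ \t]+(.+?)[ \t]*$",
        mmcif_text,
        re.IGNORECASE | re.MULTILINE,
    )
    if match is None:
        return None
    return match.group(1).strip("'\"")


def nmr_entry_with_cell(mmcif_text: str) -> bool:
    """Flag an entry whose method mentions NMR but which carries CELL items."""
    method = exptl_method(mmcif_text)
    return (
        method is not None
        and "NMR" in method.upper()
        and has_category(mmcif_text, "cell")
    )


# Hypothetical fragment, made up for illustration (not copied from any entry).
sample = """data_EXAMPLE
_exptl.method   'SOLUTION NMR'
_cell.length_a  1.000
_cell.length_b  1.000
"""

print(nmr_entry_with_cell(sample))
```

A check of this sort flags only the presence of crystallographic cell items in an entry labeled as NMR; deciding whether the values are placeholders or genuine errors still requires inspection of the entry itself.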

The wwPDB group also seems to put the blame for poor features in mmCIF on the International Union of Crystallography (IUCr). This seems to be an abdication of responsibility. Their claim that mmCIF was an ontology when Westbrook and Bourne (ref. 1) was written, but that the meaning of the term 'ontology' has since changed, is also weak.

To conclude, we hope that before the end of the decade, the wwPDB will present the structural biology community with guaranteed clean and self-consistent structural data, a state-of-the-art ontology to represent these data and link them with other types of data, and (at least) one state-of-the-art relational database to store and access the data.