In 1971, protein crystallographers attending the Cold Spring Harbor Symposium on 'Structure and Function of Proteins at the Three Dimensional Level' began to discuss the idea of a central, open repository for protein structural data. Later that year, a short statement in Nature New Biology officially announced the establishment of the Protein Data Bank (PDB), a repository for protein crystallographic data initially run as a collaboration between the Brookhaven National Laboratory and the Cambridge Crystallographic Data Centre.

At the time, sharing crystallographic data was a fundamentally challenging endeavour. Receiving and distributing structural data required shipping paper punch cards or magnetic tape through the mail, and the computer hardware and software needed to visualize or analyse these data were still rare. In 1974, three years after launch, the PDB had fewer than twenty structures available for distribution in its repository.

Working at Brookhaven with the collection of protein structures that would grow into the PDB, Edgar Meyer developed the first general software tools for handling and visualizing protein structural data. In 1971, he published a description of the first software for interactive three-dimensional visualization of protein structures, and then, in 1974, a description of software for storing and searching protein structures in the PDB. The latter included a brief description of a system that permitted remote computers to connect and search data stored at Brookhaven, an early forerunner to the web-based systems we now take for granted.

These early tools, and the PDB itself, were part of a long-standing tradition within the crystallography community of openly sharing code and software. Building on this tradition, in 1979 scientists in the UK, including David Blow, Tom Blundell and Eleanor Dodson, founded the Collaborative Computational Project Number 4 (CCP4) to provide protein crystallographers with software tools for processing and analysing crystallographic diffraction data. CCP4 evolved into a suite of programs that is still used to this day.

In the 1980s, as techniques for structure determination improved and supporting computer technologies became more widely available, the number of structures deposited at the PDB began to grow dramatically. By the end of the decade, the value of the PDB had become sufficiently evident that structural biologists, led by Fred Richards, began to argue that deposition of structural data to the PDB should be required of all scientists in the field.

As a testament to the success of these early efforts, the PDB now hosts more than 100,000 structures, of which more than 87,000 are derived from X-ray crystallography. Today, the PDB is mirrored and distributed from centres on three continents: the Research Collaboratory for Structural Bioinformatics (RCSB PDB) in the US, EMBL-EBI's Protein Data Bank in Europe (PDBe) and PDB Japan (PDBj). All structures are provided freely and without restriction, and many journals routinely require deposition of protein structures and the associated experimental data to the PDB as a prerequisite for manuscript publication.