The year 1998 was crucial for the Human Genome Project (HGP), an international collaboration launched eight years before to sequence the complete human genome. Spurred by the launch of a privately financed sequencing bid by Craig Venter, the HGP's leaders decided to accelerate their own efforts. Some of the proposed changes caused friction — the HGP was long planned and carefully executed. In October that year, John Sulston, then director of the Wellcome Trust Sanger Institute near Cambridge, UK, felt so beleaguered that he sent a strongly worded e-mail to Francis Collins, then director of the US National Human Genome Research Institute (NHGRI) in Bethesda, Maryland. The subject line? 'Friendly fire'.

For anyone interested in the history of the HGP, this e-mail is a key document (and one that was later acknowledged as an 'emotional outburst' by its author, who now leads the Institute for Science, Ethics and Innovation at the University of Manchester, UK). Was it a catalyst for improved communication between the main players? Who helped to resolve the conflict? To what extent were the directors of the five leading sequencing centres competing as well as collaborating? The content of the e-mail traffic between and within the sequencing teams offers a potentially rich seam of enquiry.

The correspondence of Francis Collins (left) and John Sulston illuminates a vital part of science history. Credit: M. ARGLES/GUARDIAN NEWS & MEDIA

As the co-author of Sulston's account of the HGP (The Common Thread; Joseph Henry Press, 2002), I saw this e-mail and many others, but they are not generally available. That may change, thanks to an international archiving programme now under way. The Wellcome Library in London is funding an archivist, Jenny Shaw, to survey the documentary record relating to the HGP and earlier mapping and sequencing activities in the United Kingdom between 1977 and 2004. Ludmila Pollock, executive director of the library and archives at Cold Spring Harbor Laboratory (CSHL) in New York, is conducting a parallel exercise in the United States. The first objective is to catalogue these materials. A longer-term aim — which will depend heavily on funding and the willingness of the scientific community to cooperate — is to secure them in reputable repositories and make them available to scholars.

The programme throws into relief how fragile a trace modern science is leaving in the historical record. As a scientific biographer, I have spent hours happily immersed in piles of yellowing papers that are carefully stored in archive boxes and guarded by watchful custodians in academic libraries. Future biographers will not be as lucky. Today's scientists underestimate the historical importance of anything other than their published papers; they communicate almost entirely electronically; and funding for archival preservation is increasingly uncertain. If we care about documenting the astonishing discoveries of the twentieth and twenty-first centuries, we must act now.

Good practice

Why are archiving exercises such as the HGP's necessary? We are fortunate that Collins, who is now director of the US National Institutes of Health (NIH), kept his papers, and even more fortunate that his successor at the NHGRI, Eric Green, is employing an archivist to digitize them. The NHGRI is only now developing an archiving policy. Previously, says Green, 'records administration' at the institute meant throwing things away that were 'no longer needed'. But this material represents only a fraction of the documents that record the HGP's history. The genome-sequencing story began before the NIH took a leading role, and it involved many institutions and individuals — inside and outside the United States — for which the NIH had no responsibility.

There remains an urgent need to reconstruct the 'paper trails' — often largely electronic — that eventually led to the complete, publicly available sequence. These include a vast amount of e-mail correspondence, as well as informal literature such as The Worm Breeder's Gazette (now exclusively online) and the software that Sulston and his colleague Richard Durbin wrote to manage their genome-mapping data in the mid-1980s.

Comprehensive record-keeping is easier within a single institution. CERN, the particle-physics laboratory near Geneva, Switzerland, showed a commendable sense of its own historical significance when it commissioned a regularly updated biography in 1979, 25 years after it opened. Divisional records officers at the facility now ensure a smooth pipeline through to the central archives. The archivists encourage senior scientists to have their filing systems appraised for historical interest, and they have a strategy for selecting and preserving e-mails. As the birthplace of the World Wide Web, CERN is also working to archive its own web pages.

Particle-accelerator building at CERN, which launched its archiving programme in 1979. Credit: CERN/SPL

Scientific archives typically consist of institutional records such as CERN's, and the personal papers of distinguished (and, usually, dead) scientists. When I wrote a biography of the Nobel prizewinner Dorothy Hodgkin (1910–94), I relied heavily on her collected papers, housed in the Bodleian Library at the University of Oxford, UK. Among them I found, for example, a letter dated June 1939, in which Hodgkin's contemporary Dorothy Wrinch attempted — unsuccessfully — to win sisterly solidarity for her erroneous theories on protein structure (“Our chromosome count of course does not tend to weaken the desire of others to attack us,” she wrote).

Fleeting tweets

But it is increasingly difficult for such repositories to capture the full picture. Modern science involves experts in numerous disciplines, collaborating within and between institutions. Few will have reached the age or eminence at which scientists might typically think of depositing their papers. The lab notebooks kept by technicians and junior research staff are also important. Most of the contributors to the HGP are still very much alive, so archivists will need permission to scan the contents of their e-mail accounts on personal hard drives or institutional servers.

The ubiquity of 'born-digital' material makes the quest to preserve such records even more urgent. This material includes e-mails and web pages, and now tweets, Facebook comments, forum posts and instant messages (IMs), which many scientists use to share expertise, announce new publications or engage in policy debates. The hardware and software used to generate and store digital material can quickly become obsolete, the material itself can be deleted or otherwise destroyed with ease, and it is not always clear who owns it. The British Library in London is tackling these issues with its Digital Lives project; it also leads the UK Web Archive collaboration, which stores pages of scholarly interest. Several online tools for archiving tweets by hashtag or username are available.

Technology is the easy bit, however. Much more difficult is convincing people that their off-the-cuff tweets and IMs are the stuff of history. E-mail users do not always file their personal and professional messages separately, and many are understandably wary of making any of their correspondence public. Since the 'Climategate' affair in 2009 — when e-mail servers at the University of East Anglia, UK, were hacked and their contents publicized — scientists are all too aware that an unguarded remark to a colleague can snowball into a cause célèbre.

But archivists have long experience in handling questions about confidentiality and access. The University of Cambridge library holds six boxes of love letters that belonged to the UK crystallographer J. D. Bernal, which are not to be opened until 2021 — 50 years after his death. I would trust any of the archivists I have worked with to respect my wishes.

“Let's not wait until memories have faded and papers been discarded ... before deciding to save our heritage,” exhorted Nobel prizewinner Sydney Brenner in 2007, in a letter announcing the donation of his papers to the CSHL archive (S. Brenner and R. J. Roberts Nature 446, 725; 2007). Others may ask why historians have any right to see material not in the published record. Sulston himself, who is supportive of the archiving project, admits that “scientists can be quite conflicted about historical research, because it's important to forget things and move on.”

The history of science is much more than a chronology of scientific facts and theories. Access to informal sources is essential for understanding the personal, political and social context in which research takes place. Take the story of the discovery of the double-helical structure of DNA, told by historian Robert Olby in The Path to the Double Helix (Dover Publications, 1974), by science writer Horace Freeland Judson in The Eighth Day of Creation (Simon and Schuster, 1979) and revisited more recently in the biographies of Rosalind Franklin and Francis Crick.

All these authors had access to correspondence and notebooks, without which we would have been none the wiser about the complex web of personal communication (and miscommunication) that led Crick and James Watson to their discovery. And when more turned up in 2010, the plot thickened again. Two decades of Crick's correspondence turned out to be mixed up with Brenner's papers (the two shared an office from 1956 to 1977), including a 1953 letter in which Crick admitted to a colleague that if he had seen Rosalind Franklin's photo of the 'A' form of DNA, he would have been “considerably worried” (A. Gann and J. Witkowski Nature 467, 519–524; 2010).

Save the savers

Fifty years hence, do we want the story of the sequencing of the human genome — an enterprise that forms the bedrock of much of twenty-first-century biomedicine — to rest on scientific papers and institutional reports, leavened only with breathless news reporting from the popular media? Surely not. We want our scholarly successors to be able to follow the twists and turns of the scientific, political and personal pathways that intersected as the human genome's 3 billion base pairs winked across the screens of the sequencing centres.

So, what to do? First, institutions and individuals need to have the confidence to place their records in the hands of professional archivists working in reputable repositories. Second, funding bodies need to be convinced that efforts to acquire, store, catalogue and disseminate such records are essential to preserving our scientific heritage.

In the United Kingdom, the funding situation is grim. The UK National Cataloguing Unit for the Archives of Contemporary Scientists had a 36-year track record of seeking out and cataloguing the papers of British scientists before losing its core funding and closing in 2009. A successor organization, the Centre for Scientific Archives (CSA), was established at a Science Museum store in Wroughton, UK, the same year. The CSA is run by a voluntary board chaired by Anne Barrett, an archivist at Imperial College London, and it receives some in-kind help from bodies such as the Royal Society and the National Archives.

The CSA has no core funding. At present, it engages freelance archivists to catalogue half a dozen collections of personal papers (including those of physicists Joseph Rotblat and Gareth Roberts) that came with their own funding. Meanwhile, the post of curator of the history of science at the British Library (which holds the papers of luminaries such as Charles Babbage and Alexander Fleming) has been frozen since it was vacated in February 2011.

Admittedly, history is not a priority at a time of recession. But it is baffling that scientific heritage has attracted so little support in the United Kingdom, especially as the country managed to raise almost £8 million (US$12.9 million) to keep an 1868 painting (Portrait of Mademoiselle Claus) by the French artist Edouard Manet from being moved overseas in August 2012. The project to catalogue the HGP is benefiting from the Wellcome Trust's long-standing commitment to the history of medical research, and in the United States, the CSHL has found early support from private donors and foundations.

But much more funding is needed. And it is urgent that we carry out similar scoping studies for other areas of science, to identify what should be saved before it disappears.

The sequencing of the human genome did not answer every question in biology, but it provided a publicly accessible resource that can be used for all time by inventive scientists with new questions. A documentary archive of the project will provide just such a resource for historians. If we want future generations to understand how science and society interact, libraries, research institutions and individual researchers must work together to preserve the documentary heritage of contemporary science.