Sir

In February 2003 the US National Institutes of Health (NIH) released a statement on its data-sharing policy, in which it strengthened its commitment to free exchange of final research data produced with public funds (see Nature 421, 877–878; 2003; doi:10.1038/421877a). Little attention has been given, however, to a related question: sharing and protecting the products (software and tools) of bioinformatics research, especially infrastructure generated to support large projects.

As projects and databases evolve, some will inevitably lose funding or be shut down. What happens to the bioinformatics software? In one such case, the Genome Database, an international collaboration set up in support of the Human Genome Project, lost US federal funding in 1998, although it was later reopened with private funding. During the shut-down process, a significant portion of the source code (though not the data) was irretrievably lost. This database represented an investment of more than US$50 million by US, European and Asian sources. The loss of the code, a public asset, occurred because there was little supervision during decommissioning.

As a minimum safeguard, we propose the creation of a Bioinformatics Software Archive, in which an archival copy of bioinformatics software would be maintained in a secure central repository supported by public funding. There are moves in the bioinformatics community to adopt open-source standards for software, which provide a means of online archiving, but these are not suitable for every type of software. As with software released under open-source agreements, a central archive would help prevent researchers from reinventing the wheel, thereby saving research funds.

If funding agencies such as the NIH are serious about data sharing, they should be serious about protecting the software used to produce those data.