The widespread astonishment following the unconfirmed allegation by the UK government that researchers spent five years accidentally testing cattle instead of sheep brains for BSE is encouraging. It shows that politicians and the public alike appreciate a basic tenet of experimental design: be sure of the identity of what you are studying.

It may be shocking to some, therefore, that biologists knowingly transgress this rule daily. Genome databases are polluted with incorrect gene functions that are mistakenly assigned through researchers' uncritical faith in the results of BLAST algorithms. Powerful 'black-box' software packages allow researchers to plug data in and get results out with little thought for the rationale and caveats of the analysis in between. Many reported associations between diseases and DNA variants specific to particular regions of the genome have also recently proved to be spurious or irreproducible (see Nature Rev. Genet. 2, 91–99; 2001). Here the explanation often seems to lie in researchers' unfamiliarity with the rigour needed in the statistical and experimental design of such population experiments.

The required rigour is perhaps best exemplified by the field of epidemiology, which is steeped in statistics and experimental design. Classical epidemiology has made enormous strides in health research. Ironically, epidemiology itself suffers from a major flaw: the end points that it correlates (largely crude clinical symptoms) are at best surrogates for the underlying biological basis of the disease.

Most of what we call 'diseases' are a kaleidoscope of conditions, with distinct origins, prognoses, risk factors, genetic susceptibilities, and responses to therapy. Until now, epidemiology has of necessity investigated each disease as if it were a single entity, whereas its many variants may respond differently to the factors under study. This heterogeneity is a major confounding variable. Moreover, even in the most intensively studied diseases, identified risk factors account for only a fraction of the variation in morbidity and mortality. Much remains to be discovered.

Research would be substantially more effective if it could better identify patients with subtypes of a disease, and acquire a better understanding of the underlying biological correlates. That is the lofty goal of a new European project to marry high-throughput post-genomic technologies and epidemiology in a systems biology approach dubbed 'genomic epidemiology' (see page 139).

Funding for the project is uncertain, and technological obstacles abound, but it deserves support. The scientists behind it are showing vision by thinking outside their disciplinary and institutional boxes. In marrying epidemiology and high-tech post-genomics, they may not only rejuvenate epidemiology, but also set a new standard for experimental design in biology.