The ultimate dream of a systems biologist is to sit down comfortably at a computer, open the 'microbe.exe' file, edit sequences, knock out genes, install pathways, identify drug targets and generate novel phenotypes.

Developments over the last decade indicate that such a scenario will most likely emerge in the not-so-distant future. Simulating living microbes, from genes to expression, interactions, pathways and networks, is slowly transforming biology into a computational problem. Currently, the key bottlenecks are the quality, diversity and quantity of the available data.

The chosen microbe

Escherichia coli, a bacterium resident in the human gut, has historically been the favourite organism for biological research and is the obvious top choice for virtual cell development. The reasons are not hard to imagine. The bacterium has been studied for many decades at the DNA, RNA, protein, metabolism, signalling and regulatory levels, generating an enormous amount of data.

Though E. coli is a single cell and seemingly less complex than tissues or multicellular organisms, its molecular inventory is not so simple. At last count, the bacterium seemed to comprise 4489 genes, 3666 RNA and protein molecules, 1450 enzymes, 1446 metabolic reactions, 2105 compounds, 292 transport reactions, 175 transcription factors and 5345 regulatory interactions¹. There seem to be at least 1878 promoters, 239 terminators, 1940 transcription factor binding sites, 2697 regulatory interactions and 77 small molecule effectors. Adding up all the types of molecules and the number of copies of each in the cell, we are probably looking at an inventory of about 60 million molecules.

Scientists have studied all 4489 genes of E. coli in an effort to understand how the organism receives signals and food from the environment, breaks food down into digestible products, survives harsh environmental conditions, reproduces, adapts and evolves. All the known molecular parts and wiring details are available in the EcoCyc v14.5 database, the most comprehensive database on E. coli to date. The database also offers specialized keyword and ontology-based searches across 27,500 E. coli articles.

Using this mass of data, a whole-cell-scale model of E. coli has been developed². The model has been used to predict the rate at which (a) metabolites are distributed in the cell, (b) the cell grows over time and (c) the cell takes up food and secretes products.

The researchers knocked out 1366 genes in silico, one at a time, to simulate the effect of each deletion on cell physiology. The virtual cell model seems to be reasonably accurate in making phenotypic predictions over a wide range of conditions. In particular, growth phenotypes were accurately predicted on both glucose and glycerol minimal media.
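The idea of a one-at-a-time in silico knockout can be illustrated with a toy example. The sketch below is not the published model: it swaps the genome-scale simulation for a simple reachability check on a five-reaction network, and every reaction, metabolite and gene name in it (geneA to geneE, glucose_ext and so on) is hypothetical. A knockout is scored as "grows" if biomass can still be produced from the nutrients in the medium.

```python
# Toy network: reaction name -> (substrates, products, gene required).
# All names are hypothetical and chosen only for illustration.
REACTIONS = {
    "glc_uptake": ({"glucose_ext"}, {"glucose"}, "geneA"),
    "gly_uptake": ({"glycerol_ext"}, {"glycerol"}, "geneB"),
    "glycolysis": ({"glucose"}, {"pyruvate"}, "geneC"),
    "gly_to_pyr": ({"glycerol"}, {"pyruvate"}, "geneD"),
    "biomass": ({"pyruvate"}, {"biomass"}, "geneE"),
}


def can_grow(medium, knocked_out=frozenset()):
    """Return True if 'biomass' is reachable from the medium.

    Reactions whose gene is knocked out are disabled. This is a
    reachability check, a much cruder test than a flux simulation.
    """
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for substrates, products, gene in REACTIONS.values():
            if gene in knocked_out:
                continue
            # Fire the reaction if all substrates exist and it adds something new.
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return "biomass" in available


def single_knockouts(medium):
    """Knock out each gene in turn and record whether the cell still grows."""
    genes = sorted({gene for _, _, gene in REACTIONS.values()})
    return {gene: can_grow(medium, {gene}) for gene in genes}


if __name__ == "__main__":
    for medium in ({"glucose_ext"}, {"glycerol_ext"}):
        print(sorted(medium), single_knockouts(medium))
```

In this toy network, deleting geneA or geneC abolishes growth on glucose but not on glycerol, which is the kind of medium-dependent phenotype the real model is reported to predict.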

Virtual cell models of E. coli have also been created that sense the surroundings and decide which combination of swimming and tumbling is optimal for moving forward in three-dimensional space. More recently, scientists have simulated the movement of wild-type and mutant E. coli in a virtual environment based on known biochemical pathways³. The virtual bacteria showed both adaptive and non-adaptive dynamics in a computer environment that mimics a natural one.
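The swim-or-tumble decision can be sketched as a biased random walk. The example below is a minimal illustration, not the cited simulation: it works in two dimensions rather than three, ignores the underlying biochemical pathways, and uses a hypothetical attractant field and made-up tumble probabilities (0.1 when the attractant is increasing, 0.5 otherwise).

```python
import math
import random


def concentration(x, y):
    """Hypothetical attractant field that peaks at the origin."""
    return math.exp(-(x * x + y * y) / 200.0)


def run_and_tumble(steps=500, speed=1.0, seed=0, start=(20.0, 0.0)):
    """Simulate one virtual bacterium and return its path as (x, y) points."""
    rng = random.Random(seed)
    x, y = start
    heading = rng.uniform(0.0, 2.0 * math.pi)
    previous = concentration(x, y)
    path = [(x, y)]
    for _ in range(steps):
        # Run: swim one step in the current direction.
        x += speed * math.cos(heading)
        y += speed * math.sin(heading)
        current = concentration(x, y)
        # Tumble less often while the attractant is increasing (assumed rates).
        p_tumble = 0.1 if current > previous else 0.5
        if rng.random() < p_tumble:
            # Tumble: pick a completely new random direction.
            heading = rng.uniform(0.0, 2.0 * math.pi)
        previous = current
        path.append((x, y))
    return path


if __name__ == "__main__":
    final_x, final_y = run_and_tumble()[-1]
    print(f"final distance from source: {math.hypot(final_x, final_y):.1f}")
```

A "mutant" could be mimicked in the same sketch by making the tumble probability independent of the concentration change, which removes the bias towards the attractant.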

Beyond E. coli

In addition to the virtual E. coli, at least three more virtual microbial cell projects are slowly gaining momentum.

The Virtual TB project, a collaboration between Surrey University, UK and Keio University, Japan, aims to describe the metabolism, gene regulation and physiology of Mycobacterium tuberculosis growing in vitro and in its host cell, the human macrophage. The team has constructed a genome-scale metabolic network model of the TB bacillus. However, a number of parameters pertaining to in vitro and in vivo growth remain unknown.

The virtual Pseudomonas project⁴ is building a computer model of the metabolic networks in Pseudomonas putida, an industrially relevant species. The model has been used to study cell growth, metabolic network structure and robustness. One of the key outcomes of this project has been the prediction of novel strategies for bioplastic production. The genome-scale, constraint-based model of P. putida made it possible to determine the structure of the metabolic network and to identify knowledge gaps, leading to improved gene annotations. Most of the data used in this project came from the Kyoto Encyclopedia of Genes and Genomes and the Pseudomonas Genome database.
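For readers unfamiliar with the term, constraint-based models are commonly analysed by flux balance analysis. The formulation below is the generic textbook form and not necessarily the exact setup used in the P. putida project. The symbols are notation introduced here: S is the stoichiometric matrix of the network, v the vector of reaction fluxes, c the weights that define the objective (typically growth), and v_min and v_max the lower and upper bounds on each flux.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v_{\min} \le v \le v_{\max}
\end{aligned}
```

The equality constraint expresses a steady state in which every internal metabolite is produced as fast as it is consumed; a gene knockout is usually represented by setting both bounds of the affected reactions to zero.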

Recently, another international group reconstructed the metabolic network of Chlamydomonas reinhardtii⁵. One of the key outcomes of this work is a better understanding of how light-driven metabolism leads to variations in cell growth.

It appears that the field of systems biology has already begun a phase transition from genomics to virtual cell development. The ability to construct virtual cells with reasonable density and accuracy may encourage the design of cells for preferred outputs.

An unexpected benefit of virtual cell construction has been the realization that combining sequence similarity searches with metabolic network analysis can reveal the most likely metabolic gene content of organisms for which very little data is available. This has enormous potential in metagenomics, where DNA sequence data is often all that is available.
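One way such a combination might work is sketched below. This is an illustrative heuristic, not the method used in any of the projects above: reactions with a strong sequence similarity hit are accepted outright, and weaker hits are accepted only when they fall in a pathway that is already at least half supported by strong hits. The function name, the score scale (0 to 1), both thresholds and the example data are assumptions made for the illustration.

```python
def infer_reactions(hits, pathways, strong=0.8, weak=0.4):
    """Infer which reactions an organism is likely to carry out.

    hits: reaction name -> best similarity score in [0, 1] (hypothetical scale).
    pathways: pathway name -> list of reaction names in that pathway.
    """
    # Step 1: accept every reaction with a strong similarity hit.
    confident = {rxn for rxn, score in hits.items() if score >= strong}
    inferred = set(confident)
    # Step 2: use network context to rescue weaker hits.
    for reactions in pathways.values():
        supported = [rxn for rxn in reactions if rxn in confident]
        # Skip pathways with less than half of their reactions supported.
        if 2 * len(supported) < len(reactions):
            continue
        for rxn in reactions:
            if hits.get(rxn, 0.0) >= weak:
                inferred.add(rxn)
    return inferred


if __name__ == "__main__":
    # Hypothetical similarity scores and pathway definitions.
    hits = {"r1": 0.9, "r2": 0.5, "r3": 0.95, "r4": 0.5, "r5": 0.1}
    pathways = {"p1": ["r1", "r2", "r3"], "p2": ["r4", "r5"]}
    print(sorted(infer_reactions(hits, pathways)))
```

Here the weak hit r2 is accepted because the rest of its pathway is well supported, whereas r4, with the same score, is rejected because its pathway has no strong support.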

This article is the third in a series entitled 'Virtual Cell'.