November 30, 2010 | By:  D. Jack Li

Genomic Sequencing and Information Bottleneck

As of now, we have 1,349 microbial and 40 eukaryotic genomes completely sequenced, with 3,544 prokaryotic and 543 eukaryotic sequencing projects still in progress.1 Among the organisms whose genetic blueprints have been decoded are the cholera bacterium, the diatom, the maize plant, the cat, the cow, and of course, us (Homo sapiens) and our extinct evolutionary cousins, the Neanderthals.

We have certainly come a long way since the mid-1970s, when Gilbert and Sanger independently developed nucleotide sequencing strategies, and since the sequencing of the first complete DNA genome, that of bacteriophage ΦX174, in 1977. DNA sequencing has arguably been one of the fastest-growing technologies in scientific research since its inception last century. In the past decade, high-throughput methods using parallel sequencing have allowed far more DNA to be read at once and at greater speed, significantly lowering the time and cost barriers to obtaining nature's secrets of life.

In fact, DNA sequencing, previously a scientist's lab tool, is fast becoming a popular commodity offered by personal genomics companies like 23andMe, Knome, Navigenics, and Illumina (the last charging a meager $19,500/genome and a discounted $14,500/genome for "groups of five individuals ordered through the same physician"). In the midst of such sequencing hype, how are we benefiting from the overwhelming influx of information? Genomic data are being churned out far faster than the scientific community can digest them. How are we to make sense of all the As, Ts, Gs, and Cs?

A paper by McClellan and King published earlier this year in the journal Cell highlights the challenges scientists face when mining for meaning in the sea of genetic data.2 Beyond revolutionizing basic research methodology, DNA sequencing has led to a more detailed understanding of the genetic basis of human disease. Part of the insight comes from genome-wide association studies (GWAS), which compare human genomes at the population level to find genetic variations — single-nucleotide polymorphisms (SNPs) — associated with different traits and diseases.
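To make the population-level comparison concrete, here is a minimal sketch of the kind of test that could be run at a single SNP: counting how often a variant allele appears in people with a disease versus people without it, then asking whether the difference is larger than chance would explain. The function name and the allele counts are hypothetical and chosen only for illustration. Real studies typically use more elaborate models and must correct for testing hundreds of thousands of SNPs at once.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Hypothetical helper for illustration; assumes every row and column
    total is non-zero so that all expected counts are positive.
    """
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [case_alt + case_ref, control_alt + control_ref]
    col_sums = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the allele were unrelated to disease status.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: the variant allele is seen on 60 of 200 chromosomes
# from cases and on 40 of 200 chromosomes from controls.
chi2, p = allelic_chi_square(60, 140, 40, 160)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

A small p-value at one SNP says only that the variant is statistically associated with the disease in this sample. It does not show that the variant causes the disease, which is part of the interpretive problem discussed here.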

One problem with GWAS that the authors highlight is that common diseases (e.g., breast cancer, hearing loss, lipid metabolism disorders) often emerge from individually rare mutations that make big contributions to disease pathobiology. Evolutionarily, this makes sense: natural selection allows common disease-associated polymorphisms (which make up 90% of human variation) to persist if they have no functional significance or if they confer an offsetting advantage on the organism (like the hemoglobin gene defect in sickle cell anemia, which gives a survival advantage in regions with a high prevalence of malaria). The genetic variations that directly and deleteriously impact our health are therefore most likely recent mutations that arose in individuals and that natural selection has not yet had time to eliminate. This view accounts for the genetic heterogeneity of human disease: the same gene may carry different rare, severe mutations in different individuals, and different genetic defects in the same or related pathways may lead to the same disease.

Given such insights, it may not be enough to rely solely on GWAS findings to understand the genetic basis of disease, as many of the SNPs they flag have been found to have no impact on their associated diseases. We need better ways of making sense of our genomes while taking into account the genetic heterogeneity of disease. So while personal genetic sequencing may provide a lot of data, we may not be ready to say what the data mean just yet.

The question posed by the personal gene sequencing services is then whether we are at all ready to have the data. Can we, our physicians, and our insurance companies reliably predict our health outcomes just by looking at our nucleotide sequence? This may be so in the future, but right now there seems to be a greater need to understand the sequences we already have than to generate more sequences than we can currently handle.

Image Credit: http://wikimedia.org

1. NCBI Database.

2. McClellan, J., & King, M. C. Genetic Heterogeneity in Human Disease. Cell 141, 210–217 (2010).
