Sat, Sep 26, 2009

The language of genomics

Access to genome sequences is important, but the genomes mean little without corresponding medical records, just as a novel means little without a knowledge of the language it is written in.

By Esther Dyson

Last week, a company called Complete Genomics announced 10 new customers for its genome-sequencing service. The price was not specified, but the company said its goal is to offer the service for US$5,000 within a year.

What struck me was not the announcement itself, but the name of the company’s chief executive officer: Cliff Reid. In the late 1980s, when I knew him, he was the chief executive of a text-search company called Verity. The connection hit me almost immediately. Genes are, in a sense, the instruction language for building humans (or any other living thing), and language consists of symbols that interact to build meaning. And yes, of course, it was the same Cliff Reid I had known back then.

What Complete Genomics is doing with the US$91 million it has raised so far is exciting. It has built a genome-sequencing factory and plans to build several more over the next few years. Many academic and commercial research facilities want one, as do several countries.

What I find interesting are the implications. Right now, a genome is akin to a novel written in an unknown language. There is a huge amount of information in there, but we can’t understand it. Imagine getting a copy of Tolstoy’s War and Peace in Russian and (assuming you can’t read Russian) trying to figure out the story. Impossible. That’s pretty much the situation of natural-language understanding at the time Reid joined Verity.

On the other hand, we have started recognizing some words (specific genetic variants) that seem to correspond to certain incidents in history. In the case of genetics, those incidents are diseases and conditions. And just as it usually takes several individuals to cause an incident, so it often takes several genetic variations, plus ambient factors, to cause a disease. Genes often work together, sometimes aided by factors such as a person’s diet or behavior, to cause a condition.

There are two key challenges in genomics. The first is simply detecting the genes, alone or in combination, that seem to lead to certain diseases. That alone can be useful: with enough data, we can figure out that the same “disease” is in fact a variety of different disorders, some responsive to particular known treatments, some to others, and some simply incurable.

For this, mere correlation is sufficient. People with HER2-positive breast cancer benefit from treatment with Herceptin, whereas those with other kinds of breast cancer do not. To choose the treatment, we don’t need to know why; the correlation is clear.

The second challenge is to understand how the genes interact among themselves or with other factors to produce the condition, which should enable the development of new preventive measures or treatments, based on the details of how the condition begins and how it progresses. That, of course, is much more interesting — and harder to do. In a sense, it’s the difference between matching words and understanding a piece of text.

So, it is no surprise that Reid has found a role in this new marketplace. Complete Genomics and its competitors are about to create huge amounts of data. Their edge is not just sequencing the genomes cheaply, but also refining the data into lists of variations. In other words, for most research the questions revolve not around an entire genome, but around the relevant differences of any individual’s genome from the norm.
