Long stretches of DNA previously dismissed as “junk” are in fact crucial to the way our genome works, an international team of scientists said on Wednesday night.
It is the most significant shift in scientists’ understanding of the way our DNA operates since the sequencing of the human genome in 2000, when it was discovered that our bodies are built and controlled by far fewer genes than scientists had expected. Now the next generation of geneticists has updated that picture.
The results of the international Encode project will have a huge impact on geneticists trying to work out how genes operate. The findings will also provide new leads for scientists studying conditions such as heart disease, diabetes and Crohn’s disease that have their roots partly in glitches in the DNA. Until now, the focus had largely been on looking for errors within genes themselves, but the Encode research will help guide the hunt for problem areas that lie elsewhere in our DNA sequence.
Ewan Birney of the European Bioinformatics Institute near Cambridge, one of the principal investigators in the Encode project, said: “In 2000, we published the draft human genome and, in 2003, we published the finished human genome and we always knew that was going to be a starting point. We always knew that protein-coding genes were not the whole story.”
For years, the stretches of DNA between our 20,000 or so protein-coding genes (more than 98 percent of the genetic sequence in each of our cells) were written off as “junk” DNA. Already falling out of favor in recent years, this concept will now be consigned to the history books.
Encode is the largest single update to the data from the human genome since its final draft was published in 2003 and the first systematic attempt to work out what the DNA outside protein-coding genes does. The researchers found that it is far from useless: within these regions, they have identified more than 10,000 new “genes” that code for components that control how the more familiar protein-coding genes work. Up to 18 percent of our DNA sequence is involved in regulating the less than 2 percent of the DNA that codes for proteins. In total, Encode scientists say, about 80 percent of the DNA sequence can be assigned some sort of biochemical function.
Scientists know that while most cells in our body contain our entire genetic code, not all of the protein-coding genes are active. A liver cell contains enzymes used to metabolize alcohol and other toxins, whereas hair cells make the protein keratin. Through some mechanism that regulates its genes, the hair cell knows it should make keratin rather than liver enzymes, and the liver cell knows it should make the liver enzymes and not the hair proteins.
“That control must have been somewhere in the genome, and we always knew that — for some individual genes — it was an element sometimes quite far away from the gene,” Birney said. “But we didn’t have a genome-wide view to this. So we set about working out how we could discover those elements.”
The results of the five-year Encode project are published today across 30 papers in the journals Nature, Science, Genome Biology and Genome Research. The researchers have mapped 4 million switches in what was once thought to be junk DNA, many of which will help them better understand a range of common human diseases, from diabetes to heart disease, that depend on the complex interaction of hundreds of genes and their associated regulatory elements.