Parsing the "Junk" of the genome: Researchers have now found that much of the space on the genome between genes is extremely active in keeping our genes working as they should. Image: iStockphoto/Kalawin
When the draft of the human genome was published in 2000, researchers thought that they had obtained the secret decoder ring for the human body. Armed with the code of 3 billion base pairs of As, Ts, Cs and Gs and the 21,000 protein-coding genes, they hoped to be able to find the genetic scaffolds of life, both in sickness and in health.
But in the 12 years since then, very few diseases—almost all of them very rare—have been linked definitively to changes in the genes themselves. And large, genome-wide studies searching for genetic underpinnings for more common diseases, such as lung cancer or autism, have pointed to the nether regions of the genome between the protein-producing genes—areas that were often thought to contain “junk” DNA that was not part of the pantheon of known genes.
An international consortium of hundreds of scientists has now deciphered a large portion of the strange language of this junk DNA and found it to be not junk at all. Rather, it contains important signals for regulating our genes and for determining disease risk, height and many of the other complex aspects of human biology that make each one of us different. The findings are described in 30 linked papers published online September 5 in Nature and other journals and described at the consortium's Web site. (Scientific American is part of Nature Publishing Group.)
Called the Encyclopedia of DNA Elements (ENCODE), the group is focused on understanding not just the elements of the genome but also how they work together. "The complexity of our biology resides not in the number of our genes but in the regulatory switches," Eric Green, director of the National Human Genome Research Institute and collaborator on the ENCODE project, said in a press briefing September 5. Through more than 1,600 separate experiments, analysis of more than 140 cell types and a massive amount of data analysis, the group found about 4 million of these so-called switches and can now assign functions to more than 80 percent of the entire genome. Compare that to the roughly 2 percent of the genome that is responsible for the protein-coding genes that researchers have been relying on to look for diseases and traits. "The genome project was about establishing the set of letters that make up the blueprint," Green said. "When we finally put that blueprint together, we realized we could only really understand very little of it."
These newly catalogued switches not only activate and deactivate genes, but also control how much of each protein gets made and when. They are involved in epigenetic changes, such as DNA methylation, which has been implicated in cardiovascular disease and other conditions. The new data promise to improve our understanding of many common diseases that might have similar genetic underpinnings. Genome-wide association studies (GWAS) have continually come up short in identifying specific genes for common diseases, John Stamatoyannopoulos, associate professor of genome sciences at the University of Washington School of Medicine and ENCODE collaborator, said in the briefing. "Frustratingly, about 95 percent of information from these studies has been pointing to regions of the genome that do not make proteins," he said. But now, with the ENCODE data, researchers can begin to decipher which genetic switches and functions might be common within and among these diseases. "We're now exploring previously hidden connections between diseases that may explain similar clinical [symptoms]," he noted.