In other words, there is more complex regulation of genes, and more rapid evolution of these regulatory elements, in humans?
Absolutely.
That’s a rather different way of thinking about genes—and evolution.
I get this strong feeling that previously I was ignorant of my own ignorance, and now I understand my ignorance. It’s slightly depressing as you realize how ignorant you are. But this is progress. The first step in understanding these things is having a list of things that one has to understand, and that’s what we’ve got here.
Earlier studies suggested that only, say, 3 to 15 percent of the genome had functional significance—that is, actually did something, whether coding for proteins, regulating how the genes worked or doing something else. Am I right that the ENCODE data imply, instead, that as much as 80 percent of the genome may be functional?
One can use the ENCODE data and come up with a number between 9 and 80 percent, which is obviously a very big range. What’s going on there? Just to step back, the DNA inside of our cells is wrapped around various proteins, most of them histones, which generally work to keep everything kind of safe and happy. But there are other types of proteins called transcription factors, and they have specific interactions with DNA. A transcription factor will bind at only 1,000 places, or, for the biggest binders, at maybe 50,000 specific places across the genome. And so, when we talk about this 9 percent, we’re really talking about these very specific transcription-factor-to-DNA contacts.
On the other hand, the copying of DNA into RNA seems to happen all the time—about 80 percent of the genome is actually transcribed. And there is still a raging debate about whether this large amount of transcription is a background process that’s not terribly important or whether the RNA that is being made actually does something that we don’t yet know about.
Personally, I think everything that is being transcribed is worth further exploration, and that’s one of the tasks that we will have to tackle in the future.
There is a widespread perception that the attempts to identify common genetic variants related to human disease through so-called genome-wide association studies, or GWAS, have not revealed that much. Indeed, the ENCODE results now show that about 75 percent of the DNA regions that the GWAS have previously linked to disease lie nowhere near protein-coding genes. In terms of disease, have we been wrong to focus on mutations in protein-coding DNA?
Genome-wide association studies are very interesting, but they are not some magic bullet for medicine. The GWAS situation had everyone sort of scratching their heads. But when we put these genetic associations alongside the ENCODE data, we saw that although the loci are not close to a protein-coding gene, they really are close to one of these new elements that we’re discovering. That’s been a lovely thing. In fact, when I first saw it, it was a slightly too-good-to-be-true moment. And we spent a long time double-checking everything.
How does that discovery help us understand disease?
It’s like opening a door. Think about all the different ways you can study a particular disease, such as Crohn’s: Should we look at immune system cells in the gut? Or should we look at the neurons that fire to the gut? Or should we be looking at the stomach and how it does something else?
All those are options. Now suddenly ENCODE is letting you examine those options and say, “Well, I really think you should start by looking at this part of the immune system, the helper T cells, first.” And we can do that for a very, very big set of diseases. That’s really exciting.



8 Comments
I always had a feeling that “Junk DNA” is not junk at all. If you look at a computer “.exe” file with a code browser, you will find some subroutines and a lot of data between them. The subroutines can be easily decoded and understood, but the data looks like junk unless you know its origin – a picture, a sound, etc.
I forwarded this article to a colleague who works on the basic structures of organic entities and he replied (summary of my translation): "Isn't it strange that until very recently it was believed that only 20% of the genes were useful, these being mostly the genes that produced proteins, which would be roughly the equivalent of bosons producing mass (Higgs) and that the rest of the genes were considered junk, whereas in physics they know what 80% of the stuff is for except that they can't figure out what exactly produces mass? I was proposing to think along those lines three years ago and nobody wanted to even consider the idea (or did a few and never told me?). And look at this now."
I should add his POV is that if physics is nature's hardware then biology is its software and that we should see them as two sides of the same coin. Maybe he is right when he says he has a ten-year head start on elementary research and that he suffers not so much from social comparison bias but from reviewers' tunnel vision?
Amusing also that dark matter is said to be about 80% of all matter.
Presumably this is why the erstwhile junk is "now often referred to as the dark matter." (http://www.nytimes.com/2012/09/06/science/far-from-junk-dna-dark-matter-proves-crucial-to-health.html?pagewanted=all)
yes, congrats... you've managed to discover what many other researchers could have told you ten years ago.
I have suspected for a long time that this so-called 'Junk DNA' (present in ALL living organisms) is actually 'Gaia DNA'. There definitely is some co-ordinating mechanism that regulates the biosphere of this planet. What more obvious place could it be?!?!
We are FAR MORE than just the sum of our parts!
http://blogs.nature.com/news/2012/09/fighting-about-encode-and-junk.html
Birney concurs with deprecating the term "junk DNA", but is conservative on how much non-protein-coding DNA may be functional ("between 9 and 80 per cent"). The fact that the entire genome is copied at every cell division suggests that close to 100% of DNA must be functional. Had any significant portion of DNA been non-functional in the past, evolutionary pressure to evolve an editing-out mechanism, and thus increase the cell’s energy efficiency, would have been tremendous: cells that could do this would dominate the biosphere. It is a reasonable assumption that, if such editing had ever been needed, then it would have arisen and would continue to operate, leaving only functional DNA.
Birney also uses the conservative term "regulation" to describe how the 98.8% of non-protein-coding DNA interacts with the 1.2% of protein-coding segments. A more useful terminology is to describe the entire genome as software – instructions for cells to build copies of themselves and, in multicellular lifeforms, assemble cells into lifeforms. In this view, the protein-coding segments are thought of as fixed-value strings within the code – a more useful initial hypothesis.
If we could send a personal computer back in time to Alan Turing, loaded with an application like Microsoft Excel and with a copy of the source code, it seems unlikely that, on comparing the screen output with the source code, Turing would conclude that fixed values in the code like "File", "Edit", and "View" were the essence of the software and that the other 99% merely "regulates" the operation of the fixed values when they are transcribed to the screen.
Instead of "regulatory elements" why not use the name Metagenes, since they regulate genes? The Turing machine analogy works well. Protein-encoding genes are the constants in the system, and the Metagenes control which constants get read via the transcription proteins and the exon/intron RNA selection processes and all the incredibly complex feedback mechanisms which regulate the supply and demand processes within the cell machinery, not to mention the messaging which goes on between neighbouring cells and environmental response mechanisms!
My guess is that we will discover a complex hierarchy of Metagenes similar to the hierarchy of the hormonal control and regulatory system.