SA: What is the practical application?
We've just expanded the set of druggable targets from the enzyme itself to the circuitry that controls the activity of the gene that makes the enzyme.
SA: What is bioinformatics' role in this entire enterprise?
Let's take the inverse problem. The amount of data that one is going to need from real fluctuating patterns of gene expression and the attempt to deduce from that which gene regulates which gene by what logic -- that's going to require enormous computational power. I thought the problem was going to be what's called NP-hard, namely exponentially hard, in the number of genes. I now think it's not; I think it's polynomially hard, which means it's solvable or it's much more likely to be solvable in the number of genes.
The real reason is the following: Suppose that any given gene has a maximum of somewhere between one and 10 inputs. Those inputs, if you think of them as just on or off, can be in 2^10 states; they can all be on or can all be off or any other combination. Well, 2^10 is about 1,000. That's a pretty big number. But it's small compared to 10^30,000. Since most genes are regulated by a modest number of other factors, the problem is exponential in the number of inputs per gene, but only polynomial in the number of genes. So we have a real chance at cracking the inverse problem. I think it's going to be one of the most important things that will happen in the next 15 years.
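The counting argument in this answer can be checked with a little arithmetic. The sketch below is illustrative only: the function names are invented here, the figure of 100,000 genes is borrowed from later in the interview, and the code counts possibilities for on/off genes rather than implementing any inference method.

```python
from math import comb, log10


def input_states(k: int) -> int:
    """Number of on/off combinations of k regulatory inputs to one gene."""
    return 2 ** k


def genome_states_log10(n_genes: int) -> float:
    """Base-10 exponent of 2**n_genes, the number of on/off states of a
    whole genome. Computed with logarithms to avoid printing a huge integer."""
    return n_genes * log10(2)


def candidate_input_sets(n_genes: int, k: int) -> int:
    """Number of ways to pick k regulators for one target gene.
    For fixed k this grows like n_genes**k, i.e. polynomially in n_genes."""
    return comb(n_genes, k)


if __name__ == "__main__":
    n = 100_000  # assumed gene count, taken from the interview's later example
    print("states of 10 inputs:", input_states(10))
    print("whole-genome states: about 10^%d" % round(genome_states_log10(n)))
    for k in (1, 2, 3):
        print(f"regulator sets of size {k}:", candidate_input_sets(n, k))
```

With 100,000 on/off genes, the number of whole-genome states works out to roughly 10^30,103, which is where a figure on the order of 10^30,000 comes from, while the number of regulator sets for a fixed number of inputs stays polynomial in the gene count.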
SA: Is the inverse problem a true problem or one method to get information?
It's a true problem. The direct problem is that I write down the equations for some dynamical system and it then goes and behaves in its dynamical way. The inverse problem is you look at the dynamical behavior and you try and figure out the laws. The inverse problem for us is we see the dynamical behavior of the genome displaying itself on Affymetrix chips or proteomic display methods, like 2-D gels. And now we want to deduce from that what the circuitry and the logic is. So it's the general form of a problem. It's the way to try and get out which genes are regulating which genes, so that I know not just from my ensemble approach, but I know that gene A really is regulating gene F.
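The distinction between the direct and the inverse problem can be made concrete with a toy example. The sketch below assumes a hypothetical three-gene Boolean network whose rules are invented purely for illustration, and it idealizes the data by observing every possible state transition, which real expression measurements would not provide.

```python
from itertools import product

# Hypothetical update rules for a 3-gene on/off network (invented for this sketch).
RULES = {
    "A": lambda s: s["C"],                  # A copies C
    "B": lambda s: s["A"] and not s["C"],   # B needs A on and C off
    "C": lambda s: s["A"] or s["B"],        # C turns on if A or B is on
}


def step(state):
    """Direct problem: apply the known rules to get the next state."""
    return {gene: bool(rule(state)) for gene, rule in RULES.items()}


def observe_all_transitions():
    """Idealized data set: every state paired with its successor."""
    genes = sorted(RULES)
    data = []
    for bits in product([False, True], repeat=len(genes)):
        state = dict(zip(genes, bits))
        data.append((state, step(state)))
    return data


def infer_truth_table(data, target, inputs):
    """Inverse problem for one gene: tabulate the target's next value for
    each observed combination of the proposed inputs. Returns None when the
    same input combination leads to different outcomes, meaning the proposed
    inputs cannot explain the target."""
    table = {}
    for before, after in data:
        key = tuple(before[g] for g in inputs)
        value = after[target]
        if table.setdefault(key, value) != value:
            return None
    return table


if __name__ == "__main__":
    data = observe_all_transitions()
    print(infer_truth_table(data, "C", ["A", "B"]))  # a consistent rule table
    print(infer_truth_table(data, "C", ["A"]))       # None: A alone is not enough
```

Here the inference step only sees the observed transitions, never `RULES`, which is the sense in which it works backward from behavior to logic.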
SA: What are the barriers to figuring out the inverse problem? Is it computer power? Designing the proper algorithms? Biomolecular understanding?
All three. Let's take a case in point: Feedback loops make it hard to figure things out. And the genome is almost certainly full of feedback loops. For example, there are plenty of genes that regulate their own activity. So figuring out the algorithms that will deal with feedback loops is not going to be trivial. The computing power gets to be large because if I want to look for a single input, like a canalizing input, the canalizing input is really easy to tell because if gene A is on, then gene C is on no matter what. So all I have to do is examine a lot of gene expression data and I can see whenever A is on, now C is on a moment later.
I can do that by looking at things one gene at a time. But suppose I had a more complicated rule in which two genes had to be in a combined state of activity to turn on gene C. To do that, I have to look at genes pair-wise to see that they manage to regulate gene C. If I looked at A alone or B alone, I wouldn't learn anything. So now if I have 100,000 genes, I've got to look at 100,000^2 pairs; that's 10^10 pairs. Now what if I have a rule that depends on the state of activity of three genes to turn the gene on? Then I have to look at (10^5)^3 triples, which is 10^15, and that's probably about the limit of the computing power that we've got now.
But that leaves out the fact that we don't have to be stubborn about it; we can always go do an experiment. And so this now ties to experimental design. Notice that all of these problems lead in the direction of new experimental designs, and what we're going to have to do is to marry things like the inverse problem to being able to toggle the activity of arbitrary genes.
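The single-gene scan for a canalizing input, and the growth in work once rules depend on pairs or triples, can be sketched as follows. This is a minimal illustration under stated assumptions: expression is treated as on/off, the time series `SERIES` is made up, and the function names are hypothetical rather than taken from any existing tool.

```python
# Made-up on/off expression time series for three genes (illustration only).
SERIES = [
    {"A": True,  "B": False, "C": False},
    {"A": False, "B": True,  "C": True},
    {"A": True,  "B": True,  "C": False},
    {"A": False, "B": False, "C": True},
    {"A": False, "B": False, "C": False},
]


def canalizing_on_inputs(series, target):
    """Return genes X such that, in every observed step where X is on,
    the target is on one step later. Genes never observed on are skipped,
    since the data say nothing about them."""
    hits = []
    for gene in series[0]:
        seen_on = False
        consistent = True
        for now, nxt in zip(series, series[1:]):
            if now[gene]:
                seen_on = True
                if not nxt[target]:
                    consistent = False
                    break
        if seen_on and consistent:
            hits.append(gene)
    return hits


def tuples_to_test(n_genes, k):
    """Ordered k-tuples of genes, matching the interview's 100,000^2 estimate."""
    return n_genes ** k


if __name__ == "__main__":
    print(canalizing_on_inputs(SERIES, "C"))  # ['A']
    for k in (1, 2, 3):
        print(k, tuples_to_test(100_000, k))
```

A scan like this only finds candidates consistent with the data at hand; as the answer notes, confirming that a gene really is the regulator would still call for an experiment that toggles it.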