The aim is to find means to change the abundance of a given gene transcript, either to treat a disease or to cause differentiation. I have more than one friend who either had or has cancer, and our methods for treating cancer are blunderbuss, really idiotic, even though they are much more sophisticated than they used to be. We're just killing dividing cells. What if we could get to where we could direct cells to differentiate? It would be huge in its practical importance if we could make that happen.
SA: Is bioinformatics the tool to integrate the computational work and the wet work?
Bioinformatics has to be expanded to include experimental design. What we're going to get out of each of these pieces of bioinformatics is hypotheses that now need to be tested, and it helps you pick out which hypothesis to go test. The reason is that we don't know all 100,000 genes and the entire circuitry. And even where we do know the entire circuitry, as we do for ganglia in the lobster gut, it's hard: people have been working for 30 years to understand how the lobster gut ganglia work, even knowing all the anatomical connections. So it isn't going to be easy.
I think the greatest intellectual growth area will come with the inverse problem. The point is the following: I show you the Affymetrix chips of differing patterns of gene expression and you tell me from that data what gene actually regulates what gene by what logic. That's the inverse problem. I show you the behavior of the system and you deduce the logic and the connections among the genes.
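To make the inverse problem concrete, here is a minimal sketch for a toy synchronous on/off network. It is an illustration under stated assumptions, not the method used on real chip data. The function name `consistent_rules`, the three-gene network in `step`, and the cap of two regulators per gene are all hypothetical choices made for this example. Given observed "before" and "after" states, it lists every set of candidate regulators whose values fully determine the target gene's next state.

```python
from itertools import combinations, product

def consistent_rules(transitions, target, n_genes, max_inputs=2):
    """List (regulators, truth_table) pairs that reproduce every
    observed next value of `target`. Assumes on/off genes and
    synchronous updates (a toy setting, not real cells)."""
    found = []
    for k in range(1, max_inputs + 1):
        for inputs in combinations(range(n_genes), k):
            table = {}
            for before, after in transitions:
                key = tuple(before[i] for i in inputs)
                # Same regulator values must always give the same output.
                if table.setdefault(key, after[target]) != after[target]:
                    break
            else:
                found.append((inputs, table))
    return found

def step(state):
    """Hypothetical hidden circuit: gene 2 turns on only if genes 0 AND 1 are on."""
    a, b, c = state
    return (a, 1 - c, a & b)

# "Show the behavior of the system": every state and its successor.
transitions = [(s, step(s)) for s in product((0, 1), repeat=3)]

candidates = consistent_rules(transitions, target=2, n_genes=3)
few_data = consistent_rules(transitions[:2], target=2, n_genes=3)
print(candidates)     # one surviving explanation: regulators (0, 1) with AND logic
print(len(few_data))  # with only two observations, many circuits still fit
```

With the full set of transitions only one explanation survives; with two observations, six different regulator sets remain consistent. That is the sense in which sparse data leaves the circuit underdetermined.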
SA: Do you see being able to do in silico work for the entire human body or various circuits anytime soon, or ever?
Yes, I do. I think our timescale is 10 to 15 years to develop good models of the circuitry in cells because so much of the circuitry is unknown. But to make the model and explore its dynamical behavior, I can either use the ensemble approach, which I've used, or I actually have to know what the circuitry is. There are three approaches for discovering the circuitry. One is purely experimental, which is what molecular biologists have been doing for a long time.
SA: How accurate is it? How testable is it?
It works for small networks, for synchronous lightbulb networks. Real cells aren't lightbulbs; they have graded levels of activity and they're not synchronous, so it's a much harder problem to try and do this for real cells, for a variety of reasons. First of all, when you take a tissue sample, you don't have a single cell type; you typically have several cell types in it. A lot of the data that's around has to do with tissue samples because it's from biopsy data. Second, most of it is single-moment snapshots rather than a sequence of gene activities along a developmental pathway. That's harder data to get.
It's beginning to become available. It is from the state transitions that we can learn an awful lot about which genes regulate which genes. There are other potentially powerful techniques that amount to looking at correlated fluctuations in patterns of gene activity and trying to work out from those which genes regulate which genes by what rules. The bottom line of such inverse problem efforts is that the algorithms are going to come up with alternative possible circuits that would explain the data. And that then is going to guide you to ask what's the right next experiment to do to reduce your uncertainty about what the right circuit is. So the inverse problem is going to play into the development of experimental design.
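The idea that alternative circuits point to the next experiment can be sketched in a few lines. This is a toy under assumed conditions: the four candidate rules for a gene F, the function names `most_informative_state` and `prune`, and the "split the candidates evenly" score are all hypothetical choices for illustration. The sketch picks the input condition on which the surviving candidates disagree most, then discards the candidates that the observed outcome contradicts.

```python
from itertools import product

def most_informative_state(candidates, n_genes):
    """Pick the input state whose predicted outputs split the
    candidate rules most evenly (ties go to the first state found)."""
    best_state, best_score = None, -1
    for state in product((0, 1), repeat=n_genes):
        ones = sum(rule(state) for rule in candidates)
        score = min(ones, len(candidates) - ones)  # size of the smaller camp
        if score > best_score:
            best_state, best_score = state, score
    return best_state, best_score

def prune(candidates, state, observed):
    """Keep only the rules whose prediction matches the observed output."""
    return [rule for rule in candidates if rule(state) == observed]

# Hypothetical alternative circuits for gene F, given regulators (A, B).
candidates = [
    lambda s: s[0] & s[1],  # F = A AND B
    lambda s: s[0] | s[1],  # F = A OR B
    lambda s: s[0],         # F = A
    lambda s: s[1],         # F = B
]

state, score = most_informative_state(candidates, n_genes=2)
remaining = prune(candidates, state, observed=1)
print(state, score)    # the chosen condition and how many rules it can eliminate at minimum
print(len(remaining))  # candidates left if F is observed to be on
```

Setting both regulators on or both off tells you nothing here, since all four rules agree. Turning on exactly one regulator rules out half the candidates whichever way the measurement comes out.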
SA: You start with microarray experimentation?
You go in saying, "We think gene A regulates gene F." And now you say, "If that's true, if I perturb the activity of gene A, I should see a change in the activity of gene F; specifically, if I turn gene A on, I should turn gene F on." So now you go back and find a cis site and a trans factor, or a small molecule that, when added to the cell, will turn gene A on, and then you use an Affymetrix chip or some analogue of that to say, "Did I just turn on gene F?"
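The logic of that perturbation test can be mimicked on a toy model. The sketch below only stands in for the wet experiment: the two model circuits, the gene indices, and the helper names `clamp` and `response_to_perturbation` are hypothetical. It forces gene A off and then on, advances the model one synchronous step, and reads out gene F in each case.

```python
def clamp(state, gene, value):
    """Return a copy of `state` with one gene forced to `value`."""
    forced = list(state)
    forced[gene] = value
    return tuple(forced)

def response_to_perturbation(step, state, gene_a, gene_f):
    """Read gene F after one step with gene A forced off, then forced on."""
    f_when_off = step(clamp(state, gene_a, 0))[gene_f]
    f_when_on = step(clamp(state, gene_a, 1))[gene_f]
    return f_when_off, f_when_on

def direct_model(state):
    """Hypothetical circuit: A alone turns F on."""
    a, b, f = state
    return (a, b, a)

def conditional_model(state):
    """Hypothetical circuit: A turns F on only when B is also on."""
    a, b, f = state
    return (a, b, a & b)

A, B, F = 0, 1, 2
start = (0, 0, 0)  # B is off in this starting condition

direct = response_to_perturbation(direct_model, start, A, F)
conditional = response_to_perturbation(conditional_model, start, A, F)
print(direct)       # F follows A
print(conditional)  # F stays off because B is off
```

In the second model, turning A on does not turn F on in this starting condition, even though A does regulate F. A negative result in one cellular context therefore does not by itself rule out the hypothesis.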