Stuart Kauffman wears many hats. He is an entrepreneur who founded Bios Group, a Santa Fe-based software company where he is now chief scientific officer and chairman of the board, and co-founded Cistem Molecular in San Diego. He is an academic, with current posts as an external professor at the Santa Fe Institute and a professor emeritus at the University of Pennsylvania School of Medicine. And he is an author, having written numerous papers and three popular books (Origins of Order: Self-Organization and Selection in Evolution, At Home in the Universe and Investigations). But perhaps most of all, he is a visionary.
Indeed, Kauffman is among the pioneering scientists now mapping the intersection of computational mathematics and molecular biology--a burgeoning field known as bioinformatics. Ken Howard, a freelance writer based in New York City, recently sat down with Kauffman for Scientific American to discuss this relatively new discipline. An edited transcript of their conversation follows. Additional information on bioinformatics and the genome business in general will appear in the July issue of Scientific American.
SA: What is the promise of bioinformatics?
The completion of the human genome project itself is a marvelous milestone, and it's the starting gate for what will become the outstanding problem for the next 15 to 20 years in biology--the postgenomic era that will require lots of bioinformatics. It has to do with how we understand the integrated behavior of 100,000 genes, turning one another on and off in cells and between cells, plus the cell signaling networks within and between cells. We are confronting for the first time the problem of integrating our knowledge.
SA: How so?
The fertilized egg has somewhere between 80,000 and 100,000 structural genes. I guess we'll know pretty quickly what the actual answer is. We're entitled to think of the, let's say, 100,000 genes in a cell as some kind of parallel processing chemical computer in which genes are continuously turning one another on and off in some vastly complex network of interaction. Cell signaling pathways are linked to genetic regulatory pathways in ways we're just beginning to unscramble. In order to understand this, molecular biologists are going to have to -- and they are beginning to -- change the way they think about cells. We have been thinking one gene, one protein for a long time, and we've been thinking very simple ideas about regulatory cascades called developmental pathways.
The idea is that when cells are undergoing differentiation, they are following some developmental pathway from a precursor cell to a final differentiated cell. And it's true that there's some sort of pathway being followed, but the relationship between that and the rolling change of gene activities is far from clear. That's the huge problem that's confronting us. So the most enormous bioinformatics project that will be in front of us is unscrambling this regulatory network. And it's not going to be merely bioinformatics; there has to be a marriage with new kinds of mathematical tools to solve this problem. Those tools will in general suggest plausible alternative circuits for bits and pieces of the regulatory network. And then we're going to have to marry that with new kinds of experiments to work out what the circuitry in cells actually is.
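To make the picture of genes "turning one another on and off" concrete, here is a minimal sketch of a toy Boolean network in Python. It is not taken from the interview: the three genes, their wiring and the all-at-once update rule are assumptions chosen purely for illustration, and real regulatory circuits are far larger and not known in this form.

```python
from itertools import product

# Hypothetical three-gene circuit. Each gene's next state is a Boolean
# function of the current states. The wiring is invented for illustration.
RULES = {
    "A": lambda s: not s["C"],         # C switches A off
    "B": lambda s: s["A"],             # A switches B on
    "C": lambda s: s["A"] and s["B"],  # A and B together switch C on
}

def step(state):
    """Update every gene at once (a synchronous, 'parallel' update)."""
    return {gene: bool(rule(state)) for gene, rule in RULES.items()}

def find_cycle(state):
    """Iterate until a state repeats, then return the repeating cycle."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]

def as_bits(state):
    """Render a state as a string such as '110' (genes in sorted order)."""
    return "".join("1" if state[g] else "0" for g in sorted(state))

if __name__ == "__main__":
    # Try all 8 possible starting states and show the cycle each one reaches.
    for bits in product([False, True], repeat=len(RULES)):
        start = dict(zip(sorted(RULES), bits))
        cycle = find_cycle(start)
        print(as_bits(start), "->", [as_bits(s) for s in cycle])
```

In this toy circuit every starting state ends up in the same repeating cycle of five gene-activity patterns. Changing a single rule in `RULES` gives an alternative circuit whose long-run behavior can be compared with this one.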
SA: Who is going to be working on this entire rubric? Is it bioinformaticians, or is it mathematicians or biologists?
All of the above. As biologists become aware of the fact that this is going to be essential, they are beginning to turn to computational and mathematical techniques to begin to look at it. And meanwhile we have in front of us the RNA chip data that's becoming available, and proteomics as well. An RNA chip shows the relative abundance of transcripts from a very large number of different genes that have been placed on the chip from a given cell type or a given tissue sample. There are beginning to be very large databases of RNA chips that have expression data for tens of thousands of genes, for normal cells, diseased cells, and untreated and treated normal and diseased cells. Most of the data is a single-moment snapshot. You just sample some tissue and see what it is doing. But we're beginning to get data for developmental pathways.
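As a rough illustration of what "relative abundance" from two such snapshots might look like in practice, the sketch below compares a treated sample with an untreated one using log2 ratios. The gene names and intensity values are made up, and using a log2 ratio is an assumption of this example rather than something described in the interview.

```python
import math

# Made-up signal intensities, for illustration only; not real measurements.
untreated = {"geneA": 200.0, "geneB": 50.0, "geneC": 800.0}
treated = {"geneA": 800.0, "geneB": 50.0, "geneC": 100.0}

def log2_ratios(sample, reference):
    """Return log2(sample / reference) for each gene measured in both."""
    return {
        gene: math.log2(sample[gene] / reference[gene])
        for gene in sample
        if gene in reference and sample[gene] > 0 and reference[gene] > 0
    }

ratios = log2_ratios(treated, untreated)
for gene, value in sorted(ratios.items()):
    # Positive means more transcript in the treated sample, negative means less.
    print(f"{gene}: {value:+.1f}")
```

A value of zero means the two snapshots agree for that gene. A single pair of snapshots like this says nothing about the order in which the genes changed.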