Editor's Note: This article was originally published in the February 1995 issue of Scientific American. We are reposting it this week because Robert Tjian has just been named president of the Howard Hughes Medical Institute.
Asthma, cancer, heart disease, immune disorders and viral infections are seemingly disparate conditions. Yet they turn out to share a surprising feature. All arise to a great extent from overproduction or underproduction of one or more proteins, the molecules that carry out most reactions in the body. This realization has recently lent new urgency to research aimed at understanding, and ultimately manipulating, the fascinating biochemical machinery that regulates an essential step in protein synthesis: the transcription of genes. For a protein to be generated, the gene that specifies its composition must be transcribed, or copied, from DNA into strands of messenger RNA, which later serve as the templates from which the protein is manufactured.
Even before therapy became a goal, transcription had long captivated scientists for another reason: knowledge of how this process is regulated promises to clarify some central mysteries of life. Each cell in the body contains the same genome, the complement of some 150,000 genes that form the blueprint for a human being. How is it that the original cell of an organism, the fertilized egg, gives rise to a myriad of cell types, each using somewhat different subsets of those genes to produce different mixtures of proteins? And how do the cells of a fully formed body maintain themselves, increasing and decreasing the amounts of proteins they manufacture in response to their own needs and those of the larger organism?
To answer these questions and design drugs able to modulate transcription, investigators need to know something about the makeup of the apparatus that controls reading of the genetic code in human cells. After some 25 years of exploration, the overall structure of that apparatus is becoming clear. Work in my laboratory at the University of California at Berkeley and at other institutions has revealed that one part of the apparatus (the engine driving transcription of most, if not all, human genes) consists of some 50 distinct proteins. These proteins must assemble into a tight complex on DNA before a special enzyme, RNA polymerase, can begin to copy DNA into messenger RNA. The putative constituents have now been combined in the test tube to yield a fully operational transcription engine. Still other proteins essentially plug into receptive sockets on the engine and, in so doing, "program" it, telling it which genes should be transcribed and how quickly. Critical details of these interactions are emerging as well.
Clues from Bacteria
When my colleagues and I at Berkeley began focusing on human genes in the late 1970s, little was known about the transcription machinery in our cells. But studies begun early in that decade had provided a fairly clear picture of transcription in prokaryotes: bacteria and other primitive single-celled organisms that lack a defined nucleus. That work eventually lent insight into human and other eukaryotic (nucleated) cells and helped to define features of transcription that hold true for virtually all organisms.
The bacterial research showed that genes are essentially divided into two functionally distinct regions. The coding region specifies the sequence of amino acids that must be linked together to make a particular protein. This sequence is spelled out by the nucleotides (the building blocks of DNA) in one strand of the DNA double helix; the nucleotides are distinguished from one another by the nitrogen-rich base they carry: adenine (A), thymine (T), cytosine (C) or guanine (G). The other region of a gene has regulatory duties. It controls the rate at which RNA polymerase transcribes the coding region of a gene into messenger RNA.
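For readers who think in code, the copying step described above can be caricatured in a few lines of Python. This is only a toy sketch, not a model of the real machinery: it ignores the regulatory region entirely and treats transcription as a base-by-base lookup. The function name `transcribe` and the sample sequence are invented for illustration, and the sketch relies on one fact not stated above, namely that RNA carries uracil (U) where DNA carries thymine (T).

```python
# Toy illustration only: transcription as base-by-base complementary copying.
# The name `transcribe` and the sample sequence are hypothetical.

# Each DNA base on the template strand pairs with one RNA base.
# Assumption not stated in the article text: RNA uses uracil (U)
# in place of thymine (T), so a template A pairs with U.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_strand: str) -> str:
    """Return the messenger RNA complementary to a DNA template strand.

    The template is given as a string of A, T, C and G; the result is the
    string of complementary RNA bases in the same left-to-right order.
    """
    rna_bases = []
    for base in template_strand.upper():
        if base not in DNA_TO_RNA:
            raise ValueError(f"not a DNA base: {base!r}")
        rna_bases.append(DNA_TO_RNA[base])
    return "".join(rna_bases)


if __name__ == "__main__":
    print(transcribe("TACGGCATT"))
```

In a real cell, of course, whether and how fast this copying happens at all is decided by the regulatory region and the proteins that bind to it, which is the subject of the rest of this article.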