But many others do accept the current estimates and are asking what it means that humans should have so few genes. According to Craig Venter, president of Celera Genomics, "the small number of genes means that there is not a gene for each human trait, that these come at the protein level and at the complex cellular level." As it turns out, at least one in three human genes makes several different proteins through "alternative splicing" of its pre-messenger RNA. Human proteins also have a more complicated architecture than their worm and fly counterparts, adding another level of complexity. And compared with simpler organisms, humans possess extra proteins with functions in, for example, the immune system, the nervous system, blood clotting, cell signaling and development.
Scientists are also puzzling over the significance of the discovery that more than 200 genes from bacteria apparently invaded the human genome millions of years ago, becoming permanent additions. Today, the new work shows, some of these bacterial genes have taken over important human functions, such as regulating responses to stress. "This is kind of a shocker and will no doubt inspire some further study," Collins says. Indeed, scientists previously thought that this kind of horizontal gene transfer was not possible in vertebrates.
Another curious feature of the human genome is its overall landscape, in which gene-dense and gene-poor regions alternate. "There are these areas that look like urban areas with skyscrapers of gene sequences packed on top of each other," Collins explains, "and then there are these big deserts where there doesn't seem to be anything going on for millions of base pairs." Moreover, such differences are apparent not only within but also between chromosomes. Chromosome 19, for example, is about four times richer in genes than the Y chromosome.
So what's going on in gene deserts? More than half the human genome consists of repeat sequences, also known as "junk DNA" because they have no known function. Vertebrates can live well without them: the puffer fish, for example, has a genome with very few of these repeats. In humans, most of them derive from transposable elements, parasitic stretches of DNA that replicate and insert a copy of themselves at another site. But now almost all the different families of transposons seem to have stopped roaming the genome, and only their "fossils" remain. Still, nearly 50 genes appear to originate from transposons, suggesting they played some useful role during the genome's evolution.
[Image: DOE Human Genome Program] HARDLY DONE. Only one billion base pairs (shown in yellow, orange and blue in the image), or a third of the total, in the public database are in "finished" form.
One type of transposon, the so-called Alu element, is found especially often in regions rich in G and C bases. These areas also harbor many genes, so Alus might somehow be beneficial around them. Overall, the human genome seems once to have been "a complex ecosystem, with all these different elements trying to proliferate," says Robert Waterston, director of the Genome Sequencing Center at the University of Washington, a member of the public consortium. Today the mutations these elements have accumulated provide an excellent molecular fossil record of the evolutionary history of humankind.
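To make the idea of a "GC-rich region" concrete, here is a minimal Python sketch, not a method used by either sequencing team, that slides a fixed-size window along a DNA string and reports the fraction of G and C bases in each window. The function names, window size, step and toy sequence are all hypothetical illustrations.

```python
from __future__ import annotations


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (case-insensitive); 0.0 for an empty string."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start position, GC fraction) for every full window along seq."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "ATATATATGCGCGCGCATAT"  # hypothetical toy sequence, not real genome data
    for start, gc in gc_windows(toy, window=8, step=4):
        print(start, round(gc, 2))
```

On the toy sequence the windows starting at positions 0, 4, 8 and 12 have GC fractions of 0.0, 0.5, 1.0 and 0.5, so the middle stretch would count as "GC-rich." Real analyses use far larger windows over millions of bases.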
In addition to repeat sequences caused by transposons, large segments of the genome seem to have duplicated over time, both within and between chromosomes. This duplication, researchers say, allowed evolution to play with different genes without destroying their original function and probably led to the expansion of many gene families in humans.
Apart from the genome sequence, both the Human Genome Project and Celera have identified a multitude of base positions in the DNA that differ between individuals; these are called single nucleotide polymorphisms, or SNPs (pronounced "snips"). The public consortium discovered 1.4 million SNPs, and Celera announced it had found 2.1 million of them. Scientists are hoping to learn from them how genes make people different and, in particular, why some people are more susceptible to certain diseases than others. "It will certainly take us a long time to figure out what they all mean, if they all mean anything, but I think the process is already beginning," Waterston notes.
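The core idea of a SNP, a single base position where individuals differ, can be illustrated with a minimal Python sketch. It assumes the DNA of several people has already been lined up (aligned) to the same length and simply reports every position where the bases disagree. This is a deliberate simplification, not how either team found its SNPs: real SNP discovery must also handle alignment, sequencing errors and how common each variant is. The function name and the toy "individuals" are hypothetical.

```python
from __future__ import annotations


def find_variable_sites(aligned: dict[str, str]) -> dict[int, set[str]]:
    """Return {position: set of bases seen} for positions where aligned sequences differ.

    Assumes every sequence is already aligned and has the same length.
    Any character, including a gap symbol, is treated as a distinct base.
    """
    seqs = [s.upper() for s in aligned.values()]
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    sites: dict[int, set[str]] = {}
    for pos in range(length):
        bases = {s[pos] for s in seqs}  # distinct bases at this position
        if len(bases) > 1:
            sites[pos] = bases
    return sites


if __name__ == "__main__":
    samples = {  # hypothetical individuals with toy sequences
        "person_1": "ACGTACGTAC",
        "person_2": "ACGTACCTAC",
        "person_3": "ACGAACGTAC",
    }
    for pos, bases in sorted(find_variable_sites(samples).items()):
        print(pos, sorted(bases))
```

Here the three toy sequences differ only at positions 3 (A or T) and 6 (C or G), so those two positions are the candidate polymorphisms.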