The number of sequenced human genomes will soon swell to more than 1,000 as part of a new international research consortium's effort to trace the potential genetic origins of disease. But first the mother, father and adult child of a European-ancestry family from Utah and a Yoruba-ancestry family from Nigeria will join an anonymous individual as well as famous geneticists Craig Venter and James Watson as part of the handful of humans to have on record a complete readout of their roughly three billion base pairs of DNA. These six will also each have their genetic codes examined at least 20 times, providing 10 times the accuracy of existing genetic sequences as well as paving the way for the ambitious effort dubbed the 1,000 Genomes Project, which will comprehensively map humanity's genetic variation.

"The reference sequence that we obtained in 2003 [from the anonymous individual] is just a human genome sequence, but there are six billion humans and it is the sequence of all of us that is important," says project co-chair Richard Durbin of the Wellcome Trust Sanger Institute in Cambridge, England. "We can't get that, but the output of the 1,000 Genomes Project will be a lot closer."

The project will proceed in three steps, according to the consortium. The first, currently underway and expected to be completed by year's end, is the detailed scanning of the six individuals. This will be followed by less detailed genome scans of 180 anonymous people from around the world and then partial scans of an additional 1,000 people. "If we look at about 1,000 individuals, we'll get genetic variants in those samples that are somewhere around 1 percent or lower frequency" in the human population, says geneticist Lisa Brooks, director of the Bethesda, Md.–based National Human Genome Research Institute's Genetic Variation Program.
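The arithmetic behind Brooks's 1 percent figure can be sketched with a simple sampling calculation. The snippet below is an illustration only, not the consortium's method: it assumes each person contributes two independently sampled chromosome copies and that any variant present in the sample is actually detected, which less detailed scans would not guarantee. The function name `detection_probability` is hypothetical.

```python
def detection_probability(allele_frequency: float, individuals: int) -> float:
    """Chance that a variant appears at least once among the sampled chromosomes.

    Assumption (not from the article): chromosomes are sampled independently
    and every copy present in the sample is detected by sequencing.
    """
    chromosomes = 2 * individuals  # each person carries two copies of each autosome
    # Probability that none of the sampled copies carries the variant,
    # subtracted from 1 to give the chance of seeing it at least once.
    return 1.0 - (1.0 - allele_frequency) ** chromosomes


if __name__ == "__main__":
    # Sample sizes mentioned in the article: 6, 180 and 1,000 individuals.
    for n in (6, 180, 1000):
        p = detection_probability(0.01, n)
        print(f"{n:>5} individuals: P(variant at 1% seen at least once) = {p:.4f}")
```

Under these assumptions, a variant carried on 1 percent of chromosomes is unlikely to turn up among six people but is almost certain to appear somewhere in a sample of 1,000, which is consistent with the threshold Brooks describes.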

The researchers plan to use common genetic sequences from the initial six individuals to allow less rigorous scanning as the project accelerates. For example, a shared stretch of DNA from one of the detailed individual scans could fill in the blanks for a less rigorously scanned later individual. "This all needs to be tested," Brooks says. "Is doing these scans 2x [twice] sufficient?"

Blood samples for the final project have already been collected from the first six candidates as well as from randomly selected populations throughout the world: Japanese from Tokyo, Chinese from Beijing, Luhya and Masai from Kenya, Toscani from Italy, Gujarati Asian Indians from Houston, Chinese from Denver, Mexican-Americans from Los Angeles and African-Americans from the southwestern U.S. "We wanted to look at a couple of populations, two or three from each of the major Old World continents. That gives you the variety of people but no one population is needed," Brooks notes. "We don't know exactly who these people are."

The researchers collected no identifying individual information, such as medical histories or basic height and weight data, because the study is not looking for specific diseases but rather the range of human genetic variation. "How many individuals with autism would be in 100 samples from a community?" Brooks says. "It's not a very good disease study but this is going to support a zillion disease studies that are properly designed."

For example, once complete, the 1,000-genome resource will allow researchers to pinpoint genes, structural variants in chromosomes and other individual genomic variations that are associated with diseases ranging from autism to cancer. Researchers using the incomplete genetic information presently available have already identified at least 100 regions of the genome associated with various diseases by comparing the DNA variation between healthy and ill subjects. "This project is talking about a hundred times as much data" as existing genetic resources, Durbin says.

It will also shed light on humanity's shared evolutionary history. "We will learn some more about…the period of a few hundred thousand years or so in which modern humans evolved before they spread from Africa," Durbin says. "This can be studied by looking for evidence of selection in the pattern of variation seen in the genome."

It is already clear that 99 percent of DNA is the same in all humans. But by mapping variations in the other 1 percent, the 1,000 Genomes Project may help reveal the genetic underpinnings of some diseases. "Once you have those elements fingered, then you can figure out how to do therapies," Brooks says. "It's not going to tell you the causal ones, but it's going to give you the list of suspects."