Genetic mutations that enhance disease resistance or boost fitness in a particular climate have been positively selected over the course of human evolution. But current statistical methods to find these beneficial mutations, or variants, have only been able to home in on areas spanning several genes, which may cover a variety of other functions as well.

And within these broad swaths of the human genome, there are a number of nonselected, or neutral, variants that also get preferentially inherited because they are on the same chromosome as the selected variant. "It's basically that a whole haystack of [mutations] gets pulled up when selection occurs and then you're trying to find which one was the driver—the needle," says Pardis Sabeti, an evolutionary geneticist at Harvard University's Department of Organismic and Evolutionary Biology.

To narrow down the culprits for positive selection, Sabeti and her team of researchers developed an approach that combines several statistical methods, each differing in its specificity for selected variants, into one powerful tool. Using the composite tool, the group analyzed mutations in regions from different chromosomes spanning hundreds of kilobases, or hundreds of thousands of nucleotides. The mutations occurred both inside genes and in parts of the genome that do not encode genes. Although the team optimized their method for selection that has taken place in the past 30,000 years, Sabeti says that with some tweaking, the approach could stretch back to the point when human populations began migrating out of Africa and diverging from each other 50,000 to 70,000 years ago. The first study using the team's composite method was published January 7 in Science.

Using this technique, the scientists could predict an area of the genome as narrow as a single gene, rather than several, that had likely been positively selected. For example, within a region containing five genes, they singled out one gene, involved in eye color or skin pigmentation, as the likely target of selection.

"This approach allows you to much more precisely say what types of genes have been selected, and I think that's pretty powerful," says Joshua Akey, an evolutionary biologist at the University of Washington in Seattle, who was not involved in the study by Sabeti's group.

Prior to creating the composite method, different groups of evolutionary geneticists had been using methods that detected one of three different genetic patterns created by positive selection. Even though there had been murmurs in the field about combining different methods to try to improve the signal-to-noise (or needle-to-haystack) ratio, Sabeti says that nobody had actually tested this possibility. According to Akey, "What they did that is clever is say, 'We can use slightly different information that's captured by these different tests.'"

The first test, which Sabeti developed herself, excels at finding large regions of the genome that have undergone positive selection. Because of the regular rate at which homologous chromosomes exchange genetic information with one another in a genomic region, it is possible to estimate that region's age compared with its ancestral sequence.

When a region obtains a selected variant, however, that variant generally spreads rapidly throughout the human population, bringing the neighboring neutral variants on the chromosome along for the ride. This rapid spread occurs because the selected variant makes a person more likely than those without it to reproduce and pass on the beneficial mutation to the next generation. When researchers identify an island of variants that is shared among people, it signals that one or more of those variants confer an advantage.
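One common way to put a number on such an "island" of shared variants is haplotype homozygosity: the chance that two chromosomes drawn at random from a sample carry exactly the same sequence across a window. The Python sketch below is only an illustration of that idea, not the test Sabeti's group used; the function name and the toy haplotypes are invented for the example.

```python
from collections import Counter

def haplotype_homozygosity(haplotypes):
    """Probability that two distinct chromosomes, drawn at random from the
    sample, carry an identical haplotype across the window."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    counts = Counter(haplotypes)
    # Number of ordered identical pairs divided by all ordered pairs.
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

# Invented toy data: each string is one chromosome's alleles in a window.
swept = ["ACGT"] * 9 + ["ACGA"]          # one haplotype dominates
neutral = ["ACGT", "ACGA", "TCGT", "ACTT", "GCGT",
           "ACGT", "TCGA", "ACTA", "GCGA", "TCTT"]  # many haplotypes

print(haplotype_homozygosity(swept))
print(haplotype_homozygosity(neutral))
```

In this toy sample the swept window scores far higher than the neutral one, which is the kind of contrast an island of shared variants would produce.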

Because this method leaves scientists with regions that could contain up to a million nucleotides and dozens of genes, Sabeti's group incorporated other methods that locate more fine-tuned signatures of positive selection. The researchers optimized a method that examines individual mutations within these islands, trying to determine the variants that are at high prevalence. In contrast to scanning for islands, this procedure's strength is its ability to score each variant for the likelihood that it is selected rather than neutral.
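The raw ingredient for scoring variants by prevalence is each variant's frequency in the sample. As a minimal sketch, assuming chromosomes are coded as lists of 0 (ancestral allele) and 1 (new allele), the per-variant frequencies could be tallied as follows. The coding scheme, function name and data are assumptions made for illustration; the article does not describe how the researchers' method turns frequencies into scores.

```python
def variant_frequencies(chromosomes):
    """Fraction of sampled chromosomes carrying the new allele at each site.

    `chromosomes` is a list of equal-length lists of 0/1 values
    (a hypothetical encoding chosen for this example).
    """
    if not chromosomes:
        raise ValueError("need at least one chromosome")
    n_sites = len(chromosomes[0])
    if any(len(c) != n_sites for c in chromosomes):
        raise ValueError("all chromosomes must cover the same sites")
    n = len(chromosomes)
    return [sum(c[i] for c in chromosomes) / n for i in range(n_sites)]

# Invented sample: four chromosomes, three variant sites.
sample = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
print(variant_frequencies(sample))
```

Here the first site is carried by every chromosome while the other two are rare, so a prevalence-based score would rank the first site highest.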

The third strategy that the scientists incorporated into their composite approach takes into account the demographic backgrounds of the genomes under analysis. Because groups of people living in different environments are faced with different selective pressures, variants responsible for allowing a selective advantage might reveal themselves when different populations are compared.
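A simple way to picture a population comparison is to ask how much more common a variant is in one group than in the others. The sketch below computes that difference for a single variant. It is an illustrative stand-in, not the statistic used in the study, and the population labels and frequencies are invented.

```python
def frequency_excess(freq_by_population, target):
    """Variant frequency in `target` minus its mean frequency elsewhere.

    `freq_by_population` maps a population label to the variant's
    frequency (0..1) in that population. Labels are hypothetical.
    """
    if target not in freq_by_population:
        raise KeyError(target)
    others = [f for pop, f in freq_by_population.items() if pop != target]
    if not others:
        raise ValueError("need at least one comparison population")
    return freq_by_population[target] - sum(others) / len(others)

# Invented frequencies for one variant in three populations.
freqs = {"pop_a": 0.9, "pop_b": 0.1, "pop_c": 0.2}
print(frequency_excess(freqs, "pop_a"))  # roughly 0.75
```

A large positive value flags a variant that is common in one environment and rare elsewhere, which is the pattern the third strategy looks for.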

To test the power of the composite method, Sabeti's team subjected 178 genomic regions to the new approach. The sequence data came from the HapMap 2 project, an international effort to sequence regions of the genomes from individuals of European, east Asian and west African descent, focusing on nucleotides that vary between these groups. So far, three million nucleotides from 270 people have been sequenced through this project.

The composite approach achieved resolution 100-fold better than any of the tests that it uses. In other words, whereas a single test could predict that there was a selected variant somewhere within a region of 1,000 total variants, the combined strategy narrowed down the likelihood to a region of 10 variants.

Sharon Grossman, who conducted the research along with Sabeti and other researchers, says she was surprised that the approach is so much more powerful than single tests. "A lot of people had assumed that it wouldn't work because they thought that if you had a high score by one [test] that you'd have a high score by all of them," she adds. Although the different tests could all give a high score to a selected variant, they differed in their abilities to give a neutral variant a low score. As a result, Grossman says that the new method has a low false positive rate.
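A toy calculation shows why it matters that the tests disagree about neutral variants. The Python sketch below assumes, purely for illustration, that each test reports a likelihood ratio (how much more probable the observed score is for a selected variant than for a neutral one) and that the tests are independent, so the ratios multiply. The article does not spell out the team's combination rule, and the variant names and numbers here are invented.

```python
import math

def composite_log_score(likelihood_ratios):
    """Sum of log likelihood ratios across tests.

    Equivalent to multiplying the ratios under the (assumed)
    independence of the tests; higher means more likely selected.
    """
    if any(lr <= 0 for lr in likelihood_ratios):
        raise ValueError("likelihood ratios must be positive")
    return sum(math.log(lr) for lr in likelihood_ratios)

# Invented ratios from three tests for three variants in one region.
variants = {
    "driver":       [8.0, 6.0, 5.0],  # high on every test
    "hitchhiker_a": [8.0, 0.5, 0.4],  # high on one test only
    "hitchhiker_b": [7.0, 1.2, 0.3],
}

ranked = sorted(variants,
                key=lambda name: composite_log_score(variants[name]),
                reverse=True)
print(ranked)
```

On the first test alone, the driver and one hitchhiker are indistinguishable. Once all three ratios are combined, the hitchhikers' low scores on the other tests pull them well below the driver.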

In addition to identifying a pigmentation gene that shows signs of positive selection, the researchers found that a gene associated with hearing and visual perception is more frequent in the east Asian descendants they studied.

But the gene about which Sabeti is most excited, called large, has been implicated in the resistance of a subpopulation of Africans to the Lassa fever virus. Scientists have found that certain versions of the large gene could protect the Nigerians who carry them from an infection that would otherwise be as deadly as Ebola. With her new method to home in on variations within a gene, Sabeti is eager to try to pinpoint which mutations of large confer protection against Lassa fever.

Another finding revealed through the composite approach is that about half of all the predicted selected variants are not in regions of the gene that code for proteins. "Lately in genome-wide association studies, there's been this overhaul where many of the most important functional changes are regulatory, or in regions of the gene that are not protein coding," Sabeti says.

The potential to identify selected variants, along with the traits they encode or the genes they regulate, could soon explode, as geneticists gather sequence information from more individuals. Whereas Sabeti's study analyzed only three million nucleotides from 270 individuals, efforts are underway, through the 1000 Genomes Project, to sequence the entire genomes of 1,000 people.

James Evans, a geneticist at the University of North Carolina at Chapel Hill who was not involved in the current study, says that, with next-generation sequencing technology, obtaining data is no longer the obstacle. "We soon will have an avalanche of sequence data from humans all across the globe, and analysis has become the bottleneck. That is why this kind of study is so important and so timely," he says.