By Elie Dolgin
Researchers who have mapped a species' genome need to be more explicit about the quality of their sequence, says an international team of genome researchers.
"People generating these sequences should discriminate a bit more between the products that they provide to the rest of the scientific community," says Patrick Chain of the Joint Genome Institute at the Los Alamos National Laboratory in New Mexico, who is first author of a policy paper on genomic standards published this week in Science.
The increased speed and reduced cost of genome sequencing have led to a profusion of genomes published under the catch-all title of 'draft sequence'. These genomes can range from relatively incomplete, poor-quality sequences to well-annotated gene maps with few sequence errors or alignment mistakes. Yet often, the only way to discern the quality of a sequence is to painstakingly navigate the genetic blueprint base by base, genome fragment by genome fragment.
To remedy the situation, Chain and his colleagues from more than ten sequencing centres across the United States, Canada and the United Kingdom suggest in their paper that a new set of standards needs to be applied to genome projects. The researchers assessed the quality of existing technologies, computational tools and current gene maps to devise six broad categories that span the scope of genomic grades.
The six rankings range from 'standard draft' genomes -- inexpensive and incomplete first-pass sequences that provide the minimum benchmark for submission to public databases -- to complete, near-flawless, 'gold-standard' finished genomes. Better-quality drafts and near-finished sequences fall on a sliding scale in between.
Only bacterial and archaeal genomes currently meet the 'finished' criteria, says study co-author Darren Grafham, of the Wellcome Trust Sanger Institute near Cambridge, UK. The human genome sequence scores one rung down the ladder, being labelled with the 'noncontiguous finished' title, Grafham says.
"It's important that there's a shared language so people can all know that they're talking about the same thing," says Chad Nusbaum, co-director of the genome sequencing and analysis programme at the Broad Institute in Cambridge, Massachusetts, who was not involved in the study. "By establishing generalizable, platform-independent, universal, translatable standards, people know what they're dealing with."
Square pegs, round holes
But the categories only reliably apply to small, microbial genomes, says Harris Lewin, director of the Institute for Genomic Biology at the University of Illinois at Urbana-Champaign. He notes that the authors do not make reference to physical maps -- chromosome layouts that show the specific physical locations of DNA landmarks -- which are crucial for capturing the organization and structure of large eukaryotic genomes. "There still need to be some improvements," says Lewin, who discusses the importance of physical maps in a commentary published online in July by the journal Genome Research.
Svante Pääbo, a geneticist who studies Neanderthal genomics at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, says that ancient genome sequences also don't fit neatly into the authors' defined categories. With long-dead organisms, sequence quality is usually dictated by amounts of available DNA and the degree of DNA degradation and contamination. "Rather than forcing ancient genome sequences into categories like the ones suggested here," Pääbo says, "I think that the patterns of sequence errors, extent of contamination and so on need to be estimated and described in each case."
Robin Buell, a plant genomicist at Michigan State University, East Lansing, agrees with the 'standard draft' and 'finished' definitions, but says that most of the in-between categories are not that helpful because the delineations between levels are vague and overlapping. "People aren't going to be able to clearly place their products into those bins," she says.
But Grafham counters that the categories will provide useful reference points for the future, as next-generation sequencing machines start to dominate the genomics landscape. "Although that vocabulary should be universal, the specifics are still up for discussion," he says. "We don't have enough knowledge about the new platforms to be that prescriptive currently."