Geneticists call for better draft sequences

Proposed rankings would classify genomes by completeness and quality.

By Elie Dolgin

Researchers who have mapped a species' genome need to be more explicit about the quality of their sequence, says an international team of genome researchers.

"People generating these sequences should discriminate a bit more between the products that they provide to the rest of the scientific community," says Patrick Chain of the Joint Genome Institute at the Los Alamos National Laboratory in New Mexico, who is first author of a policy paper on genomic standards published this week in Science.

The increased speed and reduced cost of genome sequencing have led to a profusion of genomes published under the catch-all title of 'draft sequence'. These genomes can range from relatively incomplete, poor-quality sequences to well-annotated gene maps with few sequence errors or alignment mistakes. Yet often, the only way to discern the quality of a sequence is to painstakingly navigate the genetic blueprint base by base, genome fragment by genome fragment.

Shared language

To remedy the situation, Chain and his colleagues from more than ten sequencing centres across the United States, Canada and the United Kingdom suggest in their paper that a new set of standards needs to be applied to genome projects. The researchers assessed the quality of existing technologies, computational tools and current gene maps to devise six broad categories that span the scope of genomic grades.

The six rankings range from 'standard draft' genomes -- inexpensive and incomplete first-pass sequences that provide the minimum benchmark for submission to public databases -- to complete, near-flawless, 'gold-standard' finished genomes. Better-quality drafts and near-finished sequences fall on a sliding scale in between.

Only bacterial and archaeal genomes currently meet the 'finished' criteria, says study co-author Darren Grafham, of the Wellcome Trust Sanger Institute near Cambridge, UK. The human genome sequence scores one rung down the ladder, being labelled with the 'noncontiguous finished' title, Grafham says.

"It's important that there's a shared language so people can all know that they're talking about the same thing," says Chad Nusbaum, co-director of the genome sequencing and analysis programme at the Broad Institute in Cambridge, Massachusetts, who was not involved in the study. "By establishing generalizable, platform-independent, universal, translatable standards, people know what they're dealing with."

Square pegs, round holes

But the categories only reliably apply to small, microbial genomes, says Harris Lewin, director of the Institute for Genomic Biology at the University of Illinois at Urbana-Champaign. He notes that the authors do not make reference to physical maps -- chromosome layouts that show the specific physical locations of DNA landmarks -- which are crucial for capturing the organization and structure of large eukaryotic genomes. "There still need to be some improvements," says Lewin, who discusses the importance of physical maps in a commentary published online in July by the journal Genome Research.

Svante Pääbo, a geneticist who studies Neanderthal genomics at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, says that ancient genome sequences also don't fit neatly into the authors' defined categories. With long-dead organisms, sequence quality is usually dictated by amounts of available DNA and the degree of DNA degradation and contamination. "Rather than forcing ancient genome sequences into categories like the ones suggested here," Pääbo says, "I think that the patterns of sequence errors, extent of contamination and so on need to be estimated and described in each case."

Robin Buell, a plant genomicist at Michigan State University, East Lansing, agrees with the 'standard draft' and 'finished' definitions, but says that most of the in-between categories are not that helpful because the delineations between levels are vague and overlapping. "People aren't going to be able to clearly place their products into those bins," she says.

But Grafham counters that the categories will provide useful reference points for the future, as next-generation sequencing machines start to dominate the genomics landscape. "Although that vocabulary should be universal, the specifics are still up for discussion," he says. "We don't have enough knowledge about the new platforms to be that prescriptive currently."
