A 2,000-year-old math theorem, along with Sudoku, may soon help researchers untangle DNA at blazing speeds.
Hunting for a particular genetic mutation in hundreds of thousands of specimens can be an expensive and time-consuming process. In the past several years, multiplex DNA sequencing machines have sped up data acquisition, but researchers are still hobbled by having to label each sample with its own molecular identifier, or bar code, before analysis.
Scientists at Cold Spring Harbor Laboratory (CSHL) on Long Island, N.Y., are proposing a new take on a very old idea so that large sets of samples can be analyzed at once. The team is applying the Chinese remainder theorem to pinpoint individual samples within larger pools, which are arranged in rows and columns.
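To see why pooling helps, consider the simplest rows-and-columns layout. The Python sketch below is a hypothetical illustration, not the CSHL team's design: the function names `grid_pools` and `find_carrier` and the 10-by-10 layout are assumptions. Each row and each column of specimens forms one pool, so 100 specimens need only 20 pools, and a single carrier sits where the one positive row crosses the one positive column.

```python
from typing import List, Set, Tuple


def grid_pools(n_rows: int, n_cols: int) -> Tuple[List[Set[int]], List[Set[int]]]:
    """Lay specimens 0 .. n_rows*n_cols - 1 out row by row; each row and each column is one pool."""
    rows = [set(range(r * n_cols, (r + 1) * n_cols)) for r in range(n_rows)]
    cols = [set(range(c, n_rows * n_cols, n_cols)) for c in range(n_cols)]
    return rows, cols


def find_carrier(rows: List[Set[int]], cols: List[Set[int]], carriers: Set[int]) -> Set[int]:
    """Return every specimen lying in both a positive row pool and a positive column pool.

    A pool counts as positive if it contains at least one carrier; in a real
    experiment that would be read off the sequencing results, not known in advance.
    """
    hit_rows = set().union(*(pool for pool in rows if pool & carriers))
    hit_cols = set().union(*(pool for pool in cols if pool & carriers))
    return hit_rows & hit_cols


rows, cols = grid_pools(10, 10)                      # 100 specimens, 20 pools
print(find_carrier(rows, cols, {37}))                # {37}: one carrier is pinpointed exactly
print(sorted(find_carrier(rows, cols, {12, 47})))    # [12, 17, 42, 47]: two carriers are ambiguous
```

With one carrier the intersection is unique. With two carriers (12 and 47), the positive rows and columns cross at four spots, which is the kind of ambiguity that more elaborate pooling patterns are meant to resolve.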
Devised about 2,000 years ago, the theorem is a method for reconstructing a whole number from its remainders after division by several numbers that are pairwise co-prime (that is, no two of them share a common factor). In the case of DNA sequencing and Sudoku, the theorem is used to locate data points by their coordinates in a grid, but it can also recover all sorts of missing information in other domains, such as the positions of distant points sensed with high-speed radar, pieces of code, or the identity of that attractive person you spotted at three out of seven parties on a cruise ship.
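A minimal Python sketch of the theorem itself: given the remainders of an unknown whole number x after division by several pairwise co-prime numbers, it rebuilds x uniquely, up to the product of those numbers. The pooling framing in the usage lines is an assumption for illustration, not necessarily the paper's exact scheme: if specimen x were sent to pool x mod 9, x mod 10 and x mod 11 in three separate pooling patterns, the positive pools would reveal those remainders, and 990 specimens could be told apart with only 30 pools (for a single carrier).

```python
from math import gcd, prod
from typing import Sequence


def crt(residues: Sequence[int], moduli: Sequence[int]) -> int:
    """Return the unique x in [0, prod(moduli)) with x % m == r for every (r, m) pair.

    The moduli must be pairwise co-prime, which is the theorem's precondition.
    """
    for i, a in enumerate(moduli):
        for b in moduli[i + 1:]:
            if gcd(a, b) != 1:
                raise ValueError(f"moduli {a} and {b} are not co-prime")
    n = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        rest = n // m                      # product of all the other moduli
        x += r * rest * pow(rest, -1, m)   # pow(rest, -1, m) is rest's inverse mod m (Python 3.8+)
    return x % n


# Hypothetical pooling: specimen x lands in pool x % w for each co-prime pool count w.
windows = [9, 10, 11]                           # 9 + 10 + 11 = 30 pools for 9 * 10 * 11 = 990 specimens
carrier = 37
positive_pools = [carrier % w for w in windows]  # what sequencing would flag: [1, 7, 4]
print(crt(positive_pools, windows))              # 37
```

Each pooling pattern acts like one coordinate, and the theorem guarantees that exactly one specimen between 0 and 989 is consistent with all three.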
With this approach, researchers can handle whole libraries of genetic information instead of examining just "one genetic sequence at a time," says Yaniv Erlich, lead author of the paper, which is the cover story of this month's issue of Genome Research.
In Sudoku, players must fill every row and column with all nine numerals; to apply the same logic to so many genetic samples, the researchers rely on state-of-the-art robots, sequencing machines and software to place the specimens and search the results. "Every cell in a Sudoku [puzzle] is like a specimen, and every digit is like a genotype," says Erlich, a doctoral student who had used the Chinese remainder theorem in previous work with radar. He brought the idea to the attention of his CSHL professor, Greg Hannon.