From Nature magazine
A trio of researchers has encoded a draft of a whole book into DNA. The 5.27-megabit tome contains 53,246 words, 11 JPG image files and a JavaScript program, making it the largest piece of non-biological data ever stored in this way.
DNA has the potential to store huge amounts of information. In theory, two bits of data can be incorporated per nucleotide (the single base unit of a DNA string), so each gram of the double-stranded molecule could store 455 exabytes of data (1 exabyte is 10^18 bytes). Such dense packing outstrips inorganic data-storage devices such as flash memory, hard disks or even storage based on quantum-computing methods.
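The 455-exabyte figure can be roughly reproduced with back-of-envelope arithmetic. The sketch below assumes an average mass of about 330 g/mol per nucleotide, a common textbook approximation that does not come from the article; the constant and function names are illustrative only.

```python
AVOGADRO = 6.02214076e23           # particles per mole
NUCLEOTIDE_MASS_G_PER_MOL = 330.0  # ASSUMPTION: average mass of one nucleotide
BITS_PER_NUCLEOTIDE = 2            # theoretical maximum quoted in the article
BYTES_PER_EXABYTE = 10**18


def exabytes_per_gram(mass_per_nt=NUCLEOTIDE_MASS_G_PER_MOL,
                      bits_per_nt=BITS_PER_NUCLEOTIDE):
    """Estimate how many exabytes one gram of DNA could hold in theory."""
    nucleotides_per_gram = AVOGADRO / mass_per_nt
    bits_per_gram = nucleotides_per_gram * bits_per_nt
    return bits_per_gram / 8 / BYTES_PER_EXABYTE


print(f"{exabytes_per_gram():.0f} exabytes per gram")
```

Under that assumed mass, the estimate lands within about one percent of the 455 exabytes quoted above. A different assumed mass per nucleotide would shift the result proportionally.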
The book, which is fittingly a treatise on synthetic biology, was encoded by geneticists George Church and Sriram Kosuri at the Wyss Institute for Biologically Inspired Engineering in Boston, Massachusetts, and Yuan Gao, a biomedical engineer at Johns Hopkins University in Baltimore, Maryland. They report their work in Science (ref. 1) this week.
It marks a significant gain on previous projects — the largest of which encoded less than one-six-hundredth of the data — but organic flash drives are still many years away. There are a number of reasons why the method is not practical for everyday use. For example, both storing and retrieving information currently require several days of lab work, spent either synthesizing DNA from scratch or sequencing it to read the data.
The work illustrates the potential of nonconventional approaches, says Stuart Parkin, who is developing dense forms of inorganic storage media at the IBM-Stanford Spintronic Science and Applications Center in San Jose, California. "You could say that the physical sciences have exhausted the playground of concepts, and we now need to go beyond that world," he says. "This coupling of the biological world to the physical world will lead to some very interesting storage devices in the next decade."
Short and sweet
Encoding the DNA book didn't involve fundamentally new technology so much as the creative application of existing techniques, explains Anne Condon, a computer scientist at the University of British Columbia in Vancouver, who studies how DNA molecules can be used in computing.
Previous attempts to store information in DNA have been held up by difficulties in making perfect long strands. Shorter molecules present less of a challenge, so Church and his colleagues kept their storage strands a mere 159 nucleotides long, and generated multiple copies of each to make catching and correcting mutations easier.
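One simple way to use multiple copies for error correction is a per-position majority vote. The sketch below is illustrative only: it assumes the reads are already aligned, have equal length, and differ only by substitutions. The function name `consensus` is hypothetical, and the article does not describe the researchers' actual procedure.

```python
from collections import Counter


def consensus(reads):
    """Return the most common base at each position across the reads.

    ASSUMPTION: all reads have the same length and are already aligned,
    so only substitution errors are handled (no insertions or deletions).
    """
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("all reads must have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0]  # majority base in this column
        for column in zip(*reads)
    )


# Three copies of one strand; the third copy has a single mutation.
copies = ["ACGTAC", "ACGTAC", "ACTTAC"]
print(consensus(copies))
```

With enough copies, an isolated mutation in one copy is outvoted by the correct base in the others.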
In each single strand, 96 nucleotides represented the encoded data as digital ones and zeroes; 19 nucleotides showed how these data blocks should be ordered; and 44 nucleotides enabled easier sequencing. The researchers' binary code assigned 'zero' to two types of nucleotide (As and Cs) and 'one' to the other two types (Gs and Ts).
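The one-bit-per-nucleotide code described above can be sketched in a few lines. This is a minimal illustration, not the researchers' software. The 96-bit and 19-bit block sizes come from the article, but the order of address and data within a strand, the rule for choosing between the two possible bases (random, avoiding an immediate repeat), and all function names are assumptions. The 44 sequencing nucleotides are left out.

```python
import random

DATA_BITS = 96      # data nucleotides per strand (from the article)
ADDRESS_BITS = 19   # ordering nucleotides per strand (from the article)
ZERO_BASES = "AC"   # 'zero' is written as A or C
ONE_BASES = "GT"    # 'one' is written as G or T


def bits_to_dna(bits, rng):
    """Encode a string of '0'/'1' characters at one bit per nucleotide."""
    bases = []
    for bit in bits:
        choices = ONE_BASES if bit == "1" else ZERO_BASES
        # ASSUMPTION: choose at random, but never repeat the previous base.
        previous = bases[-1] if bases else ""
        bases.append(rng.choice([b for b in choices if b != previous]))
    return "".join(bases)


def dna_to_bits(dna):
    """Decode nucleotides back to bits: G/T -> 1, A/C -> 0."""
    return "".join("1" if base in ONE_BASES else "0" for base in dna)


def make_strands(bits, rng):
    """Split bits into 96-bit blocks, each prefixed with a 19-bit address."""
    strands = []
    for index, start in enumerate(range(0, len(bits), DATA_BITS)):
        block = bits[start:start + DATA_BITS].ljust(DATA_BITS, "0")  # zero-pad
        address = format(index, f"0{ADDRESS_BITS}b")
        strands.append(bits_to_dna(address + block, rng))
    return strands


def read_strands(strands):
    """Reassemble the bit string from strands given in any order."""
    blocks = {}
    for strand in strands:
        bits = dna_to_bits(strand)
        blocks[int(bits[:ADDRESS_BITS], 2)] = bits[ADDRESS_BITS:]
    return "".join(blocks[i] for i in sorted(blocks))


rng = random.Random(0)
message = "".join(format(byte, "08b") for byte in b"synthetic biology")
strands = make_strands(message, rng)
rng.shuffle(strands)  # strands in a tube have no inherent order
assert read_strands(strands)[:len(message)] == message
```

Because each bit can be written as either of two bases, the encoder has some freedom to avoid awkward sequences. The price is one bit per nucleotide instead of the theoretical two.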
“It's using some simple ideas in very elegant ways to improve the density of information that one can store,” says Condon. She says that the technology will work best for specialized applications in which data need to be stored for a long time without being read.
The ideal storage period might be as long as centuries, says Kosuri. Even as other storage technologies become as obsolete as magnetic tape and floppy disks are now, researchers will always be trying to improve technology for reading and writing DNA, because the molecule is so central to biology.
And that will bring costs down over time, enabling DNA data storage to move beyond the realm of demonstration projects. The cost of sequencing technologies has already fallen to about a thousandth of what it was four years ago, says Kosuri, and DNA synthesis has achieved the same drops over the past eight years. "The DNA chip we used for this paper held 55,000 oligonucleotides," he adds. "The newest ones hold a million."
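Two of the figures in this article can be cross-checked with simple arithmetic. The sketch below assumes that the 5.27-megabit total means 5.27 × 10^6 bits and that each strand carries 96 data bits. It also converts "a thousandth in N years" into an implied halving time, assuming a steady exponential decline, which is an assumption and not something the article claims. The function names are illustrative.

```python
import math

TOTAL_BITS = 5_270_000      # ASSUMPTION: 5.27 megabits read as 5.27e6 bits
DATA_BITS_PER_STRAND = 96   # from the article


def strands_needed(total_bits, bits_per_strand):
    """Number of distinct strands required to hold all the data bits."""
    return math.ceil(total_bits / bits_per_strand)


def halving_time_months(fold_drop, years):
    """Months per halving if cost fell fold_drop-fold at a steady rate."""
    return years * 12 / math.log2(fold_drop)


print(strands_needed(TOTAL_BITS, DATA_BITS_PER_STRAND))
print(round(halving_time_months(1000, 4), 1))  # sequencing
print(round(halving_time_months(1000, 8), 1))  # synthesis
```

Under these assumptions the book needs just under 55,000 distinct strands, which is consistent with the 55,000-oligonucleotide chip Kosuri mentions. The implied halving times are roughly five months for sequencing and roughly ten months for synthesis.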
This article is reproduced with permission from the magazine Nature. The article was first published on August 16, 2012.









Comments
Amazing job ...
Church's group observes that as you scale up, the barcodes identifying each oligonucleotide would eventually become unrealistically large. So they proposed a hypothetical storage device that splits one and a half exabytes of data into one-petabyte groups, each of which could then be stored in a standard 1536-well microplate (5 by 3 by 0.5 inches).
These 216-base oligos could be ordered today from a synthesis service. But commercial technology would have to double in capacity 26 times before a service could write this hypothetical plate in a single work order.
----
The SciAm article doesn't mention why they included the JavaScript. It was a mouse-tracking program intended to represent a piece of malware, a reminder that data security issues would still be around in a world of DNA data archives.
They also mention that synthesis companies currently watch out for work orders that request the DNA for dangerous agents. They ought to continue to watch out even on jobs billed as "data archiving", since a maliciously-designed dataset could cause the coding scheme to output the DNA for an infectious/toxic agent.
----
The error rate is worth mentioning too. There were 7 incorrect but validly-coded bits out of the 5.27 million. Not quite perfect. But almost all the errors were in the last fifteen bases, where there was only one copy of each bit rather than the hundred copies for the rest of the sequence. Perhaps a little padding at the ends of the dataset could avoid that problem?
1 exabyte is not 1018 bytes, but 10 to the 18 bytes.
By altering and modifying the information encoded in DNA, a new civilization of genetically modified people will be created, which is already happening. It is known that athletes are being modified, but this is probably only a small part of the picture. Further changes to genetic information (made with good intentions) will alter the genome so much that a genetically modified person could no longer be identified as human from their genetic material.
I thought 2 bits of data per nucleotide was/is quantum
if i don't understand correctly, please ? your quantum reality vs quantum computing theory.
How accurate is the data stored via DNA? Is it easy to reproduce without many errors, or is the reproduction of this data error-prone?
If I understand you correctly, a person would no longer be identified as a hominid?