When DNA Becomes Data

In the past 10 to 20 years, we have seen many data storage methods come and go, each bigger and better than the last. Growing up in the early 1990s, I was first introduced to data storage through the floppy disk. These disks stored about 1.4 MB, a pitifully small amount by today's standards. Next came compact discs (CDs), which held up to 700 MB. Both technologies are now used infrequently and have proven poorly suited to long-term storage for several reasons, including (i) lack of durability (anyone who has ever scratched a favorite CD can relate), and (ii) their fleeting popularity: most of us would be hard pressed to find a computer that can still read either format.

Enter DNA. Because of its ubiquity in health care and in nearly every area of research, DNA is expected to remain readable no matter how technology advances. And though we typically think of DNA as existing only in our bodies or in a test tube, rather than as a carrier of digital information, it may be making a case for data storage. The idea is fairly simple, at least in theory. DNA strands are composed of four bases: adenine (A), guanine (G), cytosine (C), and thymine (T). Digital information is stored as a series of 0s and 1s, but it can be translated into DNA: instead of 0s and 1s, the information is encoded in A, G, C, and T. Because there are four bases, each one can stand for two bits (for example, 00 as A, 01 as C, 10 as G, and 11 as T). Once the new code has been created, labs can write it into strands of synthetic DNA, and these strands are incredibly minuscule. Each base occupies only about a cubic nanometer, meaning that large amounts of information-rich material can be created without taking up much physical space.
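To make the translation step concrete, here is a minimal sketch of the simplest possible scheme, two bits per base. The particular assignment (00 as A, 01 as C, 10 as G, 11 as T) and the function names `bytes_to_dna` and `dna_to_bytes` are illustrative assumptions, not the method used by any of the research groups mentioned here.

```python
# Naive "2 bits per base" translation between bytes and DNA letters.
# ASSUMPTION: the mapping below is arbitrary and purely illustrative;
# real DNA storage systems use more elaborate, error-tolerant codes.

BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Translate raw bytes into a string of A/C/G/T, two bits per base."""
    # Render every byte as 8 binary digits, e.g. 72 -> "01001000".
    bitstring = "".join(f"{byte:08b}" for byte in data)
    # Walk the bitstring two digits at a time; each pair becomes one base.
    return "".join(
        BITS_TO_BASE[bitstring[i:i + 2]] for i in range(0, len(bitstring), 2)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Reverse the mapping: every four bases make up one byte."""
    bitstring = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bitstring[i:i + 8], 2) for i in range(0, len(bitstring), 8))


if __name__ == "__main__":
    strand = bytes_to_dna(b"Hi")
    print(strand)                # CAGACGGC
    print(dna_to_bytes(strand))  # b'Hi'
```

A scheme this naive has practical problems: a zero byte, for instance, becomes the run AAAA, and long runs of a single base are hard to synthesize and sequence accurately. Published approaches avoid such runs and add redundancy for error correction, at the cost of storing somewhat fewer than two bits per base.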

With the potential to store on the order of 1 zettabyte (1 billion terabytes) in 1 gram of DNA, the technology holds exciting promise for storage. The longevity of DNA is just as important. Scientists have extracted and sequenced genetic information from organisms that lived hundreds of thousands of years ago, a testament to how stable DNA can be under the right conditions.
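The headline density figure is best read as an order-of-magnitude estimate. The back-of-envelope sketch below shows where numbers like it come from. Its inputs are assumptions not stated in the article: an average mass of roughly 330 g/mol per single-stranded nucleotide, two bits per nucleotide, and no overhead for addressing, error correction, or redundant copies.

```python
# Idealized estimate of how many bytes one gram of DNA could hold.
# ASSUMPTIONS: ~330 g/mol per single-stranded nucleotide, 2 bits per
# nucleotide, and zero overhead (no indexing, error correction, or copies).

AVOGADRO = 6.02214076e23          # molecules per mole (exact SI value)
GRAMS_PER_MOL_NUCLEOTIDE = 330.0  # approximate average for single-stranded DNA
BITS_PER_NUCLEOTIDE = 2           # four bases -> log2(4) = 2 bits


def bytes_per_gram(grams: float = 1.0) -> float:
    """Return the theoretical byte capacity of `grams` of single-stranded DNA."""
    moles = grams / GRAMS_PER_MOL_NUCLEOTIDE
    nucleotides = moles * AVOGADRO
    return nucleotides * BITS_PER_NUCLEOTIDE / 8  # 8 bits per byte


if __name__ == "__main__":
    capacity = bytes_per_gram()
    print(f"{capacity:.2e} bytes, or about {capacity / 1e21:.2f} zettabytes")
    # prints: 4.56e+20 bytes, or about 0.46 zettabytes
```

Under these assumptions the ceiling comes out to a few hundred exabytes per gram, within an order of magnitude of a zettabyte. Working systems store far less per gram, because they spend bases on indexing and error correction and keep many physical copies of each strand.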

Still, the power of this storage tool has yet to be fully explored. In 2012, Harvard researchers encoded a book in DNA and stored 70 billion copies of it in 1 cubic millimeter of DNA. More recently, Microsoft engineers have written image files (in the kilobyte range) into DNA strands. So while the scale of projects translated into DNA remains small, the question is whether this technology will become a viable means of long-term data storage or prove to be a passing fad.

One looming factor will determine how useful this technology can be: will it ever be cost-efficient to store data in synthetic DNA? How can we make DNA storage easy to write and read and, most important, accessible? Accessibility is key, because even the most efficient and safest storage will not gain traction if costs are prohibitive. Microsoft recently partnered with the start-up Twist Bioscience, a producer of synthetic DNA, and bought 10 million strands to experiment with genetic storage. Its most recent efforts, in July, set the current record of 200 megabytes of data encoded in DNA strands. For those without Microsoft’s vast resources, however, even acquiring the tools to encode the information can be problematic. The cost of writing DNA is currently estimated to be about 1 million times too high for the technology to scale, and while the cost of gene sequencing has dropped dramatically since the technology first appeared, sequencing a single genome still costs close to $1,000.

But if this works? Someday we could be buying DNA drives instead of disk drives, and that is something to look forward to.