Big data could soon be stored in a very small package: DNA. A team of scientists has demonstrated that storing information in synthetic DNA could represent a feasible approach to managing data in the long term, bumping aside the magnetic tape favored by archivists today.
The approach, published online January 23 in Nature, relies on technologies that are likely to become faster and cheaper, says biologist and engineer Drew Endy of Stanford University, who was not involved in the work.
Unlike record players, which are good only for playing music encoded on now-obsolete vinyl discs, machines that make and read DNA find uses throughout science and always will. “Human beings are never going to stop caring about DNA,” says Endy. DNA is also compact and lightweight, and it can potentially remain intact for thousands of years if stored in a dark, cool environment.
To illustrate the technique, the research team stored five files, totaling about 750 kilobytes of data, as DNA. Among them were all 154 of Shakespeare’s sonnets (a text file), Watson and Crick’s classic 1953 paper describing the structure of DNA (a PDF), a color photograph (a JPEG) and a 26-second excerpt from Martin Luther King’s 1963 “I Have a Dream” speech (an MP3).
This new report comes on the heels of similar research published last August in Science. The new research projects that, if the costs of making DNA continue to drop, the approach might be economical for long-term storage in as little as 10 years.
“It’s genuinely exciting,” Endy says.
Led by Nick Goldman, researchers from the European Bioinformatics Institute in England began by converting the five files not into bits but into “trits,” the digits of a ternary (base-3) code that uses zero, one and two. Then they translated that code into one made of As, Cs, Gs and Ts, the “letters” of DNA. So TAGAT replaces the “T” that begins line two of Shakespeare’s sonnet 18: “Thou art more lovely and more temperate.” The team also incorporated a way to index the data, sort of a DNA version of the Dewey Decimal System, and an error correction code to keep the data clean.
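For readers who want to see the idea in miniature, here is a rough Python sketch of a data-to-trits-to-DNA round trip. It is not the researchers' actual scheme, which the article does not spell out. The fixed six trits per byte, the rule that each trit picks one of the three letters different from the previous letter, and every function name are assumptions made for illustration. The sketch also leaves out the indexing and error correction, and it will not reproduce the TAGAT example.

```python
# Illustrative sketch only: the mapping below is an assumption,
# not the encoding used by Goldman's team.

BASES = "ACGT"
TRITS_PER_BYTE = 6  # assumption: 3**6 = 729 >= 256, so six trits cover any byte


def bytes_to_trits(data: bytes) -> list[int]:
    """Write each byte as six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            byte, remainder = divmod(byte, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))
    return trits


def trits_to_dna(trits: list[int], start: str = "A") -> str:
    """Each trit selects one of the three letters that differ from the
    previous letter, so no letter ever appears twice in a row.
    `start` is a hypothetical 'previous letter' for the first position."""
    previous = start
    letters = []
    for trit in trits:
        choices = [b for b in BASES if b != previous]
        previous = choices[trit]
        letters.append(previous)
    return "".join(letters)


def dna_to_trits(dna: str, start: str = "A") -> list[int]:
    """Invert trits_to_dna by replaying the same choice lists."""
    previous = start
    trits = []
    for letter in dna:
        choices = [b for b in BASES if b != previous]
        trits.append(choices.index(letter))
        previous = letter
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Regroup trits six at a time and rebuild each byte."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def encode(data: bytes) -> str:
    return trits_to_dna(bytes_to_trits(data))


def decode(dna: str) -> bytes:
    return trits_to_bytes(dna_to_trits(dna))
```

Calling `decode(encode(b"Thou art more lovely"))` should return the original bytes. Using three choices per position is what makes a base-3 code a natural fit for a four-letter alphabet: one letter is always ruled out because it matches its neighbor.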
Then the researchers sent their code to the instrumentation company Agilent Technologies in Santa Clara, Calif. There scientists read the code and used it to build millions upon millions of DNA molecules, which they sent back to the researchers via FedEx in a test tube inside a cardboard box.
When the test tube, about the size of a pinkie finger, arrived, Goldman and his colleagues sequenced the DNA the same way researchers read the DNA of organisms, then reconstructed the original files. The translation from data to DNA and back was free of errors, Goldman says.
The approach isn’t likely to replace thumb drives anytime soon. But in the next decade, it could store information that needs to last for at least 50 years, such as government records or library texts. And who knows where it will go, wonders Goldman. Perhaps, he says, “when the cloud sucks things off your computer, it will be to store it as DNA.”
Back story | Data in DNA
As the technology for deciphering and synthesizing DNA has surged forward, so too have techniques for storing data in the molecule. In just over a decade, small-scale pilot experiments have given way to the development of methods that may make DNA economically competitive for the archival storage of large amounts of information.
Artist Eduardo Kac adapts a line from Genesis (“Let man have dominion over the fish in the sea, and over the fowl of the air, and over every living thing that moves upon the Earth”) and translates it into Morse code, then into a sequence of DNA base pairs. He then inserts the genetic material into bacteria, which produce a protein based on the sequence.
In a different project, researchers encode text in DNA and then mix it into a full human genome that is then printed onto paper as a microdot, a hidden message embedded in a period. Though the technique was secure at the time, modern DNA sequencing technology has made the message-containing DNA fairly easy to isolate from the rest of the genetic material.
Researchers encode the text and music of the nursery rhyme “Mary Had a Little Lamb,” along with a crude sketch of a lamb, in 844 base pairs of DNA.
Researchers at the J. Craig Venter Institute insert a synthetic genome modeled on one bacterium into a different species, and induce the hybrid to reproduce. Before inserting the DNA, they add encoded messages including the names of the project’s 46 researchers and quotes from James Joyce and physicist Richard Feynman.
George Church of Harvard Medical School encodes an HTML version of his book Regenesis in DNA, and then converts it back into digital form.
Nick Goldman and his colleagues convert image, audio and text files into DNA using a technology that could eventually make the molecule competitive as an archival storage medium.