From Popular Mechanics
DNA can store far more data than a magnetic hard drive, but the technology is limited because the genetic material is prone to errors.
Scientists at the University of Texas at Austin have come up with a way to store information in strands of DNA, while also correcting those errors.
To prove it, they’ve put the entirety of The Wizard of Oz, translated into Esperanto, into strands of DNA, with greater accuracy than prior methods.
When the Voyager spacecraft launched in 1977, ready to study the outer limits of our solar system, each carried a golden phonograph record containing an assemblage of sounds and images meant to represent life on Earth. But in the future, the perfect next-gen time capsule could be found within our bodies.
That’s because DNA is millions of times more efficient at storing data than your laptop’s magnetic hard drive. It packs information so densely that you could squeeze all of the data in the world into just a few grams of it.
“Because DNA has been chosen by all of life as the information storage medium of choice…it turns out to be very robust,” Ilya Finkelstein, an associate professor of molecular biosciences at the University of Texas at Austin, tells Popular Mechanics. “Long after our magnetic storage becomes obsolete, nature will still be using DNA.”
Finkelstein is part of a team at the University of Texas at Austin that is pushing the limits of DNA-based storage. While this research area, at the intersection of molecular biology and computer science, has been around since the 1980s, scientists have struggled to find a way to correct the errors to which DNA storage is prone.
In a new paper published this week in the journal Proceedings of the National Academy of Sciences, Finkelstein and colleagues detail their new error correction method, which they tested out on a classic story. They stored the entirety of The Wizard of Oz, translated into Esperanto, more accurately than prior DNA storage methods could have. We’re on the yellow brick road toward the future of data storage.
A Brief History of DNA Storage
Researchers at the University of Texas at Austin are certainly not the first to have encoded a work of art onto strands of DNA.
Early DNA storage methods actually date back to a 1988 Harvard experiment, in which scientists stored an image of one of artist Joe Davis’s pieces in an E. coli DNA sequence. Upon decoding, it formed a 5-by-7 grid depicting an ancient rune that represents life and the female Earth.
By 2011, scientists at the European Bioinformatics Institute in the United Kingdom had also caught on to the practice. Nick Goldman, a bioinformatics researcher, had been commiserating with his colleagues about how to store the reams upon reams of genome sequences the world was producing. It started out as a joke, out of frustration, he told Nature.
“We thought, ‘What’s to stop us using DNA to store information?’” Goldman said. Two years later, the group had managed to successfully encode five files onto strands of DNA, including Martin Luther King Jr.’s famous “I Have a Dream” speech and sonnets from Shakespeare.
In November 2016, a spinout company from the Massachusetts Institute of Technology, called Catalog, immortalized the 144 words of Robert Frost’s famous poem “The Road Not Taken” in strands of DNA. That work represented about one kilobyte’s worth of data.
That same year, a team of researchers from Microsoft and the University of Washington fit 200 megabytes of data onto lengths of DNA, including the entirety of War and Peace. In March 2019, they even came up with the first automated system for storing and retrieving data in the manufactured genetic material.
Today, other major technology firms, including IBM and Google, are also working in the space. Even the ultra-secretive U.S. Intelligence Advanced Research Projects Activity, essentially DARPA for spies, has invested in the work. These researchers envision a future where some of the most precious but rarely accessed data is stored in vials of DNA and pulled from the cool, dark storage of the lab only as needed.
How DNA Storage Works
The magnetic hard drive is one of the most popular methods for storing data in today’s computers. Inside are one or more rotating discs, called platters, that resemble CDs. Their surfaces store data magnetically as binary code, strings of 1s and 0s. As the platters spin on a central spindle, a read/write head on a moving arm hovers just above each surface, reading and writing data by sensing and changing the magnetization of tiny regions. Electronic components power the whole operation.
Similarly, DNA-based storage requires an encoding and decoding scheme. In this case, scientists chemically synthesize DNA strands whose sequences encode the data, using the four nucleotide bases that pair up to form the rungs of the genetic material’s ladder-like double helix: adenine (A), cytosine (C), guanine (G), and thymine (T).
Because there are four building blocks in DNA, rather than the binary 1s and 0s in magnetic hard drives, the genetic storage method is far more dense, explains John Hawkins, another co-author of the new paper. “A teaspoon of DNA contains so much data it would require about 10 Walmart Supercenter-sized data centers to store using current technology,” he tells Popular Mechanics. “Or, as some people like to put it, you could fit the entire internet in a shoe box.”
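To make the four-letter idea concrete: because each position in a strand can hold one of four bases, each nucleotide can carry two bits, so a single byte fits in four nucleotides. The Python sketch below assumes a simple, hypothetical mapping (A=00, C=01, G=10, T=11) and hypothetical function names; it is not the encoding the University of Texas team used, and practical schemes typically add constraints, such as avoiding long runs of the same base, that this naive version ignores.

```python
# Hypothetical mapping for illustration only: two bits per nucleotide.
# Not the scheme used in the PNAS paper; real encoders add sequence constraints.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes to a DNA string, four nucleotides per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse encode(): read two bits from each base, then regroup into bytes."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4 bases")
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Oz")
    print(strand)          # CATTCTGG
    print(decode(strand))  # b'Oz'
```

Binary needs eight symbols to spell out a byte; this toy mapping needs only four bases, because each base answers a four-way question instead of a two-way one.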
DNA is also remarkably durable. Hawkins recalls that when CDs were the dominant storage medium, back in the 1990s, they held the promise of lasting forever, because plastic does (though scratches could be devastating). Data stored in DNA, on the other hand, can last for hundreds of thousands of years. In fact, a whole field of science, archaeogenetics, studies ancient DNA to understand the past.
Beyond that, DNA requires virtually zero maintenance once it’s stored. After all, scientists have recovered DNA sequences from remains that spent hundreds of thousands of years in frozen ground. DNA storage doesn’t require any energy, either. It needs only a cool, dark place to hang out until someone decides to access it. But the greatest advantage, Hawkins says, is that our ability to read and write DNA will never become obsolete.
“If I wanted to go read an essay I myself wrote as a child, for example, I would already need to first go to a museum to find a working computer from that era, and I’m only in my 30s,” he explains. “But DNA is uniquely future-proof on this front because we are made of it. As long as humans are made of DNA, we will always want machines around that can read it.”
Getting Past the Errors
But like all data storage methods, DNA has a few shortcomings. The most significant upfront hurdle is cost. Hawkins says the cost of DNA storage today is comparable to that of an Apple Hard Disk 20 in the 1980s. That drive held about 20 megabytes, roughly the size of a 15-minute video download, and went for about $1,500.
Beyond that, DNA is also error-prone. Recall the four nucleotide bases that make up the DNA ladder. On average, writing and reading DNA introduces about one mistake per 100 to 1,000 nucleotides. These errors take three forms: substitutions, insertions, and deletions.
In a substitution, a single letter in a string of nucleotides is switched out for another. In the graphic below, cytosine is replaced with thymine, and the strand of DNA stays the same length. In an insertion or deletion, though, the strand gains an extra nucleotide base or loses one. But unlike errors in computer code, no space is left behind where a removed base once lived, so every base after it shifts position, which can quickly become problematic when you go to decode the data stored in the DNA.
Hawkins likes to compare this to English words: “A deletion of the letter ‘L’ turns ‘world’ into ‘word.’ Additionally, inserting an ‘S’ then turns it into ‘sword.’ Correctly reading ‘world’ from ‘sword’ is hard not only because sword is still a valid English word, but because all the letters shifted around.”
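Hawkins’s example can be checked with two standard string measures. Hamming distance compares letters position by position, while Levenshtein (edit) distance counts the fewest substitutions, insertions, and deletions needed to turn one string into another. The Python sketch below is a textbook illustration with illustrative function names, not the team’s decoding algorithm.

```python
def hamming(a: str, b: str) -> int:
    """Count position-by-position mismatches between equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Fewest substitutions, insertions, and deletions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match for free
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    original = "world"
    damaged = "s" + original.replace("l", "")  # delete "l", then insert "s"
    print(damaged)                         # sword
    print(hamming(original, damaged))      # 4
    print(levenshtein(original, damaged))  # 2
```

Only two edits separate “world” from “sword,” yet four of the five positions disagree. A decoder that assumes every letter stays in place sees heavy damage, which is why insertions and deletions are so much harder to undo than substitutions.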
Other forms of DNA storage got past these errors by repeating the code for the data 10 to 15 times over, but that’s a massive waste of space. In the new method described in the team’s research paper, however, they build the data into the DNA in a lattice shape, wherein each bit of data reinforces the next, so that it only needs to be read once.
They also developed an algorithm that corrects insertion, deletion, and substitution errors all at once, making DNA-based digital data storage far more efficient. That’s why the team could so readily fit The Wizard of Oz onto strands of DNA without repeating its sequence of A, C, G, and T bases many times over.
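The older approach of repeating the data many times can be sketched as a simple repetition code: store several copies and take a majority vote at each position. This Python sketch is a generic illustration, not any specific group’s pipeline or the team’s lattice method; the strand, the 5 percent error rate, the copy count, and the function names are all made-up assumptions. It shows both drawbacks: storage grows with every copy, and positional voting only works when every copy keeps the same length, so it cannot repair insertions or deletions.

```python
import random
from collections import Counter


def majority_vote(reads: list[str]) -> str:
    """Recover a strand from equal-length noisy copies by voting at each position."""
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("positional voting needs every copy to have the same length")
    return "".join(Counter(r[i] for r in reads).most_common(1)[0][0] for i in range(length))


def substitute(strand: str, rate: float, rng: random.Random) -> str:
    """Simulate substitution errors only: each base flips to another base with probability rate."""
    out = []
    for base in strand:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(42)
    original = "CATTCTGGACGTACGTTAGC"  # hypothetical data strand
    copies = [substitute(original, 0.05, rng) for _ in range(11)]
    print("recovered:", majority_vote(copies) == original)
    print(f"storage overhead: {len(copies)}x")

    # A single deletion shifts one copy out of alignment, and voting breaks down.
    try:
        majority_vote([original, original[1:], original])
    except ValueError as err:
        print("voting failed:", err)
```

Eleven copies buy reliable recovery from scattered substitutions at eleven times the storage, and one dropped base is enough to defeat the vote.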
A Vision of the Future
Moving forward, the potential for DNA-based storage is nearly limitless. Finkelstein presents a vision of the future wherein DNA, encoded with data, can be incorporated inside other materials.
In one example, he says, researchers impregnated 3D-printed plastic with strands of DNA that contained the print file for the object itself. Because the DNA runs throughout the material, a piece of the printed object can be used to recover the file and print the object again, in a circular process.
Or, you could use DNA-based data storage as a way to make forensic discoveries about inanimate objects that don’t have their own genetic material. Say you coat an airplane with a material containing DNA that holds the full instructions for building that particular portion of the plane. If something goes awry and the plane ends up in the sea, the DNA in the coating will degrade to some degree under the sun’s ultraviolet rays.
Put another way, that degradation becomes a record of what has happened to the plane. If even one piece of the wreckage is recovered, scientists can analyze the stored DNA, and its degradation, to estimate how long it has been lost at sea.
Even with the breakthroughs that Finkelstein’s team has made, DNA-based digital storage is still some time away. “I think that niche applications are probably close to being on the horizon,” he says, “but I don’t think it’s going to be a mass market product for a decade or more.”
It’s been nearly 60 years since magnetic tape overcame punch cards as the primary mode for data storage, bringing about a revolution in computing. Since then, disk drives have only gotten smaller and smaller. So a future where the storage medium of choice is so small that you can hardly even see it actually makes sense.
When we reach that reality, DNA-based storage will be the most impressive leap yet.