Chapter 15. Life in Data: The Story of DNA

Matt Wood

Ben Blackburne

DNA is a biological building block, a concise, schema-less, fault-tolerant database of an organism's chemical makeup, designed and implemented by a population over millions of years. Over the past 20 years, biologists have begun to move from the study of individual genes to whole genomes, with genomic approaches forming an increasingly large part of modern biomedical research. In recent years, however, biologists have been learning to handle DNA as both a data store and a data source.

There are two stories to tell about DNA pertinent to this book. DNA itself is a method of encoding data, a digital store of information that predates your hard drive by quite some time. But there is a second, interlinked story, that of the massive undertaking of producing this data and determining its meaning.

DNA As a Data Store

A genome is the database for an organism. It is written in the molecules of DNA, copies of which are stored in each cell of the human body (with a few exceptions). This pattern is repeated across nature, right down to the simplest forms of life. The information encoded within the genome contains the directions to build the proteins that make up the molecular machinery that runs the chemistry of the cell. Now that's what I call fault-tolerant and redundant storage.
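To make the "digital store" idea concrete, here is a minimal sketch in Python. It assumes the standard four-letter DNA alphabet (A, C, G, T), which means each base carries two bits of information. The particular bit assignment and the `pack` and `unpack` helpers are hypothetical, chosen only for illustration; they do not describe how a cell or any particular genomics tool stores sequence data.

```python
# Hypothetical 2-bit code for the four DNA bases (an illustrative choice,
# not a standard): any one-to-one mapping would work equally well.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack(sequence: str) -> int:
    """Pack a DNA string into a single integer, two bits per base."""
    value = 0
    for base in sequence.upper():
        # Shift the bits stored so far and append the next base's two bits.
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover a DNA string of the given length from a packed integer."""
    bases = []
    for _ in range(length):
        # Read the lowest two bits, then drop them.
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    # Bases were read last-to-first, so reverse them.
    return "".join(reversed(bases))


if __name__ == "__main__":
    seq = "GATTACA"
    packed = pack(seq)
    print(packed, unpack(packed, len(seq)))
```

The length has to be passed to `unpack` because leading `A` bases are encoded as zero bits, which an integer does not preserve on its own.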

Almost every cell in your body contains a central data center, called the nucleus, which stores these genomic databases. Within the nucleus are the chromosomes. ...
