Chapter 6. Deep Learning for Genomics

At the heart of every living organism is its genome: the molecules of DNA containing all the instructions to make the organism’s working parts. If a cell is a computer, then its genome sequence is the software it executes. And if DNA can be seen as software, information meant to be processed by a computer, surely we can use our own computers to analyze that information and understand how it functions?

But of course, DNA is not just an abstract storage medium. It is a physical molecule that behaves in complicated ways. It also interacts with thousands of other molecules, all of which play important roles in maintaining, copying, directing, and carrying out the instructions contained in the DNA. The genome is a huge and complex machine made up of thousands of parts. We still have only a poor understanding of how most of those parts work, to say nothing of how they all come together as a working whole.

This brings us to the twin fields of genetics and genomics. Genetics treats DNA as abstract information. It looks at patterns of inheritance, or seeks correlations across populations, to discover the connections between DNA sequences and physical traits. Genomics, on the other hand, views the genome as a physical machine. It tries to understand the pieces that make up that machine and the ways they work together. The two approaches are complementary, and deep learning can be a powerful tool for both of them.

DNA, RNA, and Proteins

Even if you ...
