Skip to Content
Hadoop: The Definitive Guide, 4th Edition
book

Hadoop: The Definitive Guide, 4th Edition

by Tom White
April 2015
Beginner to intermediate
754 pages
21h 39m
English
O'Reilly Media, Inc.
Content preview from Hadoop: The Definitive Guide, 4th Edition

Chapter 23. Biological Data Science: Saving Lives with Software

Matt Massie

It’s hard to believe a decade has passed since the MapReduce paper appeared at OSDI’04. It’s also hard to overstate the impact that paper had on the tech industry; the MapReduce paradigm opened distributed programming to nonexperts and enabled large-scale data processing on clusters built using commodity hardware. The open source community responded by creating open source MapReduce-based systems, like Apache Hadoop and Spark, that enabled data scientists and engineers to formulate and solve problems at a scale unimagined before.

While the tech industry was being transformed by MapReduce-based systems, biology was experiencing its own metamorphosis driven by second-generation (or “next-generation”) sequencing technology; see Figure 23-1. Sequencing machines are scientific instruments that read the chemical “letters” (A, C, T, and G) that make up your genome: your complete set of genetic material. To have your genome sequenced when the MapReduce paper was published cost about $20 million and took many months to complete; today, it costs just a few thousand dollars and takes only a few days. While the first human genome took decades to create, in 2014 alone an estimated 228,000 genomes were sequenced worldwide.[152] This estimate implies around 20 petabytes (PB) of sequencing data were generated in 2014 worldwide.

Figure 23-1. Timeline of big data technology and cost of sequencing a genome

The plummeting cost ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Hadoop: The Definitive Guide, 3rd Edition

Hadoop: The Definitive Guide, 3rd Edition

Tom White
Kafka: The Definitive Guide

Kafka: The Definitive Guide

Neha Narkhede, Gwen Shapira, Todd Palino
Kafka: The Definitive Guide, 2nd Edition

Kafka: The Definitive Guide, 2nd Edition

Gwen Shapira, Todd Palino, Rajini Sivaram, Krit Petty

Publisher Resources

ISBN: 9781491901687Errata PageSupplemental Content