© Thomas W. Dinsmore 2016

Thomas W. Dinsmore, Disruptive Analytics, 10.1007/978-1-4842-1311-7_4

4. The Hadoop Ecosystem

Disrupting from Below

Thomas W. Dinsmore

(1) Newton, Massachusetts, USA

In 2003, Doug Cutting and Mike Cafarella struggled to build a web crawler to search and index the entire Internet. There was too much data for a single machine, so they needed a way to distribute it across multiple machines.

To keep costs low, they wanted to use inexpensive commodity hardware. That meant they would need fault-tolerant software, so if any one machine failed, the system could continue to operate.

Early in their work, they ruled out using a relational database. Their data included diverse data structures and data types, without a predefined ...
