8. Putting It Together: MapReduce Data Pipelines

It’s kind of fun to do the impossible.

—Walt Disney

Human brains aren’t very good at keeping track of millions of separate data points, but we know that there is lots of data out there, just waiting to be collected, analyzed, and visualized. To cope with the complexity, we create metaphors to wrap our heads around the problem. Do we need to store millions of records until we figure out what to do with them? Let’s file them away in a data warehouse. Do we need to analyze a billion data points? Let’s crunch them down into something more manageable.

No longer should we be satisfied with just storing data and chipping away little bits of it to study. Now that distributed computational tools are becoming ...

