8. Putting It Together: MapReduce Data Pipelines

It’s kind of fun to do the impossible.

—Walt Disney

Human brains aren’t very good at keeping track of millions of separate data points, but we know that there is lots of data out there, just waiting to be collected, analyzed, and visualized. To cope with the complexity, we create metaphors to wrap our heads around the problem. Do we need to store millions of records until we figure out what to do with them? Let’s file them away in a data warehouse. Do we need to analyze a billion data points? Let’s crunch them down into something more manageable.

No longer should we be satisfied with just storing data and chipping away little bits of it to study. Now that distributed computational tools are becoming ...
