O'Reilly logo

Storm Blueprints: Patterns for Distributed Real-time Computation by Brian O'Neill, P. Taylor Goetz

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Hadoop

Before we jump to loading data, a quick overview of MapReduce is warranted. Although Druid comes prepackaged with a convenient MapReduce job to accommodate historical data, generally speaking, large distributed systems will need custom jobs to perform analyses over the entire data set.

An overview of MapReduce

MapReduce is a framework that breaks processing into two phases: a map phase and a reduce phase. In the map phase, a function is applied to the entire set of input data, one element at a time. Each application of the map function results in a set of tuples, each containing a key and a value. Tuples with similar keys are then combined via the reduce function. The reduce function emits another set of tuples, typically by combining the ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required