Hadoop
Before we jump to loading data, a quick overview of MapReduce is warranted. Although Druid comes prepackaged with a convenient MapReduce job to accommodate historical data, generally speaking, large distributed systems will need custom jobs to perform analyses over the entire data set.
An overview of MapReduce
MapReduce is a framework that breaks processing into two phases: a map phase and a reduce phase. In the map phase, a function is applied to the entire set of input data, one element at a time. Each application of the map
function results in a set of tuples, each containing a key and a value. Tuples with similar keys are then combined via the reduce
function. The reduce
function emits another set of tuples, typically by combining the ...
Get Storm Blueprints: Patterns for Distributed Real-time Computation now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.