MapReduce is an algorithmic approach to deal with big data. It provides a way of taking large
amount of data, breaking it up into several smaller chunks of data and then processing each of
those chunks in parallel, and nally aggregating the outcome from each process to produce a
single unied outcome.
MapReduce is a two-stage process. The first step is called the ‘Map’ step and the second is
called the ‘Reduce’ step. The ‘Map’ step concerns itself with breaking up the data into chunks and
processing each of those chunks. The ‘Reduce’ step takes the output of the ‘Map’ step and aggre-
gates the outcomes ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month, and much more.