
Chapter 9
MapReduce Computation
As the world entered the era of Big Data, demand grew for a computing
paradigm that (a) is generally applicable and (b) works on distributed
data. The latter term means that the data is physically split across many
chunks, possibly on different disks and perhaps even in different geographical
locations. Having the data stored in a distributed manner facilitates parallel
computation (different chunks can be read simultaneously) and also
enables us to work with data sets that are too large to fit into the memory
of a single machine. Demand for such computational capability led to the
development of various systems using ...