O'Reilly logo

The Art of Concurrency by Clay Breshears

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 7. MapReduce

image with no caption

MapReduce is an algorithmic framework, like divide-and-conquer or backtracking, rather than a specific algorithm. The pair of operations, map and reduce, is found in LISP and other functional languages. MapReduce has been getting a lot of buzz as an algorithmic framework that can be executed concurrently. Google has made its fortune on the application of MapReduce within a distributed network of thousands of servers (see “MapReduce: Simplified Data Processing on Large Clusters” in Communications of the ACM [2008] by Jeffrey Dean and Sanjay Ghemawat), which has only served to heighten awareness and exploration of this method.

The idea behind map is to take a collection of data items and associate a value with each item in the collection. That is, to match up the elements of the input data with some relevant value to produce a collection of key-value pairs. The number of results from a map operation should be equal to the number of input data items within the original collection. In terms of concurrency, the operation of pairing up keys and values should be completely independent for each element in the collection.

The reduce operation takes all the pairs resulting from the map operation and does a reduction computation on the collection. As I’ve said before, the purpose of a reduction is to take in a collection of data items and return a value derived from those ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required