Chapter 2. MapReduce

In this chapter, we build on what we learned about HDFS and the map-only portion of MapReduce to introduce a full MapReduce job and its mechanics. This time, we’ll include both the shuffle/sort phase and the reduce phase. Once again, we begin with a physical metaphor in the form of a story. After that, we’ll walk you through building our first full-blown MapReduce job in Python. By the end of this chapter, you should have an intuitive understanding of how MapReduce works, including its map, shuffle/sort, and reduce phases.
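To make the three phases concrete before the story starts, here is a minimal in-memory sketch in plain Python. It is not Hadoop code and not the job built later in the chapter. The function names (`map_phase`, `shuffle_sort`, `reduce_phase`, `run_job`) and the toy-counting task are assumptions chosen purely for illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(records):
    """Map: emit a (key, value) pair for every word in every record."""
    for line in records:
        for word in line.lower().split():
            yield (word, 1)


def shuffle_sort(pairs):
    """Shuffle/sort: order the pairs by key and gather each key's values."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_phase(grouped):
    """Reduce: collapse the values for each key into a single result."""
    for key, values in grouped:
        yield key, sum(values)


def run_job(records):
    """Chain the three phases together, as a single-process stand-in for a job."""
    return list(reduce_phase(shuffle_sort(map_phase(records))))


if __name__ == "__main__":
    # Hypothetical input: one toy request per word, one letter per line.
    letters = ["robot doll", "doll pony", "robot robot"]
    for toy, count in run_job(letters):
        print(toy, count)
```

In a real cluster the map and reduce functions run on many machines and the framework performs the shuffle/sort between them. Here, a single `sorted` call followed by `groupby` stands in for that step.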

First, the metaphorical story…of how Chimpanzee and Elephant saved Christmas.

Chimpanzee and Elephant Save Christmas

It was holiday time at the North Pole, and letters from little boys and little girls all over the world flooded in as they always do. But this year, the world had grown just a bit too much. The elves just could not keep up with the scale of requests—Christmas was in danger! Luckily, their friends at Chimpanzee and Elephant, Inc., were available to help. Packing their typewriters and good winter coats, JT, Nanette, and the crew headed to the Santaplex at the North Pole. Here’s what they found.

Trouble in Toyland

As you know, each year children from every corner of the earth write to Santa to request toys, and Santa—knowing who’s been naughty and who’s been nice—strives to meet the wishes of every good little boy and girl who writes him. He employs a regular army of toymaker elves, ...
