Chapter 3. Understanding MapReduce

The previous two chapters discussed the problems that Hadoop allows us to solve and gave some hands-on experience of running example MapReduce jobs. With this foundation, we will now go a little deeper.

In this chapter we will be:

  • Understanding how key/value pairs are the basis of Hadoop tasks
  • Learning the various stages of a MapReduce job
  • Examining the workings of the map, reduce, and optional combiner stages in detail
  • Looking at the Java API for Hadoop and using it to develop some simple MapReduce jobs
  • Learning about Hadoop input and output

Key/value pairs

Since Chapter 1, What It's All About, we have been talking about operations that process and provide the output in terms of key/value pairs without explaining ...

