Chapter 3. Understanding MapReduce

The previous two chapters have discussed the problems that Hadoop allows us to solve, and gave some hands-on experience of running example MapReduce jobs. With this foundation, we will now go a little deeper.

In this chapter we will be:

  • Understanding how key/value pairs are the basis of Hadoop tasks
  • Learning the various stages of a MapReduce job
  • Examining the workings of the map, reduce, and optional combined stages in detail
  • Looking at the Java API for Hadoop and use it to develop some simple MapReduce jobs
  • Learning about Hadoop input and output

Key/value pairs

Since Chapter 1, What It's All About, we have been talking about operations that process and provide the output in terms of key/value pairs without explaining ...

Get Hadoop: Data Processing and Modelling now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.