Chapter 7. Large-Scale MapReduce

In this chapter, we will look at how to write MapReduce jobs, how to design large-scale MapReduce processing on top of HBase, how MapReduce works internally, and how to tune HBase for these workloads. In doing so, we will discuss the following:

  • MapReduce frameworks
  • When to use MapReduce and when not to
  • Case study with example code and explanations


HBase offers several ways to use MapReduce, and the right one depends on the stack and architecture you are going to use.

Before we start, let's quickly revisit the components involved in a MapReduce job:

  • Record reader
  • Mapper
  • Combiner
  • Partitioner
  • Shuffle and sort
  • Reducer
  • Output format
  • Record reader: The core responsibility of a record reader is to analyze the ...
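
As a rough illustration of how the components listed above hand data to one another, here is a minimal in-memory word-count sketch in plain Java. It deliberately does not use the Hadoop or HBase APIs: the class MiniMapReduce and all of its method names are hypothetical, chosen only for this example, and a real job would run these stages in parallel across many machines rather than in one loop.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration; not the Hadoop or HBase API.
public class MiniMapReduce {

    // Mapper: emit (word, 1) for every word in one input record.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Combiner: pre-aggregate one mapper's output locally to shrink the shuffle.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> local = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            local.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return local;
    }

    // Partitioner: pick the reducer for a key (non-negative hash modulo reducer count).
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Reducer: sum all values that arrived for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines, int numReducers) {
        // Shuffle and sort: one key-sorted map per reducer, grouping values by key.
        List<Map<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>());
        }

        // Record reader: in this sketch, each input line is simply one record.
        for (String line : lines) {
            for (Map.Entry<String, Integer> e : combine(map(line)).entrySet()) {
                partitions.get(partition(e.getKey(), numReducers))
                        .computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                        .add(e.getValue());
            }
        }

        // Output format: collect every reducer's results into one sorted map.
        Map<String, Integer> output = new TreeMap<>();
        for (Map<String, List<Integer>> part : partitions) {
            for (Map.Entry<String, List<Integer>> e : part.entrySet()) {
                output.put(e.getKey(), reduce(e.getValue()));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "hbase stores rows", "mapreduce scans rows", "rows rows");
        System.out.println(run(lines, 2));
    }
}
```

Running main prints {hbase=1, mapreduce=1, rows=4, scans=1, stores=1}. The totals do not depend on the number of reducers, because the partitioner only decides where a key is summed, not how.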
