Chapter 5. Enhancing Map and Reduce Tasks

The Hadoop framework already includes several counters such as the number of bytes read and written. These counters are very helpful to learn about the framework activities and the resources used. These counters are sent by the worker nodes to the master nodes periodically.

In this chapter, for both map and reduce, we will learn how to enhance each phase, what counters to look at, and the techniques to apply in order to analyze a performance issue. Then, you will learn how to tune the correct configuration parameter with the appropriate value.

In this chapter, we will cover the following topics:

  • The impact of the block size and input data
  • How to deal with small and unsplittable files
  • Reducing map-side spilling ...

Get Optimizing Hadoop for MapReduce now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.