Chapter 5. Enhancing Map and Reduce Tasks

The Hadoop framework already includes several built-in counters, such as the number of bytes read and written. They are very helpful for learning about the framework's activities and the resources used, and the worker nodes send them to the master nodes periodically.

In this chapter, you will learn how to enhance both the map and the reduce phases: which counters to look at and which techniques to apply when analyzing a performance issue. You will then learn how to tune the relevant configuration parameters with appropriate values.
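
As a small illustration of reading counters to analyze a performance issue, the following sketch compares two counter values that Hadoop reports for a job: the number of spilled records and the number of map output records. The class `SpillCheck`, its method names, and the sample numbers are hypothetical and are not part of Hadoop; the sketch also assumes that the spilled-records value was taken from the map side only. Under those assumptions, a ratio above 1 suggests that some records were written to disk more than once.

```java
// Hypothetical helper, not part of the Hadoop API. It only interprets
// two counter values that were already read from a finished job.
public class SpillCheck {

    // Ratio of map-side spilled records to records emitted by the mappers.
    static double spillRatio(long spilledRecords, long mapOutputRecords) {
        if (mapOutputRecords <= 0) {
            return 0.0; // nothing was emitted, so there is nothing to analyze
        }
        return (double) spilledRecords / mapOutputRecords;
    }

    // True when more records were spilled than emitted, which means that
    // at least some records went to disk more than once.
    static boolean spilledMoreThanOnce(long spilledRecords, long mapOutputRecords) {
        return spilledRecords > mapOutputRecords;
    }

    public static void main(String[] args) {
        long mapOutputRecords = 1_000_000L; // assumed sample value
        long spilledRecords = 2_500_000L;   // assumed sample value

        System.out.println("spill ratio: "
                + spillRatio(spilledRecords, mapOutputRecords));
        System.out.println("spilled more than once: "
                + spilledMoreThanOnce(spilledRecords, mapOutputRecords));
    }
}
```

A check like this is only a starting point: it indicates where to look, and does not by itself say which configuration parameter to change.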

We will cover the following topics:

  • The impact of the block size and input data
  • How to deal with small and unsplittable files
  • Reducing map-side spilling ...
