Enhancing reduce tasks

Reduce task processing consists of a sequence of three phases. Of these, only the execution of the user-defined reduce function is custom code. The time each phase takes depends on the amount of data flowing through it and on the performance of the underlying Hadoop cluster. Profiling each phase helps you identify potential bottlenecks and slow data processing. The following figure shows the three major phases of reduce tasks:

[Figure: the three major phases of reduce tasks]

Let's look at each phase in some detail:

  • Profiling the Shuffle phase means measuring the time taken to transfer the intermediate data from the map tasks to the reduce tasks and then merge ...
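As a rough illustration of the shuffle measurement described above, the following Python sketch splits one reduce attempt's elapsed time into transfer, merge, and remaining time. It assumes you have already obtained four timestamps (in milliseconds) for the attempt, for example from job history data or task logs. The class `ReduceAttemptTimes`, its field names, and the sample values are hypothetical and are not part of any Hadoop API.

```python
from dataclasses import dataclass


@dataclass
class ReduceAttemptTimes:
    """Timestamps in milliseconds for one reduce attempt (hypothetical field names)."""
    start: int           # the attempt starts fetching map output
    transfer_finish: int  # all intermediate data has been copied from the map tasks
    merge_finish: int     # the copied data has been merged
    finish: int           # the attempt has completed


def shuffle_profile(t: ReduceAttemptTimes) -> dict:
    """Split the attempt's elapsed time into transfer, merge, and remaining time."""
    transfer = t.transfer_finish - t.start
    merge = t.merge_finish - t.transfer_finish
    return {
        "transfer_ms": transfer,
        "merge_ms": merge,
        "shuffle_ms": transfer + merge,          # transfer plus merge
        "remaining_ms": t.finish - t.merge_finish,  # everything after the shuffle
    }


def shuffle_share(t: ReduceAttemptTimes) -> float:
    """Fraction of the attempt's total elapsed time spent in the shuffle."""
    total = t.finish - t.start
    return shuffle_profile(t)["shuffle_ms"] / total


# Made-up sample values, for illustration only.
attempt = ReduceAttemptTimes(start=1_000, transfer_finish=9_000,
                             merge_finish=9_500, finish=12_000)
print(shuffle_profile(attempt))
print(f"shuffle share: {shuffle_share(attempt):.0%}")
```

A high shuffle share across many attempts suggests that data transfer or merging, rather than the user-defined reduce function, is where to look first.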
