Tuning map and reduce parameters

Picking the right amount of tasks for a job can have a huge impact on Hadoop's performance. In Chapter 4, Identifying Resource Weaknesses, you learned how to configure the number of mappers and reducers correctly. But sizing the number of mappers and reducers correctly is not enough to get the maximum performance of a MapReduce job. The optimum occurs when every machine in the cluster has something to do at any given time when a job is executed. Remember that Hadoop framework has more than 180 parameters and most of them should not keep their default settings.

In this section, we will present other techniques to calculate your mappers' and reducers' numbers. It may be more productive to try more than one optimization ...

Get Optimizing Hadoop for MapReduce now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.