O'Reilly logo

Hadoop Operations and Cluster Management Cookbook by Shumin Guo

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Tuning shuffle, merge, and sort parameters

In a MapReduce job, map task outputs are aggregated into JVM buffers. The size of the in-memory buffer determines how large the data can be merged and sorted at once. Too small a buffer size can cause a large number of swap operations, incurring big overhead. In this section, we will show best practices for configuring the shuffle, merge, and sort parameters.

Getting ready

We assume that the Hadoop cluster has been properly configured and all the daemons are running without any issues.

Log in from the Hadoop cluster administrator machine to the cluster master node using the following command:

ssh hduser@master

Note

In this recipe, we assume all the configurations are making changes to the $HADOOP_HOME/conf/mapred-site.xml ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required