Tuning shuffle, merge, and sort parameters
In a MapReduce job, map task outputs are collected in in-memory buffers inside the task JVM. The size of this buffer determines how much data can be sorted and merged in memory at once. If the buffer is too small, the task must spill its contents to disk many times, which incurs significant I/O overhead. In this section, we will show best practices for configuring the shuffle, merge, and sort parameters.
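To make the effect of the buffer size concrete, the following rough sketch estimates how many times a single map task writes its buffer to disk. It assumes a simplified model in which a spill is triggered whenever the buffer fills to a threshold fraction, and it ignores the space reserved for record metadata. The function name `estimate_spills` is hypothetical; the default values of 100 MB and 0.80 are assumed to match the `io.sort.mb` and `io.sort.spill.percent` defaults of Hadoop 1.x (named `mapreduce.task.io.sort.mb` and `mapreduce.map.sort.spill.percent` in later releases).

```python
import math


def estimate_spills(map_output_mb, sort_buffer_mb=100.0, spill_percent=0.80):
    """Roughly estimate the number of disk spills for one map task.

    Simplified model (an assumption, not Hadoop's exact accounting):
    a spill happens each time the sort buffer fills to
    sort_buffer_mb * spill_percent, and metadata overhead is ignored.
    """
    if map_output_mb <= 0 or sort_buffer_mb <= 0:
        raise ValueError("sizes must be positive")
    if not 0 < spill_percent <= 1:
        raise ValueError("spill_percent must be in (0, 1]")

    # Usable space before a spill is triggered.
    usable_mb = sort_buffer_mb * spill_percent

    # Every map task writes its output to disk at least once.
    return max(1, math.ceil(map_output_mb / usable_mb))


if __name__ == "__main__":
    # Compare a few buffer sizes for a map task producing 400 MB of output.
    for buffer_mb in (100, 200, 512):
        spills = estimate_spills(400, sort_buffer_mb=buffer_mb)
        print(f"buffer={buffer_mb} MB -> about {spills} spill(s)")
```

Under this model, a larger buffer reduces the number of spill files that later have to be merged, at the cost of more heap per task, so the buffer size has to fit within the task JVM's heap.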
We assume that the Hadoop cluster has been properly configured and all the daemons are running without any issues.
Log in from the Hadoop cluster administrator machine to the cluster master node using the following command:
In this recipe, we assume that all configuration changes are made to the