When all or some of the map tasks finish, their outputs are copied from the map task nodes to the reduce task nodes. Reducers fetch these outputs in parallel to increase transfer throughput, and the number of parallel copies is configurable (the mapred.reduce.parallel.copies property in Hadoop 1.x). By tuning this property, we can speed up the copy phase and improve the performance of our Hadoop cluster. In this recipe, we will outline the steps to configure the number of parallel copies used to transfer map outputs to reducers.
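To make the effect of the parallel-copies setting concrete, the following Python sketch models a reducer's copy phase as a thread pool whose size plays the role of the number of parallel copies. It is a toy model, not Hadoop's implementation. The names fetch_map_output, shuffle, and PARALLEL_COPIES are hypothetical, and the network transfer is simulated with a fixed sleep. The constant 5 reflects the default value of mapred.reduce.parallel.copies in Hadoop 1.x.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical knob standing in for the number of parallel copies a reducer
# uses (5 is the default of mapred.reduce.parallel.copies in Hadoop 1.x).
PARALLEL_COPIES = 5


def fetch_map_output(map_id: int, delay_s: float = 0.05) -> str:
    """Stand-in for fetching one map task's output over the network.

    The sleep only simulates transfer latency; nothing is actually copied.
    """
    time.sleep(delay_s)
    return f"map_{map_id:04d}.out"


def shuffle(num_maps: int, parallel_copies: int) -> tuple[list[str], float]:
    """Fetch all map outputs with at most `parallel_copies` concurrent fetches.

    Returns the fetched segment names (in map order) and the elapsed seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=parallel_copies) as pool:
        # pool.map returns results in input order, regardless of finish order
        segments = list(pool.map(fetch_map_output, range(num_maps)))
    return segments, time.perf_counter() - start


if __name__ == "__main__":
    for copies in (1, PARALLEL_COPIES, 10):
        segments, elapsed = shuffle(num_maps=20, parallel_copies=copies)
        print(f"{copies:2d} parallel copies: fetched {len(segments)} outputs in {elapsed:.2f}s")
```

In this model the wall-clock time drops roughly in proportion to the number of concurrent fetches, because each simulated transfer only waits. On a real cluster the gain flattens once network bandwidth or disk I/O on the serving nodes is saturated, which is why the value is worth tuning rather than simply maximizing.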
We assume that the Hadoop cluster has been properly configured and all the daemons are running without any issues.
Log in from the Hadoop cluster administrator machine to the cluster master node using the following command:
In this recipe, we assume that all the configuration changes are made ...
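As an illustration of the configuration change itself, the following Python sketch sets the parallel-copies property in a Hadoop XML configuration file that uses the usual configuration/property/name/value layout. It assumes a Hadoop 1.x cluster, where the property is mapred.reduce.parallel.copies (Hadoop 2.x and later use mapreduce.reduce.shuffle.parallelcopies instead). The helpers set_property and update_config_file are hypothetical and not part of Hadoop, and the file path is passed in rather than hard-coded. Editing the file by hand in a text editor achieves the same result. Note that Python's ElementTree discards XML comments and processing instructions (such as an xml-stylesheet line) when it parses, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET

# Hadoop 1.x property name; Hadoop 2.x and later use
# mapreduce.reduce.shuffle.parallelcopies instead.
PROPERTY = "mapred.reduce.parallel.copies"


def set_property(root: ET.Element, name: str, value: str) -> None:
    """Set a property inside a Hadoop <configuration> element, in place.

    Updates the first <property> whose <name> matches, otherwise appends one.
    """
    if root.tag != "configuration":
        raise ValueError(f"expected <configuration> root, got <{root.tag}>")
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return
    # No existing entry: append a new <property> block
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


def update_config_file(path: str, name: str, value: str) -> None:
    """Rewrite the XML file at `path` with the property set.

    ElementTree drops comments and processing instructions on parse.
    """
    tree = ET.parse(path)
    set_property(tree.getroot(), name, value)
    ET.indent(tree, space="  ")  # Python 3.9+: re-indent for readability
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

For example, update_config_file("mapred-site.xml", PROPERTY, "10") would set the property to 10 in a local copy of the file; the value 10 is illustrative, not a recommendation. A larger value allows more concurrent fetches per reducer at the cost of more simultaneous network connections, so a suitable number depends on the cluster size and network capacity.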