O'Reilly logo

Hadoop Operations and Cluster Management Cookbook by Shumin Guo

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Setting proper number of parallel copies

When all or part of the map tasks finish, map outputs will be copied from the map task nodes to the reduce task nodes. The parallel copying strategy is used to increase the transfer throughput. By tuning this property, we can boost the performance of our Hadoop cluster. In this recipe, we will outline steps to configure the number of multiple copies for transferring map outputs to reducers.

Getting ready

We assume that the Hadoop cluster has been properly configured and all the daemons are running without any issues.

Log in from the Hadoop cluster administrator machine to the cluster master node using the following command:

ssh hduser@master

Note

In this recipe, we assume all the configurations are making changes ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required