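Reduce tasks can be started once a certain fraction of a job's map tasks has finished. In Hadoop 1.x, this fraction is set with the mapred.reduce.slowstart.completed.maps property, whose value lies between 0 and 1 (default 0.05). Setting the property to a smaller value makes the reduce tasks start earlier, but they then occupy reduce slots while mostly waiting for map output. On the other hand, if the value is set too large, for example, very close to 1, the reduce tasks will have to wait for the majority of the map tasks to finish. Copying map output to the reducers then cannot overlap with the map phase, which prolongs the job execution time. In this recipe, we will outline the steps to configure reducer initialization.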
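To make the trade-off concrete, the following Python sketch computes how many map tasks must complete before reducers are scheduled for a given slowstart fraction (in Hadoop 1.x, the mapred.reduce.slowstart.completed.maps value). The function name maps_before_reduce_start is hypothetical. Rounding the product up to a whole task is an assumption of this sketch about how the framework converts the fraction into a task count.

```python
import math


def maps_before_reduce_start(slowstart_fraction: float, total_maps: int) -> int:
    """Return how many map tasks must finish before reducers are scheduled.

    Hypothetical helper: it assumes the framework multiplies the slowstart
    fraction by the job's map count and rounds up to a whole task.
    """
    if not 0.0 <= slowstart_fraction <= 1.0:
        raise ValueError("slowstart fraction must be between 0 and 1")
    if total_maps < 0:
        raise ValueError("total_maps must be non-negative")
    return math.ceil(slowstart_fraction * total_maps)


if __name__ == "__main__":
    total = 200
    # Compare a few fractions for a job with 200 map tasks.
    for fraction in (0.05, 0.5, 1.0):
        needed = maps_before_reduce_start(fraction, total)
        print(f"slowstart={fraction:.2f}: reducers start after {needed}/{total} maps")
```

With the default of 0.05, reducers would be scheduled after only 10 of 200 maps and hold their slots for most of the map phase. With 1.0, they would wait for all 200 maps.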
We assume that the Hadoop cluster has been properly configured and all the daemons are running without any issues.
Log in from the Hadoop cluster administrator machine to the cluster master node using the following command: