Configuring MapReduce for performance
In this recipe, we will touch upon MapReduce parameters and see how we can optimize them.
Getting ready
For this recipe, you will again need a running cluster with HDFS and YARN. You should also have completed the Configuring YARN for performance recipe.
How to do it...
- Connect to the master node master1.cyrus.com and switch to the hadoop user.
- The file where these changes will be made is mapred-site.xml.
- The first thing to adjust is the sort buffer, which is sized according to the HDFS block size. Its value, given in MB, should be greater than the value of dfs.blocksize. This can be configured as follows:
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>200</value>
  </property>
- The next value to tune is the number of streams to merge ...
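
The sort buffer guideline from the steps above can be checked before restarting any services. The following is a minimal sketch, not part of the recipe itself: it assumes the contents of mapred-site.xml and hdfs-site.xml are available as strings, and the helper names (read_property, parse_size, sort_buffer_exceeds_block) are hypothetical. The fallback values of 100 MB and 134217728 bytes are the stock Hadoop 2.x defaults and may differ on your distribution.

```python
import xml.etree.ElementTree as ET

# Binary size suffixes that Hadoop accepts for values such as dfs.blocksize.
_SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def read_property(xml_text, name, default=None):
    """Return the <value> of the named <property>, or default if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return default


def parse_size(value):
    """Convert '134217728' or '128m' style values to a byte count."""
    value = value.strip().lower()
    if value and value[-1] in _SUFFIXES:
        return int(value[:-1]) * _SUFFIXES[value[-1]]
    return int(value)


def sort_buffer_exceeds_block(mapred_xml, hdfs_xml):
    """True if mapreduce.task.io.sort.mb (MB) is larger than dfs.blocksize."""
    # Assumed defaults: 100 MB sort buffer, 128 MB block size (Hadoop 2.x).
    sort_mb = int(read_property(mapred_xml, "mapreduce.task.io.sort.mb", "100"))
    block_bytes = parse_size(read_property(hdfs_xml, "dfs.blocksize", "134217728"))
    return sort_mb * 1024 * 1024 > block_bytes


if __name__ == "__main__":
    mapred = (
        "<configuration><property>"
        "<name>mapreduce.task.io.sort.mb</name><value>200</value>"
        "</property></configuration>"
    )
    hdfs = (
        "<configuration><property>"
        "<name>dfs.blocksize</name><value>128m</value>"
        "</property></configuration>"
    )
    print(sort_buffer_exceeds_block(mapred, hdfs))
```

In practice you would read the two files from the Hadoop configuration directory on the master node and pass their contents to sort_buffer_exceeds_block; the inline strings here only keep the sketch self-contained.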