Hadoop 2.x Administration Cookbook by Gurmukh Singh

Configuring MapReduce for performance

In this recipe, we will look at some key MapReduce parameters and see how they can be tuned for better performance.

Getting ready

For this recipe, you will again need a running cluster with HDFS and YARN. Users must have completed the Configuring YARN for performance recipe.

How to do it...

  1. Connect to the master node master1.cyrus.com and switch to the hadoop user.
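  2. The changes in this recipe are made in mapred-site.xml.
  3. The first thing to adjust is the map-side sort buffer, mapreduce.task.io.sort.mb, which should be sized according to the HDFS block size. It should be larger than the value of dfs.blocksize so that, for jobs whose map output is roughly the size of the input split, a map task can sort its output in memory with fewer spills to disk. Note that this parameter is given in MB, whereas dfs.blocksize is given in bytes (or with a size suffix such as 128m). This can be configured as follows: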
    <property>
      <name>mapreduce.task.io.sort.mb</name>
      <value>200</value>
    </property>
  4. The next value to tune is the number of streams to merge ...
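
The sizing rule in step 3 can be checked with a short script. The following is a minimal Python sketch, not part of Hadoop: the helper names (read_properties, parse_size, sort_buffer_exceeds_block) are hypothetical, and it assumes the Hadoop 2.x defaults of 100 MB for mapreduce.task.io.sort.mb and 128 MB for dfs.blocksize when a property is not set. It reads the contents of mapred-site.xml and hdfs-site.xml as strings and compares the sort buffer, given in MB, with the block size, given in bytes or with a size suffix.

```python
import xml.etree.ElementTree as ET

# Case-insensitive size suffixes accepted for dfs.blocksize in Hadoop 2.x;
# a bare number is interpreted as bytes.
_SUFFIXES = {"k": 1, "m": 2, "g": 3, "t": 4, "p": 5, "e": 6}

# Assumed Hadoop 2.x defaults, used when a property is absent.
DEFAULT_BLOCKSIZE = 128 * 1024 * 1024  # dfs.blocksize (128 MB)
DEFAULT_SORT_MB = 100                  # mapreduce.task.io.sort.mb


def read_properties(xml_text):
    """Hypothetical helper: return {name: value} for each <property> in a *-site.xml string."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def parse_size(value):
    """Hypothetical helper: convert '134217728' or '128m' into a number of bytes."""
    value = value.strip().lower()
    if value and value[-1] in _SUFFIXES:
        return int(value[:-1]) * 1024 ** _SUFFIXES[value[-1]]
    return int(value)


def sort_buffer_exceeds_block(mapred_xml, hdfs_xml):
    """Hypothetical check: is the sort buffer (MB) larger than the HDFS block size (bytes)?"""
    mapred = read_properties(mapred_xml)
    hdfs = read_properties(hdfs_xml)
    sort_mb = int(mapred.get("mapreduce.task.io.sort.mb", DEFAULT_SORT_MB))
    if "dfs.blocksize" in hdfs:
        block_bytes = parse_size(hdfs["dfs.blocksize"])
    else:
        block_bytes = DEFAULT_BLOCKSIZE
    return sort_mb * 1024 * 1024 > block_bytes


# Example input mirroring the value used in step 3.
mapred_site = """<configuration>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>200</value>
  </property>
</configuration>"""

hdfs_site = """<configuration>
  <property>
    <name>dfs.blocksize</name>
    <value>128m</value>
  </property>
</configuration>"""

print(sort_buffer_exceeds_block(mapred_site, hdfs_site))  # True: 200 MB > 128 MB
```

On a real node, the same functions could be given the text of the actual configuration files (for example, read with open(path).read()); the location of those files depends on the installation. The check only compares the two sizes: the sort buffer is allocated inside the map task's JVM, so it must also fit within the heap set through mapreduce.map.java.opts.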