Apache Mahout Essentials by Jayani Withanawasam

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Optimization tips

Configuring the following entries to match the hardware and software configuration of your Hadoop cluster helps you make optimal use of the available resources, such as CPU and memory.

The important entries in the mapred-site.xml file are as follows:

  1. Set the maximum number of map and reduce tasks that each TaskTracker can run simultaneously:
    mapreduce.tasktracker.map.tasks.maximum
    mapreduce.tasktracker.reduce.tasks.maximum
    
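  2. Set the number of map and reduce tasks per job according to the number of cores available (mapreduce.job.maps is only a hint; the actual number of map tasks is determined by the number of input splits):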
    mapreduce.job.reduces
    mapreduce.job.maps
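As a rough sketch of how these values might be derived from the hardware, the following Python snippet builds a mapred-site.xml fragment from the number of cores per node and the number of nodes. Reserving one core for the daemons, splitting the remaining cores roughly 2:1 between map and reduce slots, and setting mapreduce.job.maps to the total number of map slots are illustrative assumptions, not recommendations from this book. The 0.95 factor for mapreduce.job.reduces follows the rule of thumb given in the Hadoop MapReduce tutorial. The helper names suggest_mapred_settings and to_configuration_xml are hypothetical.

```python
import math
import xml.etree.ElementTree as ET


def suggest_mapred_settings(cores_per_node: int, nodes: int) -> dict:
    """Suggest per-node slot limits and per-job task counts (illustrative heuristic only)."""
    if cores_per_node < 1 or nodes < 1:
        raise ValueError("cores_per_node and nodes must be positive")
    # Assumption: keep one core free for the TaskTracker/DataNode daemons
    # and split the remaining cores roughly 2:1 between map and reduce slots.
    usable = max(1, cores_per_node - 1)
    map_slots = max(1, math.ceil(usable * 2 / 3))
    reduce_slots = max(1, usable - map_slots)
    # Rule of thumb from the Hadoop MapReduce tutorial: 0.95 x total reduce slots,
    # so all reduces can launch as soon as the maps finish.
    reduces = max(1, int(0.95 * nodes * reduce_slots))
    return {
        "mapreduce.tasktracker.map.tasks.maximum": map_slots,
        "mapreduce.tasktracker.reduce.tasks.maximum": reduce_slots,
        # Assumption: one map wave across the cluster. This is only a hint;
        # the input splits decide the real number of map tasks.
        "mapreduce.job.maps": nodes * map_slots,
        "mapreduce.job.reduces": reduces,
    }


def to_configuration_xml(settings: dict) -> str:
    """Render the settings as a Hadoop-style <configuration> element."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


settings = suggest_mapred_settings(cores_per_node=8, nodes=4)
print(to_configuration_xml(settings))
```

For example, 4 nodes with 8 cores each yields 5 map slots and 2 reduce slots per node, 20 map tasks and 7 reduce tasks per job. The generated property elements can be copied into mapred-site.xml as a starting point and then tuned by observing actual CPU and memory usage.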
    

The important entries in the hdfs-site.xml file are as follows:

  1. Set the block size for the files according to the storage requirements of your problem:
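To see how the block size choice plays out, the following Python sketch computes how many HDFS blocks a file occupies for a given block size. Fewer, larger blocks mean less block metadata for the NameNode to track and, with the default FileInputFormat split settings, roughly one map task per block, so fewer map tasks; smaller blocks give more parallelism. The function name hdfs_block_count is hypothetical, and the file and block sizes are illustrative.

```python
def hdfs_block_count(file_size_bytes: int, block_size_bytes: int) -> int:
    """Return how many HDFS blocks a single file of the given size occupies.

    The last block may be only partially filled, so this is a ceiling division.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("file size must be >= 0 and block size must be > 0")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


MB = 1024 ** 2
GB = 1024 ** 3

# Compare a few candidate block sizes for a 10 GB input file.
for block_mb in (64, 128, 256):
    blocks = hdfs_block_count(10 * GB, block_mb * MB)
    print(f"{block_mb} MB blocks -> {blocks} blocks")
# prints 160, 80 and 40 blocks respectively
```

Doubling the block size halves the number of blocks (and, by default, the number of map tasks) for the same input, which is the trade-off to weigh against the storage and processing needs of your problem.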
