O'Reilly logo

Hadoop Operations and Cluster Management Cookbook by Shumin Guo

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Configuring JVM Reuse

MapReduce tasks are executed by JVM processes/threads, which are forked by the TaskTracker. The creation of a JVM, which includes the initialization of execution environments, is costly, especially when the number of tasks is large. In the default configuration, the number of JVMs needed to finish a job should be equal to the number of the tasks. In other words, the default setting uses one JVM to execute one task. When the execution of a task completes, its JVM will be killed by the TaskTracker.

JVM Reuse is an optimization of reusing JVMs for multiple tasks. If it is enabled, multiple tasks can be executed sequentially with one JVM.

In this recipe we will outline the steps to configure JVM Reuse.

Getting ready

We assume that ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required