O'Reilly logo

Hadoop Operations and Cluster Management Cookbook by Shumin Guo

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Analyzing job history with Rumen

Rumen is a tool for extracting well-formatted information from job logfiles. It parses logs and generates statistics for the Hadoop jobs. The job traces can be used for performance tuning and simulation.

Current Rumen implementation includes two components: TraceBuilder and folder . The TraceBuilder takes job history as input and generates easily parsed json files. The folder is a utility to manipulate on input traces, and, most of the time, it is used to scale the summarized job traces from the TraceBuilder. For example, we can use the folder tool to scale up (make time longer) or down (make time shorter) the job runtime. In this recipe, we will outline steps to analyze the job history with Rumen.

Getting ready ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required