Benchmarking a Hadoop cluster with GridMix

GridMix is a benchmarking tool for Hadoop clusters. It generates a number of synthetic MapReduce jobs and uses the execution metrics of these jobs to model the resource profiles of the cluster. These profiles can help us find the cluster's performance bottlenecks. In this section, we will outline the steps for benchmarking Hadoop with GridMix.

Getting ready

We assume that our Hadoop cluster has been properly configured and all the daemons are running without any issues.
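
One way to sanity-check this assumption on a node is to compare the output of the JDK's jps command with the daemons we expect to see. The following is only a minimal sketch, not part of the original recipe. It assumes Hadoop 1.x daemon names on a pseudo-distributed node; the EXPECTED_DAEMONS set and the helper function names are hypothetical and should be adjusted to the actual cluster layout (for example, a slave node normally runs only DataNode and TaskTracker).

```python
import shutil
import subprocess

# Assumption: Hadoop 1.x daemon names on a single pseudo-distributed node.
# Adjust this set for the role of the node being checked.
EXPECTED_DAEMONS = {
    "NameNode",
    "SecondaryNameNode",
    "DataNode",
    "JobTracker",
    "TaskTracker",
}


def running_java_processes(jps_output):
    """Parse jps output ("<pid> <name>" per line) into a set of names."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Skip blank lines and entries for which jps prints only a pid.
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemons that do not appear in the jps output."""
    return sorted(set(expected) - running_java_processes(jps_output))


if __name__ == "__main__":
    if shutil.which("jps") is None:
        print("jps not found on PATH; it ships with the JDK")
    else:
        result = subprocess.run(
            ["jps"], capture_output=True, text=True, check=True
        )
        missing = missing_daemons(result.stdout)
        if missing:
            print("missing: " + ", ".join(missing))
        else:
            print("all expected daemons are running")
```

Note that jps lists only the Java processes of the user who runs it, so the script should be run as the user that started the Hadoop daemons. An empty result means the process exists, not that the daemon is healthy, so the daemon logs are still worth checking.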


GridMix currently has three versions. To tell them apart, we will use GridMix to refer to GridMix version 1, GridMix2 to refer to GridMix version 2, and ...
