GridMix is a tool for benchmarking Hadoop clusters. It generates a number of synthetic MapReduce jobs and uses their execution metrics to model the resource profiles of the cluster. These profiles can help us find the cluster's performance bottlenecks. In this section, we will outline the steps for benchmarking Hadoop with GridMix.
We assume that our Hadoop cluster has been properly configured and all the daemons are running without any issues.
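One way to confirm the daemon assumption on a node is to inspect the output of the JDK's `jps` command, which lists running JVMs as `<pid> <name>` lines. The following is a minimal sketch rather than part of GridMix: the helper names are hypothetical, and the expected daemon set assumes a Hadoop 1.x pseudo-distributed setup where all five daemons run on one machine. On a multi-node cluster, each node runs only a subset, so the expected set should be adjusted per node.

```python
import subprocess

# Assumption: Hadoop 1.x daemon names on a single-node (pseudo-distributed) setup.
EXPECTED_DAEMONS = {
    "NameNode",
    "SecondaryNameNode",
    "DataNode",
    "JobTracker",
    "TaskTracker",
}


def running_jvm_names(jps_output):
    """Return the set of JVM names found in the text printed by `jps`."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each useful line looks like "<pid> <name>"; skip anything else.
        if len(parts) >= 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemons that do not appear in the jps output, sorted."""
    return sorted(set(expected) - running_jvm_names(jps_output))


def check_local_node():
    """Run `jps` on this machine (requires a JDK on the PATH) and report gaps."""
    result = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    return missing_daemons(result.stdout)
```

Calling `check_local_node()` on a node returns an empty list when every expected daemon is present. The parsing is kept separate from the `jps` call so that it can also be applied to output collected from remote nodes.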
Currently, GridMix has three versions. To tell them apart, we will use GridMix to represent GridMix version 1, GridMix2 to represent GridMix version 2, and ...