Hadoop MapReduce v2 Cookbook - Second Edition by Thilina Gunarathne


Calculating histograms using MapReduce

Another interesting view of a dataset is a histogram. A histogram makes sense only over a continuous dimension (for example, access time or file size). It divides that dimension into ranges and counts the number of occurrences of an event that fall into each range. For example, in this recipe we take the access time as the dimension and group the accesses by hour of the day.

The following figure shows the execution summary of this computation. The Mapper emits the hour of the access as the key and 1 as the value. Each invocation of the reduce function then receives all the occurrences for a given hour of the day and sums them to obtain the total number of accesses for that hour.
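The map and reduce steps described above can be sketched in plain Java, without the Hadoop APIs, to make the data flow concrete. This is a minimal illustration and not the recipe's actual implementation: the class name `HourHistogram`, its method names, and the log timestamp layout (`[dd/Mon/yyyy:HH:mm:ss zone]`) are all assumptions made for the example, and the shuffle phase that Hadoop performs between map and reduce is simulated with an in-memory sorted map.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, Hadoop-free sketch of the hour-of-day histogram computation.
final class HourHistogram {

    // Assumed timestamp layout: [dd/Mon/yyyy:HH:mm:ss zone]. Group 1 captures HH.
    private static final Pattern HOUR =
            Pattern.compile("\\[\\d{2}/\\w{3}/\\d{4}:(\\d{2}):\\d{2}:\\d{2}");

    // Map step: emit (hour, 1) for a log line, or nothing if no timestamp is found.
    static List<Map.Entry<Integer, Integer>> map(String line) {
        List<Map.Entry<Integer, Integer>> out = new ArrayList<>();
        Matcher m = HOUR.matcher(line);
        if (m.find()) {
            int hour = Integer.parseInt(m.group(1));
            out.add(new AbstractMap.SimpleEntry<Integer, Integer>(hour, 1));
        }
        return out;
    }

    // Reduce step: sum all the values received for a single hour.
    static int reduce(int hour, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: groups the mapper output by key (standing in for the shuffle),
    // then calls reduce once per hour.
    static Map<Integer, Integer> histogram(List<String> lines) {
        Map<Integer, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<Integer, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        Map<Integer, Integer> result = new TreeMap<>();
        for (Map.Entry<Integer, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the assumed format.
        List<String> lines = Arrays.asList(
                "host1 - - [01/Jul/1995:00:00:01 -0400] \"GET /a.html HTTP/1.0\" 200 6245",
                "host2 - - [01/Jul/1995:00:12:45 -0400] \"GET /b.html HTTP/1.0\" 200 3985",
                "host3 - - [01/Jul/1995:13:05:10 -0400] \"GET /c.html HTTP/1.0\" 304 0");
        System.out.println(histogram(lines));
    }
}
```

In a real MapReduce job, the grouping done in `histogram` is handled by the framework, so only the logic in `map` and `reduce` would need to be written, inside the corresponding Mapper and Reducer classes.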

Getting ready

This recipe assumes that ...
