Another interesting view of a dataset is a histogram. A histogram makes sense only under a continuous dimension (for example, accessed time and file size). It groups the number of occurrences of an event into several groups in the dimension. For example, in this recipe, if we take the accessed time as the dimension, then we will group the accessed time by the hour.
The following figure shows the execution summary of this computation. The Mapper emits the hour of the access as the key and 1 as the value. Then, each
reduce function invocation receives all the occurrences of a certain hour of the day, and it calculates the total number of occurrences for that hour of the day.
This recipe assumes that ...