Programming MapReduce with Scalding by Antonios Chalkiopoulos

Operations on groups

The groupAll and groupBy operations are essential building blocks of Scalding applications; both generate groups. groupAll generates a single group containing all the available tuples. groupBy generates m groups, where m is the number of unique keys in the data.

For example, if groupBy('color) is executed and three unique colors exist in the data, then three groups are generated. Once the data is grouped, a number of group operations can be applied to each group.
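The difference between the two grouping operations can be sketched with plain Scala collections. This is only an analogy under stated assumptions, not Scalding code: the sample records, the GroupingSketch object, and the use of the standard library's groupBy are all illustrative choices that do not come from the original text.

```scala
// Plain Scala collections, used only as an analogy for Scalding's grouping semantics.
object GroupingSketch {
  // Hypothetical sample records: (color, price).
  val records: List[(String, Double)] = List(
    ("red", 10.0), ("blue", 4.0), ("red", 6.0), ("green", 8.0), ("blue", 2.0)
  )

  // Analogue of groupBy('color): one group per unique key in the data.
  val byColor: Map[String, List[(String, Double)]] = records.groupBy(_._1)

  // Analogue of groupAll: every record falls under the same (unit) key,
  // so exactly one group is produced.
  val all: Map[Unit, List[(String, Double)]] = records.groupBy(_ => ())
}
```

With three unique colors in the sample, byColor holds three groups, while all holds a single group with all five records.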

The first seven group operations (average, count, min, max, sum, size, and sizeAveStdev) are useful for extracting statistics from data. Their syntax is as follows:

group.average(field -> newField)
group.count(field -> newField) { function }
group.min(field -> ...
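To make the meaning of these statistics concrete, the following sketch computes most of them per group using plain Scala collections. It is not the Scalding API: the GroupStatsSketch object, the Stats case class, the sample records, and the "price greater than 5" predicate used for the count are assumptions made for illustration. The sizeAveStdev operation is left out of the sketch.

```scala
// Plain Scala analogy of per-group statistics; not Scalding code.
object GroupStatsSketch {
  // Hypothetical sample records: (color, price).
  val records: List[(String, Double)] = List(
    ("red", 10.0), ("blue", 4.0), ("red", 6.0), ("green", 8.0), ("blue", 2.0)
  )

  // Illustrative container for the statistics of one group.
  final case class Stats(
    size: Int,       // number of tuples in the group
    countOver5: Int, // tuples whose price satisfies the predicate (price > 5)
    min: Double,
    max: Double,
    sum: Double,
    average: Double
  )

  // Group by color, then reduce each group's prices to its statistics.
  val statsByColor: Map[String, Stats] =
    records.groupBy(_._1).map { case (color, rows) =>
      val prices = rows.map(_._2)
      color -> Stats(
        size = prices.size,
        countOver5 = prices.count(_ > 5.0),
        min = prices.min,
        max = prices.max,
        sum = prices.sum,
        average = prices.sum / prices.size
      )
    }
}
```

Note how count differs from size in this sketch: size reports every tuple in the group, whereas count reports only the tuples for which the supplied function returns true.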