November 2017
Beginner to intermediate
290 pages
7h 34m
English
The primary computational pattern for aggregation in Beam is to group all elements with a common key (and Window—but we'll get to windowing later), and then, to combine each group of elements using an associative and commutative operation. Like the Reduce from MapReduce, this originates in functional programming from the early 90s. In Beam, aggregation has been enhanced to work with unbounded input streams as well as bounded datasets.
When elements are grouped and emitted as-is, the aggregation is known as GroupByKey (the associative/commutative operation is just bag union). In this case, the output is no smaller than the input, so Reduce might be a misnomer.
However, often, you will ...
Read now
Unlock full access