November 2017
Beginner to intermediate
290 pages
7h 34m
English
The most elementary form of massively parallel computation is doing the same thing to every element in a stream or large dataset. In Beam, this computational pattern is called ParDo, short for Parallel Do. You can think of it as the Map step of MapReduce, or as similar to Apex's Transform operator (http://apex.apache.org/docs/malhar/operators/transform/).

ParDo is embarrassingly parallel: the processing of one element does not depend on the processing of any other. Every input element can therefore potentially be processed in parallel, on separate machines. This computational pattern applies to both bounded datasets, such as a huge collection ...
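To make the element-wise pattern concrete, here is a minimal sketch in plain Python. It deliberately uses only the standard library rather than Beam itself: `par_do` and `split_words` are hypothetical names invented for this illustration, and a thread pool stands in for the separate machines a real runner might use. In Beam's Python SDK the corresponding transform is `beam.ParDo`, which wraps a `DoFn`; the sketch only mirrors the idea that each element is handled on its own and may produce zero or more outputs.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def split_words(line):
    """Per-element function: one input element in, zero or more outputs out."""
    return line.split()


def par_do(fn, elements, max_workers=4):
    """Apply fn to every element independently, then flatten the outputs.

    Hypothetical helper for illustration only; this is not Beam's API.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each call sees only its own element, so the calls may run
        # concurrently. pool.map yields results in input order.
        per_element_outputs = list(pool.map(fn, elements))
    return list(chain.from_iterable(per_element_outputs))


lines = ["the quick brown fox", "", "jumps over"]
words = par_do(split_words, lines)
print(words)
```

Because `split_words` never looks at any element other than the one it is given, the work could be spread across any number of workers without changing the result. The empty line simply contributes no output, which is the sense in which the pattern is more general than a strict one-to-one map.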