September 2015
Beginner to intermediate
608 pages
13h 43m
English
The count operation we implemented previously is a sequential algorithm: each line is processed one at a time until the sequence is exhausted. But nothing about the operation requires it to be done this way.
We could split the lines into two sequences (ideally of roughly equal length) and reduce over each sequence independently. When we're done, we would simply add the line counts from the two sequences together to get the total number of lines in the file.
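A minimal Python sketch of the split-and-combine idea, assuming the file's lines are already held in memory as a list. The helper names `count_lines` and `split_count` are hypothetical. Each half is counted with an ordinary sequential reduce, and the two partial counts are then added. This works because addition is associative, so combining partial totals gives the same answer as one long reduce.

```python
from functools import reduce

def count_lines(lines):
    # Sequential count: fold over the sequence, adding 1 per line.
    return reduce(lambda acc, _line: acc + 1, lines, 0)

def split_count(lines):
    # Split the lines into two sequences of roughly equal length.
    mid = len(lines) // 2
    left, right = lines[:mid], lines[mid:]
    # Reduce over each half independently, then add the partial totals.
    return count_lines(left) + count_lines(right)

lines = ["alpha", "beta", "gamma", "delta", "epsilon"]
print(count_lines(lines))  # 5
print(split_count(lines))  # 5
```

Both functions return the same total; the split version simply does its work as two independent reductions.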

If each reduce ran on its own processing unit, the two count operations would run in parallel. All other things being equal, the ...
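A minimal Python sketch of running each reduce on its own processing unit, assuming the lines are already in memory. It uses the standard library's `concurrent.futures.ProcessPoolExecutor` so that each half is counted in a separate worker process. The helper names `count_lines`, `halves`, and `parallel_count` are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import reduce

def count_lines(lines):
    # Sequential reduce: add 1 to the accumulator for every line.
    return reduce(lambda acc, _line: acc + 1, lines, 0)

def halves(lines):
    # Split into two sequences of roughly equal length.
    mid = len(lines) // 2
    return [lines[:mid], lines[mid:]]

def parallel_count(lines, workers=2):
    # Hand each half's reduce to its own worker process, then
    # combine the partial counts with addition.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_lines, halves(lines))
        return sum(partials)

if __name__ == "__main__":
    # The guard lets worker processes import this module safely.
    lines = [f"line {i}" for i in range(1000)]
    print(parallel_count(lines))  # 1000
```

For inputs this small, the cost of starting processes and copying data to them will usually outweigh any gain. The split only pays off when each partial reduce does enough work.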