Faster average computations with aggregate

In the previous section, we saw how to use map and reduce to calculate averages. Let's now look at a faster way to compute averages with the aggregate function. You can refer to the documentation mentioned in the previous section.

The aggregate function takes three arguments, none of which is optional.

The first is the zeroValue argument, which holds the base case of the aggregated result.

The second argument is the sequential operator (seqOp), which stacks and aggregates values on top of zeroValue. Starting from zeroValue, the seqOp function that you feed into aggregate takes values from your RDD one at a time and stacks or aggregates each of them on top of the running result.
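To make the roles of zeroValue and seqOp concrete, here is a minimal sketch in plain Python rather than Spark. It assumes a small hypothetical list standing in for the values of a single RDD partition, and it uses functools.reduce to imitate how seqOp folds each value into an accumulator that starts at zeroValue. The names values, zero_value, and seq_op are illustrative, and the third argument of aggregate is deliberately left out.

```python
from functools import reduce

# Hypothetical sample data standing in for the values of one RDD partition.
values = [1.0, 2.0, 3.0, 4.0]

# zeroValue: the base case, here a (running_sum, running_count) pair.
zero_value = (0.0, 0)


def seq_op(acc, x):
    """Fold one value into the (sum, count) accumulator."""
    return (acc[0] + x, acc[1] + 1)


# reduce with an initial value applies seq_op left to right,
# starting from zero_value, much as seqOp does within a partition.
total, count = reduce(seq_op, values, zero_value)
average = total / count
```

Carrying the sum and the count together in one accumulator is what lets the average be computed in a single pass over the data, instead of one pass for the sum and another for the count.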

The last argument ...
