
Learning PySpark by Denny Lee, Tomasz Drabas


A quick primer on global aggregations

As noted in the previous section, our script has so far performed a point-in-time streaming word count. The following diagram shows the lines DStream and its micro-batches as our script executed them in the previous section:

[Figure: the lines DStream and its micro-batches]

At the 1-second mark, our Python Spark Streaming script returned {(blue, 5), (green, 3)}; at the 2-second mark, it returned {(gohawks, 1)}; and at the 4-second mark, it returned {(green, 2)}. Each result covers only the words that arrived in its own micro-batch. But what if you wanted the aggregate word count over a specific time window?

The following figure shows how a stateful aggregation is calculated:

In this case, we have a time ...
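To make the contrast with the point-in-time counts concrete, here is a minimal sketch in plain Python (no Spark required) that folds the three micro-batch results quoted above into running totals. The `update_func(new_values, running_count)` shape mirrors the kind of update function that PySpark's `DStream.updateStateByKey` expects, but the `run_stateful_count` driver, the `micro_batches` dictionary, and the decision to model only the 1-, 2-, and 4-second marks are assumptions made for illustration, not the book's code.

```python
from collections import defaultdict

# Micro-batch results quoted in the text, keyed by the second mark at which
# each was returned. Only the marks mentioned in the text are modeled.
micro_batches = {
    1: [("blue", 5), ("green", 3)],
    2: [("gohawks", 1)],
    4: [("green", 2)],
}


def update_func(new_values, running_count):
    """Fold one batch's counts for a word into that word's running total."""
    return sum(new_values) + (running_count or 0)


def run_stateful_count(batches):
    """Hypothetical driver: yield (second, running totals) after each batch."""
    state = {}
    for second in sorted(batches):
        # Group this batch's counts by word.
        grouped = defaultdict(list)
        for word, count in batches[second]:
            grouped[word].append(count)
        # Update every word seen so far; a word missing from this batch
        # gets an empty list, so its total carries over unchanged.
        for word in set(state) | set(grouped):
            state[word] = update_func(grouped.get(word, []), state.get(word))
        yield second, dict(state)


snapshots = dict(run_stateful_count(micro_batches))
for second, totals in snapshots.items():
    print(second, sorted(totals.items()))
```

Under these assumptions, the point-in-time result at the 4-second mark is only {(green, 2)}, whereas the running totals at that mark are blue 5, green 5, and gohawks 1, because the state carries forward the counts from the earlier micro-batches.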
