Window operations

Spark Streaming supports windowed processing, which lets you apply transformations over a sliding window of events. The window is defined over a specified time interval.

Every time a window slides over a DStream, the source RDDs that fall within the window are combined to produce the RDDs of the windowed DStream, as shown in the following diagram. A window operation requires two parameters:

  • Window length – the duration of the window, that is, how much of the stream each window covers
  • Sliding interval – the interval at which the window operation is performed

Both the window length and the sliding interval must be multiples of the batch interval of the source DStream.
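
The way the two parameters interact can be sketched in plain Python, without Spark. This is only a simulation under stated assumptions: the function name windowed_batches and its arguments are hypothetical, each inner list stands in for the RDD of one batch, and intervals are whole seconds. In Spark Streaming itself, the corresponding call is the window operation on a DStream, which takes a window duration and a slide duration.

```python
from typing import List


def windowed_batches(batches: List[list], batch_interval: int,
                     window_length: int, slide_interval: int) -> List[list]:
    """Simulate windowing: combine the batches that fall into each window.

    Hypothetical helper for illustration only; this is not a Spark API.
    """
    if batch_interval <= 0 or window_length <= 0 or slide_interval <= 0:
        raise ValueError("all intervals must be positive")
    # Both parameters must be multiples of the batch interval (the interval
    # at which the stream is cut into batches).
    if window_length % batch_interval or slide_interval % batch_interval:
        raise ValueError(
            "window length and sliding interval must be multiples "
            "of the batch interval")

    per_window = window_length // batch_interval  # batches covered by a window
    step = slide_interval // batch_interval       # batches between two windows

    windows = []
    # A window is emitted every `step` batches and covers the most recent
    # `per_window` batches (fewer at the start, when the stream is young).
    for end in range(step, len(batches) + 1, step):
        start = max(0, end - per_window)
        combined = [event for batch in batches[start:end] for event in batch]
        windows.append(combined)
    return windows


# Six 1-second batches, a 3-second window sliding every 2 seconds.
batches = [[1], [2], [3], [4], [5], [6]]
print(windowed_batches(batches, batch_interval=1,
                       window_length=3, slide_interval=2))
# → [[1, 2], [2, 3, 4], [4, 5, 6]]
```

Because the window is longer than the sliding interval in this example, consecutive windows overlap and some events (2 and 4 here) appear in two windows.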

The following ...
