Chapter 12. Event Time–Based Stream Processing

In “The Effect of Time”, we discussed the effect of time in stream processing from a general perspective.

As we recall, event-time processing means looking at the stream of events on the timeline on which they were produced and applying the processing logic from that perspective. When we are interested in analyzing patterns in the event data over time, we need to process the events as if we were observing them at the moment they were produced. To do this, we require the device or system that produces each event to “stamp” it with its time of creation, hence the usual name “timestamp” for this event-bound time. We use that time as our frame of reference for how time evolves.

To illustrate this concept, let’s explore a familiar example. Consider a network of weather stations used to monitor local weather conditions. Some remote stations are connected through the mobile network, whereas others, hosted in volunteers’ homes, have access to internet connections of varying quality. The weather monitoring system cannot rely on the arrival order of the events because that order depends mostly on the speed and reliability of the network each station is connected to. Instead, the weather application relies on each weather station to timestamp the events it delivers. Our stream processing then uses these timestamps to compute the time-based aggregations that feed the weather forecasting system. ...
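
The weather-station scenario can be made concrete with a small sketch. It is written in plain Python rather than against any Spark API, and every name in it (`window_start`, `average_by_window`, the station IDs, the readings, and the 10-minute window size) is a hypothetical choice made for illustration. The point it tries to show is that when readings are grouped by the timestamp each station attached, the aggregation does not depend on the order in which the network delivered them.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
WINDOW = timedelta(minutes=10)  # assumed window size, for illustration only


def window_start(event_time, size=WINDOW):
    """Return the start of the fixed-size window that contains event_time."""
    return EPOCH + ((event_time - EPOCH) // size) * size


def average_by_window(events):
    """Average temperature per (station, event-time window).

    `events` is an iterable of (station_id, event_time, temperature) tuples
    in arrival order. Only the event_time stamped by the station is used
    to decide which window a reading belongs to.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for station, event_time, temperature in events:
        key = (station, window_start(event_time))
        totals[key][0] += temperature
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def at(hour, minute):
    """Helper to build a UTC timestamp on an arbitrary sample day."""
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


# Hypothetical readings, listed in the order they ARRIVED.
# Station "remote-1" has a slow link, so its 12:01 reading arrives last.
arrivals = [
    ("home-7", at(12, 3), 18.0),
    ("remote-1", at(12, 7), 14.0),
    ("remote-1", at(12, 12), 20.0),
    ("home-7", at(12, 8), 20.0),
    ("remote-1", at(12, 1), 10.0),  # late arrival, early event time
]

result = average_by_window(arrivals)
for (station, start), avg in sorted(result.items()):
    print(station, start.isoformat(), avg)
```

Feeding the same readings in any other arrival order produces the same averages, because the window is derived from the event timestamp alone. This sketch holds all events in memory and never decides when a window is complete, which is a question a real streaming job has to answer.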
