July 2017
Intermediate to advanced
796 pages
18h 55m
English
Event time is the time recorded inside the data itself. Traditional Spark Streaming handled time only as the time at which data was received into the DStream, which is not enough for the many applications that need the event time. For example, if you want to count how many times a hashtag appears in tweets every minute, you should use the time when the tweet was generated, not the time when Spark received the event. Structured Streaming makes this easy by treating the event time as a column in the row/event. This allows window-based aggregations to be run using the event time rather than the received time. Furthermore, this model naturally handles data that has arrived ...
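The difference between the two notions of time can be illustrated with a minimal sketch in plain Python. It is not Spark code. The record fields (`hashtag`, `event_time`, `received_time`) and the helper names `window_start` and `count_per_window` are assumptions made up for this illustration. It counts hashtags per one-minute tumbling window twice: once keyed on the time the tweet was generated and once keyed on the time it was received. In Structured Streaming itself, the same grouping is typically expressed with the `window` function over the event-time column.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def window_start(ts, width):
    """Floor a timestamp to the start of the tumbling window that contains it."""
    return EPOCH + ((ts - EPOCH) // width) * width


def count_per_window(events, time_field, width=timedelta(minutes=1)):
    """Count hashtags per (window start, hashtag), using the chosen time field."""
    counts = Counter()
    for event in events:
        key = (window_start(event[time_field], width), event["hashtag"])
        counts[key] += 1
    return counts


# Hypothetical sample data: the second tweet was generated at 12:00:50
# but only received at 12:01:05, in the next one-minute window.
events = [
    {"hashtag": "#spark",
     "event_time": datetime(2017, 7, 1, 12, 0, 10),
     "received_time": datetime(2017, 7, 1, 12, 0, 12)},
    {"hashtag": "#spark",
     "event_time": datetime(2017, 7, 1, 12, 0, 50),
     "received_time": datetime(2017, 7, 1, 12, 1, 5)},
    {"hashtag": "#scala",
     "event_time": datetime(2017, 7, 1, 12, 1, 20),
     "received_time": datetime(2017, 7, 1, 12, 1, 21)},
]

by_event_time = count_per_window(events, "event_time")
by_received_time = count_per_window(events, "received_time")

for (start, tag), n in sorted(by_event_time.items()):
    print("event time   ", start.time(), tag, n)
for (start, tag), n in sorted(by_received_time.items()):
    print("received time", start.time(), tag, n)
```

Keyed on event time, both `#spark` tweets fall in the 12:00 window. Keyed on received time, the delayed tweet is counted in the 12:01 window instead, which misstates when the hashtag was actually used.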