TweetStats

In this branch, we compute aggregate metrics over a sliding window in event time that can be used for time series visualization. The metrics are simple counts: total tweets, number of tweets with hashtags, and number of tweets with URLs. The result will contain these metrics along with the window timestamp for visualization.
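To make the windowing idea concrete, here is a minimal sketch in plain Java, independent of the Apex windowing API. It assigns each tweet to every sliding window that contains its event time and keeps the three counts per window. The `Tweet` class, the method names, and the window size and slide used in `main` are assumptions made for illustration; they do not come from the application itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class SlidingTweetStats {

    /** Hypothetical, simplified tweet: event time plus two flags. */
    static class Tweet {
        final long eventTimeMillis;
        final boolean hasHashtag;
        final boolean hasUrl;

        Tweet(long eventTimeMillis, boolean hasHashtag, boolean hasUrl) {
            this.eventTimeMillis = eventTimeMillis;
            this.hasHashtag = hasHashtag;
            this.hasUrl = hasUrl;
        }
    }

    /** Start times of all windows [start, start + size) that contain ts. */
    static List<Long> windowStarts(long ts, long sizeMillis, long slideMillis) {
        List<Long> starts = new ArrayList<>();
        // Latest window start at or before ts, aligned to the slide interval.
        long latest = ts - Math.floorMod(ts, slideMillis);
        for (long start = latest; start > ts - sizeMillis; start -= slideMillis) {
            starts.add(start);
        }
        return starts;
    }

    /** Per window start: {total tweets, tweets with hashtags, tweets with URLs}. */
    static SortedMap<Long, long[]> aggregate(List<Tweet> tweets, long sizeMillis, long slideMillis) {
        SortedMap<Long, long[]> stats = new TreeMap<>();
        for (Tweet t : tweets) {
            for (long start : windowStarts(t.eventTimeMillis, sizeMillis, slideMillis)) {
                long[] counts = stats.computeIfAbsent(start, k -> new long[3]);
                counts[0]++;
                if (t.hasHashtag) {
                    counts[1]++;
                }
                if (t.hasUrl) {
                    counts[2]++;
                }
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        List<Tweet> tweets = Arrays.asList(
            new Tweet(10_000L, true, false),
            new Tweet(40_000L, false, true),
            new Tweet(70_000L, false, false));
        // Assumed parameters: 60-second windows that slide every 30 seconds.
        for (Map.Entry<Long, long[]> e : aggregate(tweets, 60_000L, 30_000L).entrySet()) {
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
    }
}
```

Because the slide is shorter than the window, each tweet contributes to more than one window, which is what produces a smooth time series when the counts are plotted against the window timestamp.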

The first operation is to assign a timestamp to each incoming event. This is necessary because the window operator currently requires the input tuple type to implement a tuple interface, and the Twitter status object needs to be wrapped to accomplish this. While we're at it, we can also extract the event time (when the tweet actually occurred) from the status object and make it available under TimestampedTuple ...
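The wrapping step can be sketched as follows. This is only an illustration under stated assumptions: `Status` and `TimestampedTuple` below are simplified stand-ins defined inside the example, not the classes of the Twitter client library or of Apex, and the assumption that the status exposes its creation time through a `getCreatedAt()` accessor is ours.

```java
import java.util.Date;

class AssignTimestamp {

    /** Hypothetical stand-in for the Twitter status object. */
    static class Status {
        private final Date createdAt;
        private final String text;

        Status(Date createdAt, String text) {
            this.createdAt = createdAt;
            this.text = text;
        }

        Date getCreatedAt() {
            return createdAt;
        }

        String getText() {
            return text;
        }
    }

    /** Hypothetical stand-in for a tuple wrapper that carries an event time. */
    static class TimestampedTuple<T> {
        private final long timestamp;
        private final T value;

        TimestampedTuple(long timestamp, T value) {
            this.timestamp = timestamp;
            this.value = value;
        }

        long getTimestamp() {
            return timestamp;
        }

        T getValue() {
            return value;
        }
    }

    /** Wraps a status and uses its creation time as the event time. */
    static TimestampedTuple<Status> wrap(Status status) {
        return new TimestampedTuple<>(status.getCreatedAt().getTime(), status);
    }

    public static void main(String[] args) {
        Status status = new Status(new Date(1_500_000_000_000L), "Hello #apex");
        TimestampedTuple<Status> tuple = wrap(status);
        System.out.println(tuple.getTimestamp() + " " + tuple.getValue().getText());
    }
}
```

Carrying the event time on the tuple, rather than relying on arrival time, lets downstream windowing place a tweet in the window in which it was actually created, even when it arrives late.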
