Learning Apache Apex
by Ananth Gundabattula, Thomas Weise, Munagala V. Ramanath, David Yan, Kenneth Knowles
TweetStats
In this branch, we compute aggregate metrics over a sliding window in event time that can be used for time series visualization. The metrics are simple counts: total tweets, number of tweets with hashtags, and number of tweets with URLs. The result will contain these metrics along with the window timestamp for visualization.
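To make the sliding event-time window concrete, here is a minimal sketch in plain Java. It does not use the Apex windowed-operator API. The `Tweet` and `Counts` classes and the `aggregate` method are hypothetical names for illustration. Each tweet is counted in every window `[start, start + size)` that contains its event time, where window starts are assumed to be multiples of the slide interval. Results are keyed by window start timestamp, which is the shape a time series chart would consume.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch in plain Java; NOT the Apex windowed-operator API.
public class SlidingTweetStats {

  // Hypothetical minimal view of a tweet: event time plus the two flags we count.
  static final class Tweet {
    final long eventTimeMillis;
    final boolean hasHashtags;
    final boolean hasUrls;

    Tweet(long eventTimeMillis, boolean hasHashtags, boolean hasUrls) {
      this.eventTimeMillis = eventTimeMillis;
      this.hasHashtags = hasHashtags;
      this.hasUrls = hasUrls;
    }
  }

  // The three metrics reported for each window.
  static final class Counts {
    long total;
    long withHashtags;
    long withUrls;
  }

  // Assigns each tweet to every window [start, start + size) containing its event time
  // (window starts are multiples of slide) and accumulates the counts per window.
  static Map<Long, Counts> aggregate(List<Tweet> tweets, long sizeMillis, long slideMillis) {
    if (sizeMillis <= 0 || slideMillis <= 0) {
      throw new IllegalArgumentException("size and slide must be positive");
    }
    Map<Long, Counts> windows = new TreeMap<>(); // sorted by window start for plotting
    for (Tweet t : tweets) {
      // Latest window start that is <= the event time.
      long lastStart = Math.floorDiv(t.eventTimeMillis, slideMillis) * slideMillis;
      // Step back one slide at a time while the window still covers the event.
      for (long start = lastStart; start > t.eventTimeMillis - sizeMillis; start -= slideMillis) {
        Counts c = windows.computeIfAbsent(start, k -> new Counts());
        c.total++;
        if (t.hasHashtags) c.withHashtags++;
        if (t.hasUrls) c.withUrls++;
      }
    }
    return windows;
  }

  public static void main(String[] args) {
    List<Tweet> tweets = List.of(
        new Tweet(1_000, true, false),
        new Tweet(4_500, false, true),
        new Tweet(6_000, true, true));
    // 10-second windows that slide every 5 seconds.
    Map<Long, Counts> result = aggregate(tweets, 10_000, 5_000);
    result.forEach((start, c) -> System.out.printf(
        "window@%d total=%d hashtags=%d urls=%d%n", start, c.total, c.withHashtags, c.withUrls));
  }
}
```

Because windows overlap, a single tweet contributes to `size / slide` windows (two in the example above). That is why sliding-window counts are usually larger than the raw number of tweets in any one slide interval.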
The first operation is to assign the timestamp to the incoming event. This is necessary because the window operator currently requires the input tuple type to implement a tuple interface, and the Twitter status object needs to be wrapped to accomplish this. While we're at it, we can also extract the event time (when the tweet actually occurred) from the status object and make it available under TimestampedTuple ...
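The wrapping step can be sketched as follows. This is an illustration under stated assumptions, not the Apex Malhar code: `TimestampedValue`, `Status`, and `TimestampedStatus` are hypothetical stand-ins. The point is that the third-party status object is left untouched. A small wrapper implements the interface the window operator expects and carries the event time extracted from the status (its creation time), rather than the time the tuple arrived.

```java
import java.util.Date;
import java.util.Objects;

// Hypothetical sketch; these types are stand-ins, NOT the Apex Malhar classes.
public class TimestampWrapping {

  // Hypothetical tuple interface a windowed operator might require: exposes event time.
  interface TimestampedValue<T> {
    long getTimestamp();
    T getValue();
  }

  // Hypothetical stand-in for a third-party Twitter status object we cannot modify.
  static final class Status {
    private final Date createdAt;
    private final String text;

    Status(Date createdAt, String text) {
      this.createdAt = createdAt;
      this.text = text;
    }

    Date getCreatedAt() { return createdAt; }
    String getText() { return text; }
  }

  // Wrapper pairing the unchanged status with its extracted event time.
  static final class TimestampedStatus implements TimestampedValue<Status> {
    private final long timestamp;
    private final Status value;

    TimestampedStatus(long timestamp, Status value) {
      this.timestamp = timestamp;
      this.value = Objects.requireNonNull(value);
    }

    @Override public long getTimestamp() { return timestamp; }
    @Override public Status getValue() { return value; }
  }

  // Uses event time (when the tweet was created), not processing time.
  static TimestampedStatus wrap(Status status) {
    return new TimestampedStatus(status.getCreatedAt().getTime(), status);
  }

  public static void main(String[] args) {
    Status s = new Status(new Date(1_500_000_000_000L), "hello #apex");
    TimestampedStatus t = wrap(s);
    System.out.println(t.getTimestamp() + " " + t.getValue().getText());
  }
}
```

Wrapping, rather than subclassing or copying fields, keeps the full status available to downstream operators (for example, to check for hashtags or URLs) while giving the window operator a uniform way to read the timestamp.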