Performing the analytics
With both the batch and real-time infrastructure in place, we can focus on the analytics. First, we will take a look at the processing in Pig, and then we will translate the Pig script into a Storm topology.
Executing the batch analysis
For the batch analysis, we use Pig. The Pig script calculates the effectiveness of a campaign as the ratio of the number of distinct customers who have clicked through to the total number of impressions.
The Pig script is shown in the following code snippet:
click_thru_data = LOAD '../click_thru_data.txt' using PigStorage(' ')
    AS (cookie_id:chararray, campaign_id:chararray,
        product_id:chararray, click:chararray);
click_thrus = FILTER click_thru_data BY click == 'true';
distinct_click_thrus ...
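To make the metric concrete, here is a minimal Python sketch of the ratio described above. It is not part of the Pig script. It assumes that each input row is one impression, that `cookie_id` identifies a customer, and that fields are separated by a single space, as in the `LOAD` statement. The function name `click_thru_rate` and the sample rows are hypothetical.

```python
def click_thru_rate(lines):
    """Return distinct click-through customers divided by total impressions."""
    impressions = 0
    clicked_cookies = set()  # distinct cookie_ids with at least one click-thru
    for line in lines:
        fields = line.strip().split(" ")
        if len(fields) != 4:
            continue  # skip blank or malformed rows
        cookie_id, campaign_id, product_id, click = fields
        impressions += 1  # assumption: every well-formed row is one impression
        if click == "true":
            clicked_cookies.add(cookie_id)
    if impressions == 0:
        return 0.0  # avoid division by zero on empty input
    return len(clicked_cookies) / impressions


# Hypothetical sample rows: cookie_id campaign_id product_id click
sample = [
    "1234 1 101 true",
    "1234 1 101 true",
    "5678 1 101 false",
    "9012 1 102 true",
]
rate = click_thru_rate(sample)
print(rate)
```

In this sample, cookie `1234` clicks through twice but counts once, so the numerator counts distinct customers while the denominator counts every row.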