O'Reilly logo

Storm Blueprints: Patterns for Distributed Real-time Computation by Brian O'Neill, P. Taylor Goetz

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Performing the analytics

With both the batch and real-time infrastructure in place, we can focus on the analytics. First, we will take a look at the processing in Pig, and then we will translate the Pig script into a Storm topology.

Executing the batch analysis

For the batch analysis, we use Pig. The Pig script calculates the effectiveness of a campaign by computing the ratio between the distinct numbers of customers that have clicked-thru and the total number of impressions.

The Pig script is shown in the following code snippet:

click_thru_data = LOAD '../click_thru_data.txt' using PigStorage(' ') AS (cookie_id:chararray, campaign_id:chararray, product_id:chararray, click:chararray); click_thrus = FILTER click_thru_data BY click == 'true'; distinct_click_thrus ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required