O'Reilly logo

Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Streaming analytics

Next, we will create a slightly more complex streaming program. In Chapter 1, Getting Up and Running with Spark, we calculated a few metrics on our dataset of product purchases. These included the total number of purchases, the number of unique users, the total revenue, and the most popular product (together with its number of purchases and total revenue).

In this example, we will compute the same metrics on our stream of purchase events. The key difference is that these metrics will be computed per batch and printed out.

We will define our streaming application code here:

/**  * A more complex Streaming app, which computes statistics and    prints the results for each batch in a DStream */ object StreamingAnalyticsApp ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required