Chapter 6. Machine Learning with Spark

We have spent a considerable amount of time understanding the architecture of Spark, RDDs, the DataFrame and Dataset APIs, Spark SQL, and Streaming. All of this has laid the foundation for the topic of this chapter: machine learning. So far, our focus has been on getting data onto the Spark platform, in either batch or streaming fashion, and transforming it into the desired state.

Once you have the data on the platform, what do you do with it? You can use it for reporting, build dashboards, or let your data scientists analyze it to detect patterns, identify the reasons behind specific events, understand customer behavior, ...
