Part 2. Meet the Spark family

It’s time to get to know the other components that make up Spark: Spark SQL, Spark Streaming, Spark MLlib, and Spark GraphX. You already made a brief acquaintance with Spark SQL in chapter 3; in chapter 5, you’ll be formally introduced. You’ll learn how to create and use DataFrames, how to query DataFrame data with SQL, and how to load data from and save it to external data sources. You’ll also learn about the optimizations performed by Catalyst, Spark SQL’s optimization engine, and about the performance improvements introduced with the Tungsten project.

Spark Streaming, one of the more popular family members, is introduced in chapter 6. There you’ll learn about discretized streams, which periodically produce RDDs as the streaming ...
