Part 2. Meet the Spark family

It’s time to get to know the other components that make up Spark: Spark SQL, Spark Streaming, Spark MLlib, and Spark GraphX. You already made a brief acquaintance with Spark SQL in chapter 3; in chapter 5, you’ll be formally introduced. You’ll learn how to create and use DataFrames, how to query DataFrame data with SQL, and how to load data from and save it to external data sources. You’ll also learn about the optimizations performed by Catalyst, Spark SQL’s optimization engine, and about the performance improvements introduced with the Tungsten project.

Spark Streaming, one of the more popular family members, is introduced in chapter 6. There you’ll learn about discretized streams, which periodically produce RDDs as the streaming ...
