Chapter 5. Sparkling queries with Spark SQL

This chapter covers

  • Creating DataFrames
  • Using the DataFrame API
  • Using SQL queries
  • Loading and saving data from/to external data sources
  • Understanding the Catalyst optimizer
  • Understanding Tungsten performance improvements
  • Introducing DataSets

You had a taste of working with DataFrames in chapter 3. As you saw there, DataFrames let you work with structured data: data organized in rows and columns, where each column contains only values of a certain type. SQL, widely used in relational databases, is the most common way to organize and query such data. SQL is also part of the name of the first Spark component we’re covering in part 2: Spark SQL.

In this chapter, we plunge deeper ...
