Chapter 5. Sparkling queries with Spark SQL

This chapter covers

  • Creating DataFrames
  • Using the DataFrame API
  • Using SQL queries
  • Loading and saving data from/to external data sources
  • Understanding the Catalyst optimizer
  • Understanding Tungsten performance improvements
  • Introducing Datasets

You had a taste of working with DataFrames in chapter 3. As you saw there, DataFrames let you work with structured data: data organized in rows and columns, where each column contains only values of a certain type. SQL, the standard query language of relational databases, is the most common way to organize and query such data. SQL also appears in the name of the first Spark component we're covering in part 2: Spark SQL.

In this chapter, we plunge deeper ...
