Spark in Action by Petar Zečević and Marko Bonaći

Chapter 5. Sparkling queries with Spark SQL

This chapter covers

  • Creating DataFrames
  • Using the DataFrame API
  • Using SQL queries
  • Loading and saving data from/to external data sources
  • Understanding the Catalyst optimizer
  • Understanding Tungsten performance improvements
  • Introducing DataSets

You had a taste of working with DataFrames in chapter 3. As you saw there, DataFrames let you work with structured data (data organized in rows and columns, where each column contains only values of a certain type). SQL, frequently used in relational databases, is the most common way to organize and query this data. SQL also figures as part of the name of the first Spark component we’re covering in part 2: Spark SQL.

In this chapter, we plunge deeper ...
