Learning Spark, 2nd Edition by Tathagata Das, Brooke Wenig, Denny Lee, Jules Damji

Chapter 6. Spark SQL and Datasets

In Chapters 4 and 5, we covered Spark SQL and the DataFrame API: how to connect to built-in and external data sources; interoperability between SQL and DataFrames; creating and managing views and tables; advanced DataFrame and SQL transformations; and a peek into the Spark SQL engine.

Although we briefly introduced Datasets as strongly typed, immutable collections in Chapter 3, we skimmed over some salient aspects of how Datasets are created, stored, serialized, and deserialized in Spark.

In this chapter, we go under the hood to understand Datasets: how to work with Datasets in Java and Scala, how Spark manages memory to accommodate Dataset constructs as part of the unified, high-level API, and the costs associated with using Datasets.

Single API for Java and Scala

As you may recall from Chapter 3 (Figure 3-5 and Table 3-6), Datasets offer a single, unified API for strongly typed objects in languages such as Scala and Java. Because typed objects are a feature of the Java Virtual Machine (JVM), Datasets are available only in Scala and Java among the language APIs Spark supports; they are not part of the Python or R APIs.
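A minimal Scala sketch of what "strongly typed" means in practice, assuming Spark 3.x on the classpath and a local `SparkSession`. The `Blogger` case class, the `TypedDatasetSketch` object, and the sample values are hypothetical. The case class supplies the Dataset's element type, so field access such as `b.hits` is checked by the Scala compiler. A misspelled field therefore fails at compile time rather than at run time.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical domain type used only for illustration.
case class Blogger(id: Long, name: String, hits: Long)

object TypedDatasetSketch {
  // Builds a small Dataset[Blogger] and returns the names of bloggers above a hit threshold.
  def popularNames(spark: SparkSession, minHits: Long): Array[String] = {
    import spark.implicits._ // brings the implicit Encoder[Blogger] (and Encoder[String]) into scope

    // Hypothetical sample data.
    val bloggers: Dataset[Blogger] = Seq(
      Blogger(1L, "Ada", 4535L),
      Blogger(2L, "Grace", 8908L),
      Blogger(3L, "Linus", 7659L)
    ).toDS()

    // Field access is resolved by the compiler: `b.hits` and `b.name` must exist on Blogger.
    bloggers.filter(b => b.hits > minHits).map(b => b.name).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TypedDatasetSketch")
      .master("local[*]")
      .getOrCreate()
    try println(popularNames(spark, 5000L).mkString(", "))
    finally spark.stop()
  }
}
```

The case class is defined at the top level rather than inside a method. This avoids known problems when Spark derives an `Encoder` for it.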

But more importantly, they are domain-specific typed objects that can be operated on in parallel using either functional programming or the relational domain-specific language (DSL) operators we have become so familiar with in the DataFrame API.
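To make the two styles concrete, here is a hedged Scala sketch, again assuming Spark 3.x and a local `SparkSession`. The `Reading` case class, the `FunctionalVsDslSketch` object, and the sample values are hypothetical. Both queries select the same rows. The first passes a typed Scala lambda over each `Reading` object. The second uses DSL `Column` expressions such as `$"temp" > 30`, exactly as you would with a DataFrame.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical sensor reading used only for illustration.
case class Reading(device: String, temp: Int, humidity: Int)

object FunctionalVsDslSketch {
  // Returns the devices selected by (1) a typed lambda and (2) relational DSL expressions.
  def hotAndHumid(spark: SparkSession): (Array[String], Array[String]) = {
    import spark.implicits._ // Encoders plus the $"col" column syntax

    // Hypothetical sample data.
    val readings: Dataset[Reading] = Seq(
      Reading("sensor-a", 32, 75),
      Reading("sensor-b", 25, 80),
      Reading("sensor-c", 35, 60),
      Reading("sensor-d", 31, 71)
    ).toDS()

    // Functional style: an arbitrary Scala predicate applied to each Reading object.
    val viaLambda = readings
      .filter(r => r.temp > 30 && r.humidity > 70)
      .map(_.device)
      .collect()

    // DSL style: Column expressions, the same operators used with DataFrames.
    val viaDsl = readings
      .filter($"temp" > 30 && $"humidity" > 70)
      .select($"device")
      .as[String]
      .collect()

    (viaLambda, viaDsl)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FunctionalVsDslSketch")
      .master("local[*]")
      .getOrCreate()
    try {
      val (viaLambda, viaDsl) = hotAndHumid(spark)
      println(viaLambda.mkString(", "))
      println(viaDsl.mkString(", "))
    } finally spark.stop()
  }
}
```

The results match, but the two forms are not interchangeable under the hood. The lambda is opaque to Spark's Catalyst optimizer. The `Column` expressions, by contrast, can be analyzed and optimized.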

This single API now ensures that Java developers no longer lag behind Scala developers, since both ...