Learning Spark, 2nd Edition by Tathagata Das, Brooke Wenig, Denny Lee, Jules Damji

Chapter 3. Apache Spark’s Structured APIs

Structure has permeated our society and systematically ordered our livelihoods. Similarly, structured (or organized) data allows us to accomplish tasks, simple or complex, in a systematic manner.

Even the etymology of "structure," as both a noun and a verb, says the same thing:[1][2]

Noun

The action or process of building or constructing

Verb

Put together systematically or arrange according to a plan or give a pattern or organization to

In this chapter, we will explore three topics: the principal motivations behind adding structure to Apache Spark; how structure led to the creation of the high-level APIs, DataFrames and Datasets, and to their unification across Spark's components in Spark 2.x; and the Spark SQL engine that underpins these structured high-level APIs.

A Bit of History…

We got our first glimpse of structure in Spark when Spark SQL was introduced in the early Spark 1.x releases,[3] followed in Spark 1.3 by DataFrames as a successor to SchemaRDDs.[4][5] At this time, Spark SQL introduced high-level, expressive operational functions that mimic SQL-like syntax, and DataFrames provided spreadsheet-like named columns with data types dictated by a schema. DataFrames laid the foundation for more structure in subsequent releases and paved the path to performant operations in Spark's computational queries.

But before we talk about structure in Spark, let’s get a brief glimpse of what it means to not have structure in Spark by peeking into the simple ...
