O'Reilly logo

Fast Data Processing with Spark 2 - Third Edition by Krishna Sankar

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Spark SQL programming

Let's now get our hands dirty and work through various examples. We will start with a simple Dataset and then progressively perform more sophisticated SQL statements. We will use the NorthWind Dataset.

Datasets/DataFrames

In short, Datasets are semantic domain-specific objects, which means they are very rich in terms of typing and they possess all the functions of RDDs. In short, the best of both worlds! A DataFrame is an untyped view into a Dataset, basically a collection of rows. This is useful for doing abstract generic operations on a Dataset, that is, operations that depend only on the positions of elements in a row and other factors. We will learn more in later sections.

Tip

As languages, such as Python and R, do not have ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required