Fast Data Processing with Spark 2 - Third Edition by Krishna Sankar

Dataset interfaces and functions

Now let's work through a few interesting examples, starting with a simple one and then moving on to progressively more complex operations.

Tip

The code files are in fdps-v3/code, and the data files are in fdps-v3/data. You can run the code either from a Scala IDE or just from the Spark Shell.

Start the Spark Shell from the bin directory of your Spark installation:

/Volumes/sdxc-01/spark-2.0.0/bin/spark-shell 

Inside the shell, the following command will load the source:

:load /Users/ksankar/fdps-v3/code/DS01.scala

Read/write operations

As we saw earlier, SparkSession.read.* gives us a rich set of features to read different types of data with flexible control over the options. Dataset.write.* does the same for writing ...
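As a rough illustration of the read/write round trip described above, here is a minimal sketch. It is not taken from the book's code files: the sample rows, the column names (model, mpg), and the temporary output directory are all hypothetical, and the session is built with a local master purely so the snippet can run on its own. In the Spark Shell a session named spark already exists, and getOrCreate() simply reuses it.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

// Assumption: a local session for illustration; the Spark Shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ReadWriteSketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data and a throwaway output directory (not from the book).
val outDir = Files.createTempDirectory("fdps-rw-sketch").toString
val cars = Seq(("Mazda RX4", 21.0), ("Datsun 710", 22.8)).toDF("model", "mpg")

// Dataset.write.*: write CSV with a header row, replacing any earlier output.
cars.write
  .mode(SaveMode.Overwrite)
  .option("header", "true")
  .csv(s"$outDir/cars-csv")

// SparkSession.read.*: read the CSV back, using options to control parsing.
val readBack = spark.read
  .option("header", "true")      // first line holds the column names
  .option("inferSchema", "true") // let Spark guess the column types
  .csv(s"$outDir/cars-csv")
readBack.printSchema()
readBack.show()

// The same Dataset can be written in another format, for example Parquet.
readBack.write.mode(SaveMode.Overwrite).parquet(s"$outDir/cars-parquet")
val fromParquet = spark.read.parquet(s"$outDir/cars-parquet")
```

The options shown here (header, inferSchema, and the save mode) are only a few of those available. Parquet stores the schema alongside the data, so reading it back needs no parsing options.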