Dataset interfaces and functions

Now let's work through a few interesting examples, starting with a simple one and then moving on to progressively more complex operations.

Tip

The code files are in fdps-v3/code, and the data files are in fdps-v3/data. You can run the code either from a Scala IDE or just from the Spark Shell.
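If you take the IDE route, note that there is no predefined `spark` session as there is in the Spark shell, so the program has to build one itself. The following is a minimal sketch that assumes spark-sql 2.x is on the project classpath. The object `DSExample`, the `Person` case class, the `local[*]` master and the sample records are illustrative assumptions and are not taken from the book's code files.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type for this illustration; defined at top level so Spark can derive an encoder
case class Person(name: String, age: Int)

object DSExample {
  // Builds a small typed Dataset in memory and counts the people older than 30
  def countOver30(spark: SparkSession): Long = {
    import spark.implicits._ // brings in toDS() and the encoders for case classes
    val people = Seq(Person("Alice", 29), Person("Bob", 35)).toDS()
    people.filter(_.age > 30).count() // typed filter with a Scala lambda
  }

  def main(args: Array[String]): Unit = {
    // Unlike spark-shell, an application creates its own SparkSession.
    // local[*] runs Spark in-process on all available cores (an assumption suited to IDE runs).
    val spark = SparkSession.builder()
      .appName("DSExample")
      .master("local[*]")
      .getOrCreate()
    println(s"People over 30: ${countOver30(spark)}")
    spark.stop()
  }
}
```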

Start the Spark shell from the bin directory of your Spark installation:

/Volumes/sdxc-01/spark-2.0.0/bin/spark-shell

Inside the shell, the following command loads the source file:

:load /Users/ksankar/fdps-v3/code/DS01.scala

Read/write operations

As we saw earlier, SparkSession.read.* gives us a rich set of features to read different types of data with flexible control over the options. Dataset.write.* does the same for writing ...
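As a minimal sketch of these reader and writer interfaces, the program below writes a small DataFrame as CSV, reads it back with the `header` and `inferSchema` options, and then round-trips it through Parquet. It assumes a local Spark 2.x session; the object name `ReadWriteExample`, the temporary directory standing in for fdps-v3/data, and the sample rows are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

object ReadWriteExample {
  // Writes CSV, reads it back, writes Parquet, reads that back; returns the final row count
  def roundTrip(spark: SparkSession): Long = {
    import spark.implicits._

    // Temporary directory standing in for the fdps-v3/data folder (assumption)
    val base = Files.createTempDirectory("fdps-rw").toString
    val csvPath = s"$base/people_csv"
    val parquetPath = s"$base/people_parquet"

    // Dataset.write returns a DataFrameWriter; here it saves CSV with a header row
    Seq(("Alice", 29), ("Bob", 35)).toDF("name", "age")
      .write
      .mode(SaveMode.Overwrite)
      .option("header", "true")
      .csv(csvPath)

    // SparkSession.read returns a DataFrameReader; options control how the CSV is parsed
    val people = spark.read
      .option("header", "true")      // first line holds the column names
      .option("inferSchema", "true") // extra pass over the data to infer types, e.g. integer age
      .csv(csvPath)
    people.printSchema()

    // Parquet stores the schema with the data, so no parsing options are needed on read
    people.write.mode(SaveMode.Overwrite).parquet(parquetPath)
    val fromParquet = spark.read.parquet(parquetPath)
    fromParquet.show()
    fromParquet.count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReadWriteExample")
      .master("local[*]")
      .getOrCreate()
    println(s"Rows after round trip: ${roundTrip(spark)}")
    spark.stop()
  }
}
```

`SaveMode.Overwrite` makes reruns safe; with the default mode (`ErrorIfExists`) a second run would fail because the output directory already exists.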

Fast Data Processing with Spark 2 - Third Edition by Krishna Sankar
