Dataset interfaces and functions

Now let's work through a few interesting examples, starting with a simple one and then moving on to progressively more complex operations.

Tip

The code files are in fdps-v3/code, and the data files are in fdps-v3/data. You can run the code either from a Scala IDE or just from the Spark Shell.

Start the Spark Shell from the bin directory of your Spark installation:

/Volumes/sdxc-01/spark-2.0.0/bin/spark-shell 

Inside the shell, the following command will load the source:

:load /Users/ksankar/fdps-v3/code/DS01.scala

Read/write operations

As we saw earlier, SparkSession.read.* gives us a rich set of features to read different types of data with flexible control over the options. Dataset.write.* does the same for writing ...
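As a minimal sketch of this read/write symmetry, the following snippet writes a small DataFrame out as CSV and reads it back. The sample rows, the column names, the temporary output directory, and the name `session` are assumptions made for illustration; they are not taken from the book's code files.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

// Inside the Spark Shell, getOrCreate() returns the session that already exists.
val session = SparkSession.builder()
  .appName("read-write-sketch")
  .master("local[*]")
  .getOrCreate()
import session.implicits._

// Hypothetical sample data, used only for this illustration.
val cars = Seq(("Mazda RX4", 21.0), ("Datsun 710", 22.8)).toDF("model", "mpg")

// Write to a scratch directory so nothing under fdps-v3/data is touched.
val outDir = Files.createTempDirectory("fdps-sketch").resolve("cars-csv").toString

// Dataset.write.*: choose a save mode, set options, then pick a format.
cars.write
  .mode(SaveMode.Overwrite)
  .option("header", "true")
  .csv(outDir)

// SparkSession.read.*: the same option-then-format pattern in reverse.
val back = session.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(outDir)

back.printSchema()
back.show()
```

The reader and the writer both follow the same pattern of chained options followed by a format method, so switching to another format such as JSON or Parquet should mostly be a matter of changing the final call.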
