Dataset interfaces and functions
Now let's work through a few interesting examples, starting with a simple one and then moving on to progressively more complex operations.
Tip
The code files are in fdps-v3/code, and the data files are in fdps-v3/data. You can run the code either from a Scala IDE or just from the Spark Shell.
Start the Spark Shell from the bin directory of your Spark installation:
/Volumes/sdxc-01/spark-2.0.0/bin/spark-shell
Inside the shell, the following command will load the source:
:load /Users/ksankar/fdps-v3/code/DS01.scala
Read/write operations
As we saw earlier, SparkSession.read.* gives us a rich set of features to read different types of data with flexible control over the options. Dataset.write.* does the same for writing ...
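As a minimal sketch of the read/write round trip described above (not the book's DS01.scala), the following builds a small Dataset in memory, writes it out as CSV, reads it back with options, and then repeats the round trip with Parquet. The output directory, the sample rows, and the column names are assumptions made only for illustration.

```scala
import org.apache.spark.sql.SparkSession

// In the Spark Shell a session named `spark` already exists;
// getOrCreate() reuses it instead of starting a second one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ReadWriteSketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical scratch directory, not part of the book's data files.
val outDir = "/tmp/fdps-readwrite-sketch"

// A tiny made-up DataFrame so the example needs no input file.
val sample = Seq(("alpha", 1), ("beta", 2), ("gamma", 3)).toDF("name", "value")

// Dataset.write.*: save as CSV with a header row, replacing earlier output.
sample.write
  .mode("overwrite")
  .option("header", "true")
  .csv(s"$outDir/csv")

// SparkSession.read.*: load the CSV back and let Spark infer column types.
val fromCsv = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(s"$outDir/csv")
fromCsv.printSchema()
fromCsv.show()

// The same pattern works for other formats, for example Parquet.
fromCsv.write.mode("overwrite").parquet(s"$outDir/parquet")
val fromParquet = spark.read.parquet(s"$outDir/parquet")
println(fromParquet.count())
```

You can paste this into the Spark Shell, or save it to a file and load it with :load. The "overwrite" mode is used here so the sketch can be rerun without failing on an existing output directory.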