Using the Dataset API in an immutable way

In this section, we will use the Dataset API in an immutable way. We will cover the following topics:

  • Dataset immutability
  • Creating two leaves from one root dataset
  • Adding a new column by issuing a transformation

The test case for the Dataset is quite similar, but we need to call toDS() on our data to make it type-safe. The element type of the dataset is UserData, as shown in the following example:

import com.tomekl007.UserData
import org.apache.spark.sql.SparkSession
import org.scalatest.FunSuite

class ImmutableDataSet extends FunSuite {
  val spark: SparkSession = SparkSession
    .builder().master("local[2]").getOrCreate()

  test("Should use immutable DF API") {
    import spark.sqlContext.implicits._
    //given
    val userData ...
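The excerpt above is truncated, so here is a minimal sketch of how such a test might continue. The UserData fields, the sample rows, and the assertions are assumptions for illustration, not the book's exact code; the sketch just demonstrates the three topics listed earlier: the root dataset is immutable, two independent leaves can be derived from it, and adding a column produces a new result rather than mutating the source.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.lit
import org.scalatest.FunSuite

// Hypothetical case class standing in for com.tomekl007.UserData
case class UserData(userId: String, data: String)

class ImmutableDataSetSketch extends FunSuite {
  val spark: SparkSession = SparkSession
    .builder().master("local[2]").getOrCreate()

  test("Should use immutable Dataset API") {
    import spark.sqlContext.implicits._
    //given: one root, type-safe dataset built via toDS()
    val userData: Dataset[UserData] = spark.sparkContext.makeRDD(List(
      UserData("a", "1"),
      UserData("b", "2"),
      UserData("d", "200")
    )).toDS()

    //when: two independent leaves are derived from the same root
    val firstLeaf = userData.filter(_.userId == "a")
    val secondLeaf = userData.filter(_.userId == "b")

    //and: a new column is added by issuing a transformation
    val withExtra = userData.withColumn("extra", lit(1))

    //then: every transformation returned a new object;
    //the root dataset itself is unchanged
    assert(firstLeaf.count() == 1)
    assert(secondLeaf.count() == 1)
    assert(withExtra.columns.contains("extra"))
    assert(!userData.columns.contains("extra"))
    assert(userData.count() == 3)
  }
}
```

Note that filter and withColumn never modify userData in place; each call returns a fresh Dataset (or DataFrame, in the case of withColumn), which is what makes building several leaves from one root safe.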
