July 2018
Intermediate to advanced
334 pages
8h 20m
English
Now, let's split our dataset in two, roughly 85% and 15% of the rows, and provide a random seed so that the split is reproducible:
val splitDataSet: Array[org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]] = dataSet.randomSplit(Array(0.85, 0.15), 98765L)
Our new splitDataSet array now contains two datasets.
Confirm that the array is of size 2:
splitDataSet.size
res48: Int = 2
Assign the training dataset to a variable, trainDataSet:
val trainDataSet = splitDataSet(0)
trainDataSet: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [iris-features-column: vector, iris-species-label-column: ...
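The split-and-assign steps above can be sketched as one small standalone program. This is a minimal sketch rather than the book's code. The object name RandomSplitSketch, the helper splitTrainTest, and the 150-row stand-in DataFrame are all hypothetical, used here in place of the dataSet built earlier. It assumes Spark 2.x or later on the classpath. Note that randomSplit treats the weights as probabilities, so the resulting sizes are approximate rather than an exact 85/15 cut.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RandomSplitSketch {

  // Hypothetical helper: split a DataFrame into (train, test) using the
  // same weights as in the text. The seed makes the split repeatable.
  def splitTrainTest(data: DataFrame, seed: Long): (DataFrame, DataFrame) = {
    // randomSplit returns one Dataset per weight, in the same order.
    val Array(train, test) = data.randomSplit(Array(0.85, 0.15), seed)
    (train, test)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("random-split-sketch")
      .getOrCreate()

    // Stand-in for the Iris dataSet from the text: 150 rows with one id column.
    val demo = spark.range(0, 150).toDF("id")

    val (trainDataSet, testDataSet) = splitTrainTest(demo, 98765L)

    // Every row lands in exactly one of the two splits.
    println(s"train rows: ${trainDataSet.count()}")
    println(s"test rows:  ${testDataSet.count()}")
    println(s"rows in both: ${trainDataSet.intersect(testDataSet).count()}")

    spark.stop()
  }
}
```

Destructuring with val Array(train, test) does the same job as indexing splitDataSet(0) and splitDataSet(1), and it fails fast if the number of splits ever differs from the number of names.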