January 2018
Intermediate to advanced
470 pages
11h 9m
English
The following code splits the ratings DataFrame into a training DataFrame (75%) and a test DataFrame (25%). The seed is optional, but setting it makes the split reproducible:
// Split ratings DataFrame into training (75%) & test (25%) DataFrames
val splits = ratingsDF.randomSplit(Array(0.75, 0.25), seed = 12345L)
val (trainingData, testData) = (splits(0), splits(1))
val numTraining = trainingData.count()
val numTest = testData.count()
println("Training: " + numTraining + " test: " + numTest)
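You should see 78,792 ratings in the training DataFrame and 26,547 ratings in the test DataFrame, roughly 75% and 25% of the 105,339 ratings in total. The proportions are not exact because each row is assigned to a split at random.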
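To see why a fixed seed makes the split repeatable, here is a minimal sketch in plain Scala. It is not how Spark implements randomSplit, and it does not model partitioning; the object SeededSplitDemo and the helper seededSplit are hypothetical names introduced only for illustration. Each element draws one pseudo-random number from a generator initialized with the seed and goes to the training set when that number falls below the training fraction:

```scala
import scala.util.Random

object SeededSplitDemo {
  // Hypothetical helper (not part of Spark): sends each element to the
  // training set with probability trainFraction, otherwise to the test set.
  def seededSplit[A](data: Vector[A], trainFraction: Double, seed: Long): (Vector[A], Vector[A]) = {
    val rng = new Random(seed)
    // Draw one pseudo-random number per element, in order
    val tagged = data.map(x => (x, rng.nextDouble()))
    val (train, test) = tagged.partition { case (_, r) => r < trainFraction }
    (train.map(_._1), test.map(_._1))
  }

  def main(args: Array[String]): Unit = {
    // Stand-in data: 1,000 integers instead of real ratings
    val ratings = (1 to 1000).toVector

    val (trainA, testA) = seededSplit(ratings, 0.75, seed = 12345L)
    val (trainB, testB) = seededSplit(ratings, 0.75, seed = 12345L)

    // The same seed yields the same split, so this prints true
    println(trainA == trainB && testA == testB)
    // Sizes are close to, but not exactly, 750 and 250
    println("Training: " + trainA.size + " test: " + testA.size)
  }
}
```

Running the sketch with a different seed would generally move different elements into each set, which is why the seed is worth fixing whenever the evaluation results need to be compared across runs.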