The first step is to create a test and training set for cross-validation purposes.
In Scala, the split is easier to implement, and the randomSplit function is available:
val splits = data.randomSplit(Array(0.8, 0.2), seed = 11L) val training = splits(0).cache() val test = splits(1)