Step 14 – creating training and test datasets

This step matters because the model we are about to build has to be trained on a training set and then evaluated on data it has not seen. One way to get both is to partition the current dataframe at random, assigning roughly 80% of its rows to a new training dataset and holding out the remaining 20% for testing:

val splitFeaturizedDF = featurizedDF.randomSplit(Array(0.80, 0.20), 98765L)

splitFeaturizedDF: Array[org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]] = Array([filteredMailFeatures: string, label: double ... 2 more fields], [filteredMailFeatures: string, label: double ... 2 more fields])
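To build some intuition for what the weights and the seed in randomSplit do, here is a minimal plain-Scala sketch of a seeded random partition. It is a conceptual illustration only, not Spark's actual implementation; the object name SeededSplitSketch and the helper split are hypothetical. Each row is assigned independently, so the 80/20 proportions are approximate rather than exact, and reusing the same seed on the same input reproduces the same split.

```scala
import scala.util.Random

// Conceptual sketch only: NOT how Spark implements randomSplit internally.
object SeededSplitSketch {
  // Send each row to the first group with probability trainWeight,
  // otherwise to the second. A fixed seed makes the assignment repeatable
  // for the same rows in the same order.
  def split[A](rows: Seq[A], trainWeight: Double, seed: Long): (Seq[A], Seq[A]) = {
    val rng = new Random(seed)
    // One uniform draw per row; draws below the threshold go to training.
    val tagged = rows.map(row => (row, rng.nextDouble() < trainWeight))
    val (train, test) = tagged.partition(_._2)
    (train.map(_._1), test.map(_._1))
  }

  def main(args: Array[String]): Unit = {
    val rows = (1 to 1000).toVector
    val (train, test) = split(rows, 0.80, 98765L)
    // Sizes are close to 800 and 200, but not guaranteed to be exact.
    println(s"train=${train.size}, test=${test.size}")
  }
}
```

Because the assignment is random per row, a small dataframe can end up noticeably off the nominal 80/20 ratio, which is worth keeping in mind when reading the row counts of the two splits.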

Now, let's retrieve the training set:

val trainFeaturizedDF = splitFeaturizedDF(0)

The remaining rows form the testing dataset. Here is how we will create it:

val testFeaturizedDF = splitFeaturizedDF( ...
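As a quick sanity check on a split like this one, the sketch below runs randomSplit on a toy dataframe and counts the rows in each part. It assumes Spark SQL is on the classpath and a local master is acceptable; the object name SplitCheckSketch, the helper splitCounts, and the toy columns text and label are hypothetical stand-ins, not part of the project's featurizedDF.

```scala
import org.apache.spark.sql.SparkSession

object SplitCheckSketch {
  // Split a toy dataframe 80/20 with a fixed seed and return (trainCount, testCount).
  def splitCounts(spark: SparkSession): (Long, Long) = {
    import spark.implicits._

    // Hypothetical stand-in for featurizedDF: 1,000 rows, two columns.
    val toyDF = (1 to 1000)
      .map(i => (s"mail $i", (i % 2).toDouble))
      .toDF("text", "label")

    // randomSplit returns one Dataset per weight, in the same order as the weights.
    val Array(trainDF, testDF) = toyDF.randomSplit(Array(0.80, 0.20), 98765L)

    (trainDF.count(), testDF.count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("split-check-sketch")
      .master("local[*]")
      .getOrCreate()

    val (trainCount, testCount) = splitCounts(spark)
    // The two counts add up to the total; each is only approximately 80% / 20%.
    println(s"train=$trainCount, test=$testCount, total=${trainCount + testCount}")

    spark.stop()
  }
}
```

Destructuring with val Array(trainDF, testDF) is an alternative to indexing into the array, and it fails fast if the number of splits does not match the pattern.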
