Modern Scala Projects

by Ilango Gurusamy
July 2018
Content level: Intermediate to advanced
334 pages
8h 20m
English
Packt Publishing
Content preview from Modern Scala Projects

Step 14 – creating training and test datasets

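This step matters because the model we are about to build must be trained on one set of rows and evaluated on another. One way to create the two sets is to randomly partition the current dataframe, assigning roughly 80% of its rows to a new training dataset and the remaining 20% to a test dataset. The second argument, 98765L, is a seed that makes the split reproducible: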

val splitFeaturizedDF = featurizedDF.randomSplit(Array(0.80, 0.20), 98765L)
splitFeaturizedDF: Array[org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]] = Array([filteredMailFeatures: string, label: double ... 2 more fields], [filteredMailFeatures: string, label: double ... 2 more fields])
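To see how randomSplit behaves outside the pipeline, here is a minimal sketch on a toy dataframe. The names toyDF, trainDF, testDF and the 1,000-row dataset are hypothetical stand-ins for illustration and are not part of the book's spam-filter code. It assumes a local Spark installation, or a spark-shell session, where getOrCreate() simply reuses the existing session.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("RandomSplitSketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for featurizedDF: 1,000 rows with a unique id and a binary label.
val toyDF = (1 to 1000).map(i => (i, (i % 2).toDouble)).toDF("id", "label")

// Weights 0.80 / 0.20 are target proportions, not exact sizes; the seed fixes the random draw.
val Array(trainDF, testDF) = toyDF.randomSplit(Array(0.80, 0.20), seed = 98765L)

val trainCount = trainDF.count()
val testCount = testDF.count()
println(s"train = $trainCount, test = $testCount")

// Repeating the split with the same weights and seed on the same dataframe
// should select the same training rows.
val Array(trainAgain, _) = toyDF.randomSplit(Array(0.80, 0.20), seed = 98765L)
val sameSplit =
  trainDF.except(trainAgain).count() == 0 && trainAgain.except(trainDF).count() == 0
println(s"same split with same seed: $sameSplit")
```

The two counts should land near 800 and 200 but will rarely be exact, because each row is assigned to a split at random. Every row goes to exactly one split, so the counts always add up to the original row count.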

Now, let's retrieve the training set:

val trainFeaturizedDF = splitFeaturizedDF(0)

The test dataset is the other element of the split array. Here is how we retrieve it:

val testFeaturizedDF = splitFeaturizedDF( ...

Publisher Resources

ISBN: 9781788624114