Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua


Training a classification model on the Kaggle/StumbleUpon evergreen classification dataset

We can now apply the models from Spark MLlib to our input data. First, we need to import the required classes and set up some minimal input parameters for each model. For logistic regression and SVM, this is the number of iterations; for the decision tree model, it is the maximum tree depth.

import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.tree.configuration.Algo
import org.apache.spark.mllib.tree.impurity.Entropy

val numIterations = 10
...
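The listing above is cut off in this excerpt, so what follows is only a minimal sketch of how parameters like these are typically passed to the RDD-based MLlib trainers, not the book's own continuation. It assumes a Spark 1.x or 2.x classpath: `LogisticRegressionWithSGD` was deprecated in Spark 2.0 and, as far as we are aware, can no longer be used this way in Spark 3.x. The tiny `toyData` set, the `point` test vector, the application name, and the value `maxTreeDepth = 5` are all made up for illustration; only `numIterations = 10` comes from the text.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.{LogisticRegressionWithSGD, NaiveBayes, SVMWithSGD}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.tree.configuration.Algo
import org.apache.spark.mllib.tree.impurity.Entropy

// Reuse an existing context (for example, in spark-shell) or start a local one.
val conf = new SparkConf().setAppName("classification-sketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical stand-in for the real input data: four labeled points with
// two non-negative features each (naive Bayes requires non-negative features).
val toyData = sc.parallelize(Seq(
  LabeledPoint(0.0, Vectors.dense(0.1, 0.9)),
  LabeledPoint(0.0, Vectors.dense(0.2, 0.8)),
  LabeledPoint(1.0, Vectors.dense(0.9, 0.1)),
  LabeledPoint(1.0, Vectors.dense(0.8, 0.3))
)).cache()

val numIterations = 10 // value given in the text
val maxTreeDepth = 5   // assumed value, not taken from the text

// Logistic regression and SVM take the number of SGD iterations.
val lrModel = LogisticRegressionWithSGD.train(toyData, numIterations)
val svmModel = SVMWithSGD.train(toyData, numIterations)

// Naive Bayes is trained here with its default smoothing parameter.
val nbModel = NaiveBayes.train(toyData)

// The decision tree takes the task type, the impurity measure, and the maximum depth.
val dtModel = DecisionTree.train(toyData, Algo.Classification, Entropy, maxTreeDepth)

// Each trained model can predict a class label for a single feature vector.
val point = Vectors.dense(0.85, 0.2)
println(s"Logistic regression: ${lrModel.predict(point)}")
println(s"SVM:                 ${svmModel.predict(point)}")
println(s"Naive Bayes:         ${nbModel.predict(point)}")
println(s"Decision tree:       ${dtModel.predict(point)}")
```

With so few points and only 10 iterations, the predictions themselves are not meaningful; the sketch is only meant to show where each parameter enters the training call.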
