
Mastering Machine Learning with Spark 2.x by Michal Malohlava, Max Pumperla, Alex Tellez

Spark model metrics

First, we use the model metrics that the Spark API provides out of the box, following the same approach as in the previous chapter. We start by defining a method that extracts model metrics for a given model and dataset:

import org.apache.spark.mllib.evaluation._
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.model._
import org.apache.spark.rdd.RDD

def getMetrics(model: RandomForestModel, data: RDD[LabeledPoint]): MulticlassMetrics = {
  // Pair each prediction with the corresponding true label
  val predictionsAndLabels = data.map(example =>
    (model.predict(example.features), example.label)
  )
  new MulticlassMetrics(predictionsAndLabels)
}

Then we can compute the Spark MulticlassMetrics directly:

val rfModelMetrics = getMetrics(rfModel, testData) 

We can then look at the first interesting classification model metric, called the Confusion ...
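As a sketch of what can be read off the metrics object computed above (assuming `rfModelMetrics` from the previous snippet), the confusion matrix and the standard per-class statistics are available directly on `MulticlassMetrics`:

```scala
// Confusion matrix: rows correspond to actual classes,
// columns to predicted classes.
println(rfModelMetrics.confusionMatrix)

// Overall accuracy across all classes (Spark 2.x API)
println(s"Accuracy: ${rfModelMetrics.accuracy}")

// Per-class precision and recall, indexed by class label
rfModelMetrics.labels.foreach { label =>
  println(s"Class $label: precision = ${rfModelMetrics.precision(label)}, " +
    s"recall = ${rfModelMetrics.recall(label)}")
}
```

This is only a sketch of the inspection step; the exact labels and values depend on the model and test data used above.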
