First, we use the built-in model metrics that the Spark API provides, following the same approach as in the previous chapter. We start by defining a method that extracts model metrics for a given model and dataset:
import org.apache.spark.mllib.evaluation._
import org.apache.spark.mllib.tree.model._
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

def getMetrics(model: RandomForestModel, data: RDD[LabeledPoint]): MulticlassMetrics = {
  // Pair each example's predicted label with its actual label
  val predictionsAndLabels = data.map(example =>
    (model.predict(example.features), example.label)
  )
  new MulticlassMetrics(predictionsAndLabels)
}
Then we can compute the Spark MulticlassMetrics directly:
val rfModelMetrics = getMetrics(rfModel, testData)
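As a quick sanity check before digging into the individual metrics, we can print the overall accuracy (a minimal sketch; the accuracy field on MulticlassMetrics is available in Spark 2.0 and later):

// Overall accuracy: the fraction of correctly predicted examples (Spark 2.0+)
println(s"Accuracy: ${rfModelMetrics.accuracy}")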
And look at the first interesting classification model metric, called the confusion matrix.
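As a minimal sketch, assuming the rfModelMetrics value computed above, the confusion matrix is exposed directly on MulticlassMetrics:

// Confusion matrix: actual labels in rows, predicted labels in columns,
// both ordered by ascending class label
println(rfModelMetrics.confusionMatrix)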