We use the same dataset, but this time we use a decision tree to solve the regression problem. Note the helper function for calculating metrics, which wraps Spark's RegressionMetrics class:
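import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.model.DecisionTreeModel
import org.apache.spark.rdd.RDD

// Pair each prediction with its true label and wrap the pairs in RegressionMetrics.
def getMetrics(model: DecisionTreeModel, data: RDD[LabeledPoint]): RegressionMetrics = {
  val predictionsAndLabels = data.map(example =>
    (model.predict(example.features), example.label)
  )
  new RegressionMetrics(predictionsAndLabels)
}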
We then perform the actual regression using DecisionTree.trainRegressor(), specifying the impurity measure. Note that Spark MLlib supports only variance as the impurity for regression trees; Gini and entropy apply to classification trees. Finally, we output the trained model, which is a series of decision nodes/branches and the value used to make the decision at each branch: ...
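The training and evaluation steps described above can be sketched as follows. This is a minimal illustration, not the recipe's actual code: the toy data (label = 2 * x), the object name DecisionTreeRegressionSketch, the helper trainAndEvaluate, the 70/30 split, and the maxDepth and maxBins values are all assumptions chosen for the example. It assumes Spark MLlib is on the classpath and runs against a local master.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.tree.model.DecisionTreeModel
import org.apache.spark.rdd.RDD

object DecisionTreeRegressionSketch {

  // Same helper as in the text: pair predictions with labels.
  def getMetrics(model: DecisionTreeModel, data: RDD[LabeledPoint]): RegressionMetrics = {
    val predictionsAndLabels = data.map(example =>
      (model.predict(example.features), example.label)
    )
    new RegressionMetrics(predictionsAndLabels)
  }

  // Hypothetical helper: trains on toy data and evaluates on a held-out split.
  def trainAndEvaluate(sc: SparkContext): (DecisionTreeModel, RegressionMetrics) = {
    // Toy data (assumption): one feature x in 1..100, label = 2 * x.
    val data = sc.parallelize((1 to 100).map { i =>
      LabeledPoint(2.0 * i, Vectors.dense(i.toDouble))
    })
    val Array(train, test) = data.randomSplit(Array(0.7, 0.3), seed = 42L)

    val categoricalFeaturesInfo = Map[Int, Int]() // all features are continuous
    val impurity = "variance"                     // only value supported for regression
    val maxDepth = 5                              // illustrative value
    val maxBins = 32                              // illustrative value

    val model = DecisionTree.trainRegressor(
      train, categoricalFeaturesInfo, impurity, maxDepth, maxBins)
    (model, getMetrics(model, test))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("DecisionTreeRegressionSketch")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val (model, metrics) = trainAndEvaluate(sc)
      // Print the tree: one line per decision node or leaf prediction.
      println(model.toDebugString)
      println(s"RMSE = ${metrics.rootMeanSquaredError}, R2 = ${metrics.r2}")
    } finally {
      sc.stop()
    }
  }
}
```

The toDebugString output lists each split as an if/else on a feature threshold, with a predicted value at every leaf, which matches the "decision nodes/branches" view of the model described above.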