Evaluating a learner's performance

The next step in any data science task is to check the performance of the learner on the training and testing datasets. For this task, we will use the F1-score, as it's a good metric that combines precision and recall performance. Evaluation metrics are contained in the pyspark.ml.evaluation package; among the few choices available, we're using the one that evaluates multiclass classifiers: MulticlassClassificationEvaluator. As parameters, we provide the metric (precision, recall, accuracy, F1-score, and so on) and the names of the columns containing the true and predicted labels:

In: from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    # Evaluate with the F1-score; true labels are in the "target_cat"
    # column, predictions in Spark ML's default "prediction" column
    evaluator = MulticlassClassificationEvaluator(
        labelCol="target_cat",
        predictionCol="prediction",
        metricName="f1")
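
With the evaluator created, we can score the predictions on each split. As a minimal sketch of how it might be applied (assuming a fitted model named model and the train and test DataFrames from the preceding steps, which are not shown in this excerpt):

In: # Hypothetical usage: transform each split, then compute the F1-score
    f1_train = evaluator.evaluate(model.transform(train))
    f1_test = evaluator.evaluate(model.transform(test))
    print("F1-score: train %.3f, test %.3f" % (f1_train, f1_test))

Comparing the two scores gives a quick check for overfitting: a training F1-score much higher than the testing one suggests the learner is memorizing the training data.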
