Once training is complete, we compute predictions on the held-out validation set (validationData) to evaluate how well the model performs on data it has not seen:
Dataset<Row> predictions = model.transform(validationData);
Now, how about seeing some sample predictions? Let's observe both the true labels and the predicted labels:
predictions.show();
We can see that some predictions are correct and some are wrong. However, inspecting a handful of rows tells us little about overall performance. Therefore, we compute performance metrics such as precision, recall, and F1 measure:
MulticlassClassificationEvaluator evaluator = new MulticlassClassificationEvaluator() ...
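To make concrete what these metrics measure, here is a minimal plain-Java sketch that computes precision, recall, and F1 from arrays of true and predicted labels, with each class weighted by its frequency among the true labels. It does not use Spark and is not the evaluator's implementation. The class name MetricsSketch, the method weightedMetrics, and the sample labels are hypothetical, introduced only for illustration. The weighting scheme is assumed to correspond to the evaluator's weighted variants of these metrics.

```java
import java.util.Set;
import java.util.TreeSet;

public class MetricsSketch {

    /**
     * Returns {weightedPrecision, weightedRecall, weightedF1}.
     * Each per-class score is weighted by the share of true labels in that class.
     */
    public static double[] weightedMetrics(double[] labels, double[] predictions) {
        if (labels.length == 0 || labels.length != predictions.length) {
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        }

        // Collect the distinct classes that occur among the true labels.
        Set<Double> classes = new TreeSet<>();
        for (double label : labels) {
            classes.add(label);
        }

        double precision = 0.0, recall = 0.0, f1 = 0.0;
        for (double c : classes) {
            int tp = 0, fp = 0, fn = 0;
            for (int i = 0; i < labels.length; i++) {
                boolean actual = labels[i] == c;
                boolean predicted = predictions[i] == c;
                if (actual && predicted) tp++;        // correctly predicted as c
                else if (!actual && predicted) fp++;  // wrongly predicted as c
                else if (actual && !predicted) fn++;  // missed instance of c
            }
            // Guard against division by zero when a class is never predicted.
            double p = (tp + fp == 0) ? 0.0 : (double) tp / (tp + fp);
            double r = (tp + fn == 0) ? 0.0 : (double) tp / (tp + fn);
            double f = (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);

            double weight = (double) (tp + fn) / labels.length;
            precision += weight * p;
            recall += weight * r;
            f1 += weight * f;
        }
        return new double[] {precision, recall, f1};
    }

    public static void main(String[] args) {
        // Hypothetical true and predicted labels for six samples.
        double[] labels = {0, 0, 1, 1, 1, 2};
        double[] predictions = {0, 1, 1, 1, 0, 2};

        double[] m = weightedMetrics(labels, predictions);
        System.out.printf("Precision = %.4f%n", m[0]);
        System.out.printf("Recall    = %.4f%n", m[1]);
        System.out.printf("F1        = %.4f%n", m[2]);
    }
}
```

Note that weighted recall computed this way equals plain accuracy, since each class's recall is weighted by its share of the true labels. Precision and F1 can differ from it when errors are spread unevenly across classes.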