As you have already seen, the OVTR classifier produced the following performance metrics on the OCR dataset:
Accuracy = 0.5217246545696688
Precision = 0.488360500637862
Recall = 0.5217246545696688
F1 = 0.4695649096879411
Test Error = 0.47827534543033123
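Two relationships among these numbers are worth a quick check: the test error is the complement of the accuracy, and the recall is identical to the accuracy. The second is what one would expect if the recall is a support-weighted average over classes (an assumption here, in line with the weightedRecall metric of Spark's MulticlassClassificationEvaluator). The minimal Python sketch below verifies both; the confusion matrix in it is hypothetical and is not taken from the OCR dataset.

```python
# Values copied from the OVTR results listed above.
accuracy = 0.5217246545696688
recall = 0.5217246545696688
test_error = 0.47827534543033123

# Test error is the complement of accuracy (up to floating-point rounding).
assert abs((1.0 - accuracy) - test_error) < 1e-12

# Hypothetical 3-class confusion matrix, NOT from the OCR dataset.
# Rows are true classes, columns are predicted classes.
confusion = [
    [5, 2, 3],
    [1, 6, 3],
    [4, 4, 2],
]


def accuracy_from(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def weighted_recall_from(cm):
    """Per-class recall averaged with weights equal to class support."""
    total = sum(sum(row) for row in cm)
    result = 0.0
    for i, row in enumerate(cm):
        support = sum(row)
        if support == 0:
            continue  # a class with no true samples contributes nothing
        result += (support / total) * (cm[i][i] / support)
    return result


# The support weights cancel the per-class denominators, so the two agree.
assert abs(accuracy_from(confusion) - weighted_recall_from(confusion)) < 1e-12
```

Because each class's recall is weighted by its share of the samples, the weighted sum collapses to the number of correct predictions divided by the total, which is the accuracy. This is why the Accuracy and Recall values above coincide.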
These values show that the model classifies only about 52% of the samples in that dataset correctly, which is very low. In this section, we will see how we could improve the performance using the DT classifier. We will show an example with Spark 2.1.0 using the same OCR dataset. The example has several steps: data loading, parsing, model training, and, finally, model evaluation.
Since we will be using the same dataset, to avoid redundancy, we will skip ...