Training a learner

Finally, we arrive at the core of the task: training a classifier. Classifiers are contained in the pyspark.ml.classification package and, for this example, we're using a random forest. For Spark 2.3.1, you can find the full list of available algorithms at https://spark.apache.org/docs/2.3.1/ml-classification-regression.html. The list is quite complete, comprising linear models, SVM, Naive Bayes, and tree ensembles. Note that not all of them can handle multiclass problems, and their parameters differ from one algorithm to another; always check the documentation for the version in use. Beyond classifiers, the other learners implemented in Spark 2.3.1 with a Python interface are as follows: ...
