Training a learner

Finally, we arrive at the core of the task: training a classifier. Classifiers live in the pyspark.ml.classification package and, for this example, we use a random forest. For Spark 2.3.1, the full list of available algorithms is at https://spark.apache.org/docs/2.3.1/ml-classification-regression.html. The list is fairly comprehensive, covering linear models, SVMs, Naive Bayes, and tree ensembles. Note that not all of them can handle multiclass problems, and each may expose different parameters; always check the documentation for the version in use. Beyond classifiers, the other learners implemented in Spark 2.3.1 with a Python interface are as follows: ...
