© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
A. Testas, Distributed Machine Learning with PySpark, https://doi.org/10.1007/978-1-4842-9751-3_8

8. Decision Tree Classification with Pandas, Scikit-Learn, and PySpark

Abdelaziz Testas
Fremont, CA, USA

In this chapter, we will continue with classification as a form of supervised learning. Our objective is to develop, train, and evaluate a decision tree classification model for predicting the species of an Iris flower based on its feature measurements. We will leverage the well-known Iris dataset, which consists of measurements of four features (sepal length, sepal width, petal length, and petal width) from three distinct species of Iris flowers (setosa, ...
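
As a rough illustration of the workflow described above (develop, train, and evaluate a decision tree classifier on the Iris data), the following is a minimal scikit-learn sketch. It is not the chapter's own code. The 80/20 split, the `max_depth=3` setting, and the fixed `random_state` are assumptions chosen only to keep the example small and reproducible.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the Iris data: 150 samples, 4 features, 3 species encoded as 0, 1, 2.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows for evaluation (assumed split, stratified by species).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train a shallow decision tree (max_depth=3 is an illustrative choice).
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# Evaluate on the held-out rows.
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Stratifying the split keeps the three species equally represented in the training and test sets, which matters on a dataset this small.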
