© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
A. Testas, Distributed Machine Learning with PySpark, https://doi.org/10.1007/978-1-4842-9751-3_9

9. Random Forest Classification with Scikit-Learn and PySpark

Abdelaziz Testas (1)
(1) Fremont, CA, USA

In this chapter, we continue with tree-based supervised classification, turning to random forests. We build, train, and evaluate a random forest classifier that predicts the species of an Iris flower, using the same dataset as in the previous chapter. There, we emphasized that decision trees are powerful machine learning algorithms well suited to classification tasks. Nonetheless, they can be susceptible to overfitting, especially ...
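As a preview of the build, train, and evaluate workflow described above, the following is a minimal scikit-learn sketch rather than the chapter's own listing. It assumes the Iris copy bundled with scikit-learn (`load_iris`); the 80/20 stratified split, `n_estimators=100`, and `random_state=42` are illustrative choices, not values taken from the text.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the Iris dataset bundled with scikit-learn (assumed stand-in for the book's data).
iris = load_iris()

# Hold out 20% of the rows for evaluation; stratify to keep the species balanced.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# Build and train the forest: 100 trees, each fit on a bootstrap sample of the training set.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# Evaluate on the held-out rows.
y_pred = forest.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Because each tree sees a different bootstrap sample and a random subset of features at each split, averaging their votes tends to reduce the variance of a single decision tree, which is the overfitting concern the paragraph raises.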
