© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
A. Testas, Distributed Machine Learning with PySpark, https://doi.org/10.1007/978-1-4842-9751-3_5

5. Random Forest Regression with Pandas, Scikit-Learn, and PySpark

Abdelaziz Testas
Fremont, CA, USA

In the preceding chapter, we developed a decision tree regression model to predict house prices. In this chapter, we introduce an alternative model known as random forest. Although both regression models are built on decision trees, they differ in notable ways.

First, decision trees are simpler models characterized by a single tree structure, whereas random forests are more intricate, comprising multiple decision trees. Furthermore, decision trees are prone ...
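The structural difference between a single tree and a forest of trees can be illustrated with a minimal Scikit-Learn sketch. It is not the chapter's own code: the synthetic data from `make_regression`, the choice of 50 trees, and all variable names are assumptions made purely for illustration. It fits one decision tree and one random forest, then checks that the forest's prediction is the average of the predictions of its individual trees.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (an assumption; not the house price data).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# A decision tree regressor is a single tree structure.
tree = DecisionTreeRegressor(random_state=0).fit(X, y)

# A random forest regressor holds many decision trees (50 here, chosen arbitrarily).
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
n_trees = len(forest.estimators_)

# Each fitted tree in the forest can predict on its own.
sample = X[:5]
per_tree = np.stack([t.predict(sample) for t in forest.estimators_])

# The forest's regression output is the mean of the per-tree predictions.
manual_avg = per_tree.mean(axis=0)
forest_pred = forest.predict(sample)

print("trees in forest:", n_trees)
print("forest matches per-tree average:", np.allclose(forest_pred, manual_avg))
```

The fitted trees are exposed through the `estimators_` attribute, which makes it possible to inspect how much individual trees disagree before their outputs are averaged.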
