© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
A. Testas, Distributed Machine Learning with PySpark, https://doi.org/10.1007/978-1-4842-9751-3_6

6. Gradient-Boosted Tree Regression with Pandas, Scikit-Learn, and PySpark

Abdelaziz Testas, Fremont, CA, USA

In this chapter, we continue with supervised learning and tree-based regression. Specifically, we develop a gradient-boosted tree (GBT) regression model using the same housing dataset that we used for decision tree and random forest regression in the preceding chapters. Comparing the performance metrics of the three models then gives a better idea of which tree type performs best.
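The comparison described above can be sketched in scikit-learn roughly as follows. This is only an illustration under stated assumptions: it uses synthetic data from `make_regression` as a stand-in, not the chapter's housing dataset, and the hyperparameters, the train/test split, and the choice of RMSE and R² as metrics are picked here for demonstration rather than taken from the book.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic stand-in data, NOT the housing dataset used in the book.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The three tree-based regressors to compare (illustrative settings only).
models = {
    "decision_tree": DecisionTreeRegressor(random_state=42),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "gbt": GradientBoostingRegressor(n_estimators=100, random_state=42),
}

# Fit each model on the same split and collect the same metrics for each.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = mean_squared_error(y_test, pred) ** 0.5
    results[name] = {"rmse": rmse, "r2": r2_score(y_test, pred)}

for name, metrics in results.items():
    print(f"{name}: RMSE={metrics['rmse']:.2f}, R2={metrics['r2']:.3f}")
```

Because every model is trained and evaluated on the same split, the printed metrics are directly comparable. Which model scores best depends on the data and the settings, so no ranking should be read into this synthetic example.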

There are similarities between random forest and GBT regression models. The ...
