In this chapter, we continue with our exploration of supervised learning with a focus on regression tasks. Specifically, we will be building a regression model using the decision tree algorithm—an alternative to the multiple linear regression model we used in the previous chapter. We will use both Scikit-Learn and PySpark to train and evaluate the model and then use it to predict the sale price of houses based on several features such as the size of property ...
4. Decision Tree Regression with Pandas, Scikit-Learn, and PySpark
Get Distributed Machine Learning with PySpark: Migrating Effortlessly from Pandas and Scikit-Learn now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.