A. Testas, Distributed Machine Learning with PySpark, https://doi.org/10.1007/978-1-4842-9751-3_3

3. Multiple Linear Regression with Pandas, Scikit-Learn, and PySpark

Abdelaziz Testas¹

(1) Fremont, CA, USA

This chapter demonstrates how to build, train, evaluate, and use a multiple linear regression model in both Scikit-Learn and PySpark. It shows that the steps involved in machine learning (data splitting, model training, model evaluation, and prediction) are the same in both frameworks. Furthermore, Pandas and PySpark take similar approaches to data manipulation, which simplifies tasks such as data exploration.

These similarities aid the data scientist ...
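The four steps named above (split, train, evaluate, predict) can be sketched on the Scikit-Learn side as follows. This is a minimal illustration rather than the chapter's own code: the synthetic dataset, the column names `x1`, `x2`, and `y`, and the chosen coefficients are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: y = 3*x1 - 2*x2 + 5 plus a little noise.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["y"] = 3.0 * df["x1"] - 2.0 * df["x2"] + 5.0 + rng.normal(scale=0.1, size=200)

# 1. Split the data into training and test sets (80/20).
X_train, X_test, y_train, y_test = train_test_split(
    df[["x1", "x2"]], df["y"], test_size=0.2, random_state=42
)

# 2. Train a multiple linear regression model on the training set.
model = LinearRegression().fit(X_train, y_train)

# 3. Evaluate the model on the held-out test set using R-squared.
r2 = r2_score(y_test, model.predict(X_test))

# 4. Predict the target for a new, unseen observation.
new_rows = pd.DataFrame({"x1": [1.0], "x2": [1.0]})
prediction = model.predict(new_rows)[0]

print(f"R^2 on test set: {r2:.3f}")
print(f"Prediction for x1=1, x2=1: {prediction:.2f}")
```

In PySpark, the same sequence would typically be expressed with `DataFrame.randomSplit`, `VectorAssembler`, `pyspark.ml.regression.LinearRegression`, and `RegressionEvaluator`; the chapter's exact PySpark code is not shown in this excerpt.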


Distributed Machine Learning with PySpark: Migrating Effortlessly from Pandas and Scikit-Learn by Abdelaziz Testas
