© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
A. Testas, Distributed Machine Learning with PySpark, https://doi.org/10.1007/978-1-4842-9751-3_7

7. Logistic Regression with Pandas, Scikit-Learn, and PySpark

Abdelaziz Testas
Fremont, CA, USA

This chapter focuses on classification, a distinct form of supervised learning. Our objective is to build, train, and evaluate a logistic regression model and then use it to predict the likelihood of diabetes.

Despite a name that suggests a connection to linear regression, logistic regression differs fundamentally in purpose and methodology. Unlike linear regression, which predicts continuous numerical values (see Chapter 3), logistic regression predicts ...
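To make the contrast with linear regression concrete, the following minimal sketch shows how a logistic model turns a linear score into a probability between 0 and 1 by passing it through the sigmoid function. It is plain Python rather than the chapter's Pandas, Scikit-Learn, or PySpark code, and the names `sigmoid` and `predict_proba`, the coefficient values, the feature values, and the 0.5 decision threshold are all assumptions chosen for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Return the estimated probability of the positive class for one observation."""
    # Linear score, the same form a linear regression would output directly
    z = bias + sum(w * x for w, x in zip(weights, features))
    # The sigmoid squashes the score into a probability
    return sigmoid(z)


# Hypothetical coefficients and feature values (not taken from the chapter's dataset)
weights = [0.8, -0.5]
bias = -0.2
features = [1.5, 0.4]

p = predict_proba(features, weights, bias)

# Assumed decision rule: classify as positive when the probability is at least 0.5
label = 1 if p >= 0.5 else 0

print(f"probability={p:.4f}, label={label}")
```

A linear regression would report the raw score `z` itself, whereas the logistic model reports a probability that can then be thresholded into a class label.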
