© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
A. Testas, Distributed Machine Learning with PySpark, https://doi.org/10.1007/978-1-4842-9751-3_1

1. An Easy Transition

Abdelaziz Testas
Fremont, CA, USA

One of the key factors that makes the transition from Pandas and Scikit-Learn to PySpark relatively easy is the similarity in functionality between these libraries. This similarity will become evident after reading this chapter and running the code it describes.

One of the easiest ways to test the code is to sign up for an online Databricks Community Edition account and create a workspace. Databricks provides detailed documentation on how to create a cluster, upload data, and create a notebook. Additionally, Spark can ...
