A. Testas, Distributed Machine Learning with PySpark, https://doi.org/10.1007/978-1-4842-9751-3_17

17. Pipelines with Scikit-Learn and PySpark

Abdelaziz Testas¹

(1) Fremont, CA, USA

In this chapter, we explore pipeline techniques in both Scikit-Learn and PySpark. With pipelines, data scientists can automate and standardize the steps involved in the modeling workflow. This enables the building of robust and scalable models, enhances model interpretability, and facilitates the integration of additional preprocessing steps and feature engineering techniques.
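As a rough illustration of what chaining workflow steps looks like in practice, here is a minimal Scikit-Learn sketch. It is not taken from the chapter: the synthetic dataset, the step names "scaler" and "classifier", and the choice of StandardScaler with LogisticRegression are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each step is a (name, estimator) pair; the step names are arbitrary labels.
pipeline = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression()),
])

# fit() fits the scaler on the training data, transforms that data,
# and then fits the classifier on the scaled features.
pipeline.fit(X_train, y_train)

# score() reuses the already-fitted scaler on the test data before predicting,
# so the test set never influences the preprocessing parameters.
accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Because preprocessing and model fitting sit in one object, the same sequence of steps is applied at training and prediction time. PySpark's pyspark.ml.Pipeline follows a similar pattern, taking a list of stages rather than named steps.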

To illustrate how pipelines can streamline the modeling process and improve ...

Distributed Machine Learning with PySpark: Migrating Effortlessly from Pandas and Scikit-Learn by Abdelaziz Testas