© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
A. Testas, Distributed Machine Learning with PySpark, https://doi.org/10.1007/978-1-4842-9751-3_2

2. Selecting Algorithms

Abdelaziz Testas, Fremont, CA, USA

Machine learning offers many algorithms for both supervised and unsupervised learning tasks, each with numerous parameters to tune. Testing and optimizing every model in each category would be cumbersome and computationally expensive. To address this challenge, this chapter introduces k-fold cross-validation, a technique that helps select the best-performing model from a range of candidate algorithms. With this method, the top-performing models can ...
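To make the idea of k-fold cross-validation concrete, here is a minimal, library-free sketch in plain Python. It is not the chapter's own code, which works with PySpark and scikit-learn. The toy data, the two candidate models (a mean baseline and a one-variable least-squares line), and all function names are assumptions chosen purely for illustration. Each candidate is trained on k-1 folds, scored on the held-out fold, and the average error across folds is used to pick a winner.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Second candidate: ordinary least squares with a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def cross_val_mse(fit, xs, ys, k=5, seed=0):
    """Average held-out mean squared error of `fit` over k folds."""
    fold_scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(model(xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_scores.append(statistics.fmean(errors))
    return statistics.fmean(fold_scores)


# Hypothetical toy data: a noisy straight line.
rng = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

# Score every candidate with the same folds, then keep the lowest error.
candidates = {"mean_baseline": fit_mean, "linear": fit_line}
scores = {name: cross_val_mse(fit, xs, ys, k=5) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Because every candidate is evaluated on the same folds, the scores are directly comparable, and only the best few models need to go on to more expensive tuning.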
