© Ramcharan Kakarla, Sundar Krishnan and Sridhar Alla 2021
R. Kakarla et al., Applied Data Science Using PySpark, https://doi.org/10.1007/978-1-4842-6500-0_5

5. Supervised Learning Algorithms

Ramcharan Kakarla (1), Sundar Krishnan (1) and Sridhar Alla (2)
(1) Philadelphia, PA, USA
(2) New Jersey, NJ, USA

It’s time to do some learning based on the data. Many people think machine learning is simply applying an algorithm to the given data and then predicting the results. It is much more than that. Eighty percent of the work involves data collection, preprocessing, cleaning, feature engineering, transformation, and selecting the best features. The remaining 20 percent is spent on building machine learning models, validating them, and deploying them. The entire operation is called MLOps (machine ...
