© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
P. Singh, Machine Learning with PySpark, https://doi.org/10.1007/978-1-4842-7777-5_7

7. Clustering in PySpark

Pramod Singh
Bangalore, Karnataka, India

So far, we have seen supervised machine learning, where the target variable or label is known and we try to predict the output from the input features. Unsupervised learning means that there is no labeled data and no output to predict. Instead, we look for interesting patterns and form groups within the data. It is more of an art than a pursuit of prediction accuracy. The values within a group are very similar to each other, whereas any two groups are very distinct ...
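The idea of grouping unlabeled values so that members of a group are close together and separate groups are far apart can be illustrated with a small sketch. This is not the author's code: the excerpt does not name an algorithm, so k-means is assumed here as one common choice, and the function name `kmeans_1d`, the sample values, and the starting centers are all hypothetical. Plain Python is used instead of PySpark to keep the example self-contained.

```python
# Minimal 1-D k-means sketch in plain Python (an assumed algorithm, not from the text).
from statistics import mean


def kmeans_1d(values, centers, iterations=10):
    """Repeatedly assign each value to its nearest center, then move
    each center to the mean of the values assigned to it."""
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            # Index of the center closest to this value.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical unlabeled data: one numeric feature and no target column.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, groups = kmeans_1d(values, centers=[0.0, 5.0])

for center, group in zip(centers, groups):
    print(f"center={center:.2f} members={group}")
```

With this toy data the low values and the high values end up in separate groups, each tightly packed around its own center. In PySpark, the `pyspark.ml.clustering.KMeans` estimator would typically play this role on a DataFrame with a vector-valued features column.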
