A. Testas, Distributed Machine Learning with PySpark, https://doi.org/10.1007/978-1-4842-9751-3_15

15. k-Means Clustering with Pandas, Scikit-Learn, and PySpark

Abdelaziz Testas¹

(1) Fremont, CA, USA

In this chapter, we delve into the process of building, training, and evaluating a k-means clustering algorithm for effective data segmentation. Clustering is a commonly used technique in segmentation analysis to group similar observations together based on their characteristics or their proximity in the feature space. The result is a set of clusters, with each observation assigned to a specific cluster. By organizing data into clusters, we can gain a deeper understanding ...
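The idea of assigning each observation to a cluster by proximity in the feature space can be illustrated with a minimal sketch. The chapter itself works with Pandas, Scikit-Learn, and PySpark; the version below is not the author's code. It uses only the Python standard library, and the function names (`assign_clusters`, `update_centroids`, `k_means`), the six sample points, and the choice of initial centroids are all hypothetical, picked only to make the assignment and update steps visible.

```python
import math


def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points currently assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            # Assumption: an empty cluster simply keeps its previous centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def k_means(points, initial_centroids, max_iter=100):
    """Alternate assignment and update steps until the labels stop changing."""
    centroids = list(initial_centroids)
    labels = assign_clusters(points, centroids)
    for _ in range(max_iter):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign_clusters(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical toy data: two visually separated groups in a 2-D feature space.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Assumption: the initial centroids are picked by hand, one from each group.
labels, centroids = k_means(points, initial_centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

Each entry of `labels` is the cluster assigned to the corresponding observation, which is the "set of clusters, with each observation assigned to a specific cluster" described above. Library implementations typically add more careful initialization and convergence checks, so this sketch should be read as an illustration of the idea rather than a substitute for them.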

Distributed Machine Learning with PySpark: Migrating Effortlessly from Pandas and Scikit-Learn by Abdelaziz Testas
