© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
T. C. Nokeri, Data Science Solutions with Python, https://doi.org/10.1007/978-1-4842-7762-1_8

8. Cluster Analysis with Scikit-Learn, PySpark, and H2O

Tshepo Chris Nokeri
Pretoria, South Africa

This chapter explains the k-means clustering method and implements it with several Python frameworks (Scikit-Learn, PySpark, and H2O). It begins by clarifying how the method assigns values to clusters.

Exploring the K-Means Method

The k-means method is one of the most widely used distance-based clustering methods. It belongs to the unsupervised machine learning family. It employs a Euclidean distance objective function to efficiently compute the distance between values, then determines ...
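To make the role of Euclidean distance concrete, here is a minimal sketch of the standard k-means loop in plain Python. It is not the chapter's code: the function names (`assign_clusters`, `update_centroids`, `k_means`), the toy points, and the choice of starting centroids are all assumptions made for illustration. Each pass assigns every point to its nearest centroid, then moves each centroid to the mean of the points assigned to it, and stops once the centroids no longer change.

```python
import math


def assign_clusters(points, centroids):
    """Return, for each point, the index of the nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of its assigned points.

    A centroid with no assigned points is left where it is.
    """
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def k_means(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign_clusters(points, centroids)
        updated = update_centroids(points, labels, centroids)
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical toy data: two visibly separated groups in two dimensions.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# Assumed initialization: one starting centroid taken from each group.
labels, centroids = k_means(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

On this toy data the first three points end up in one cluster and the last three in the other. Library implementations such as Scikit-Learn's `KMeans` follow the same assign-then-update idea but add more careful initialization and convergence checks, so results on real data depend on those settings.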
