April 2016
Beginner to intermediate
384 pages
8h 36m
English
The k-means clustering algorithm is probably the most widely known data mining technique for clustering vectorized data. It partitions the observations into discrete clusters based on their similarity: each observation is assigned to the cluster whose centroid is nearest to it in Euclidean distance.
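To make the deciding factor concrete, here is a minimal sketch of the assignment step alone, written with NumPy rather than the recipe's own code. The function name assign_to_nearest_centroid and the toy arrays are made up for illustration, and the centroids are taken as given; a full k-means run would alternate this step with recomputing the centroids.

```python
import numpy as np


def assign_to_nearest_centroid(observations, centroids):
    """Return, for each observation, the index of its nearest centroid."""
    # Pairwise Euclidean distances, shape (n_observations, n_centroids)
    differences = observations[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    distances = np.linalg.norm(differences, axis=2)

    # The nearest centroid decides the cluster membership
    return distances.argmin(axis=1)


# Toy data (illustrative only): two points near the origin, two near (10, 10)
observations = np.array([[0.0, 0.0], [0.5, 0.2], [9.0, 9.5], [10.0, 10.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

labels = assign_to_nearest_centroid(observations, centroids)
```

With these toy values, the first two observations fall into cluster 0 and the last two into cluster 1.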
To run this recipe, you need pandas and Scikit. No other prerequisites are required.
Scikit offers several clustering models in its cluster submodule. Here, we will use .KMeans(...) to estimate our clustering model (the clustering_kmeans.py file):
def findClusters_kmeans(data):
    '''
        Cluster data using k-means
    '''
    # create the classifier object
    kmeans = cl.KMeans( ...
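The listing above is cut off before the arguments to .KMeans(...), so the recipe's actual settings are not known here. As a stand-in, the following is a minimal, self-contained sketch of how a k-means model could be estimated with the same cl alias. The function name find_clusters_kmeans_sketch, the parameter values (n_clusters=2, n_init=10, random_state=0), and the toy DataFrame are all assumptions made for illustration, not the book's code.

```python
import pandas as pd
from sklearn import cluster as cl


def find_clusters_kmeans_sketch(data, n_clusters=2):
    '''
        Cluster data using k-means (illustrative sketch)
    '''
    # create the estimator object; these parameter values are assumptions
    kmeans = cl.KMeans(n_clusters=n_clusters, n_init=10, random_state=0)

    # fit the model to the data and return the fitted estimator
    return kmeans.fit(data)


# Toy data (illustrative only): two well-separated groups of points
toy = pd.DataFrame({
    'x': [0.0, 0.2, 0.1, 9.8, 10.0, 10.1],
    'y': [0.1, 0.0, 0.2, 10.0, 9.9, 10.2],
})

model = find_clusters_kmeans_sketch(toy)
labels = model.labels_              # cluster index for each row
centroids = model.cluster_centers_  # one row per cluster centroid
```

The fitted estimator exposes the cluster assignment of each row in labels_ and the centroid coordinates in cluster_centers_. Which group receives index 0 and which receives index 1 is arbitrary, so downstream code should not rely on the numbering.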