Practical Data Analysis Cookbook

by Tomasz Drabas
April 2016
Content level: Beginner to intermediate
384 pages
8h 36m
English
Packt Publishing
Content preview from Practical Data Analysis Cookbook

Clustering data with k-means algorithm

The k-means clustering algorithm is probably the most widely known technique for clustering vectorized data. It partitions the observations into a fixed number of discrete clusters based on their similarity: each observation is assigned to the cluster whose centroid is nearest in Euclidean distance.
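To make the role of the Euclidean distance concrete, here is a minimal NumPy sketch of a single k-means iteration: assign each observation to its nearest centroid, then move each centroid to the mean of its members. The function names, the toy data, and the starting centroids are assumptions made for illustration and do not come from the recipe. The sketch also assumes that no cluster ends up empty.

```python
import numpy as np


def assign_to_nearest_centroid(observations, centroids):
    """Return, for each observation, the index of the nearest centroid."""
    # Pairwise differences, shape (n_observations, n_centroids, n_features)
    diffs = observations[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    # Euclidean distance from every observation to every centroid
    distances = np.sqrt((diffs ** 2).sum(axis=2))
    return distances.argmin(axis=1)


def update_centroids(observations, labels, n_clusters):
    """Move each centroid to the mean of the observations assigned to it.

    Assumes every cluster has at least one member.
    """
    return np.array(
        [observations[labels == k].mean(axis=0) for k in range(n_clusters)]
    )


# Hypothetical toy data: two well-separated groups of points
observations = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

labels = assign_to_nearest_centroid(observations, centroids)
new_centroids = update_centroids(observations, labels, n_clusters=2)
```

The full algorithm repeats these two steps until the assignments stop changing or an iteration limit is reached.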

Getting ready

To run this recipe, you need pandas and Scikit (the scikit-learn package). No other prerequisites are required.

How to do it…

Scikit offers several clustering models in its cluster submodule. Here, we use .KMeans(...) to estimate our clustering model (see the clustering_kmeans.py file):

def findClusters_kmeans(data):
    '''
        Cluster data using k-means
    '''
    # create the classifier object
    kmeans = cl.KMeans( ...
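The arguments passed to .KMeans(...) are not visible in the listing above, so the following is only a minimal, self-contained sketch of how scikit-learn's KMeans estimator is typically fitted. It is not the recipe's own code: the toy data, the choice of two clusters, and the n_init and random_state values are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: two compact groups in two dimensions
data = np.array([
    [1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
    [8.0, 8.1], [7.9, 8.0], [8.1, 7.9],
])

# n_clusters=2 is an assumption for this toy data set;
# random_state fixes the initialization so runs are repeatable
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)

# Fit the model and get the cluster index of every observation
labels = kmeans.fit_predict(data)

# Coordinates of the fitted cluster centroids, shape (n_clusters, n_features)
centroids = kmeans.cluster_centers_
```

The numeric label given to each cluster is arbitrary, so downstream code should rely on which observations share a label rather than on the label values themselves.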

Publisher Resources

ISBN: 9781783551668