Learning SciPy for Numerical and Scientific Computing by Francisco J. Blanco-Silva

Clustering

Another technique used in data mining is clustering. SciPy provides two modules for it, each addressing a different clustering tool: scipy.cluster.vq for k-means and scipy.cluster.hierarchy for hierarchical clustering.
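
As a rough illustration of how the two modules divide the work, the following sketch clusters the same small data set with each of them. The toy points, the variable names, and the choice of the "single" linkage method are assumptions made for this example only.

```python
import numpy as np
from scipy.cluster import hierarchy, vq

# Hypothetical toy data: two well-separated groups of 2-D points.
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

# k-means with scipy.cluster.vq: request two centroids, then assign
# each point to its nearest centroid.
centroids, _ = vq.kmeans(points, 2)
kmeans_labels, _ = vq.vq(points, centroids)

# Hierarchical clustering with scipy.cluster.hierarchy: build the
# linkage tree, then cut it into at most two flat clusters.
tree = hierarchy.linkage(points, method="single")
hier_labels = hierarchy.fcluster(tree, t=2, criterion="maxclust")

print(kmeans_labels)
print(hier_labels)
```

Both approaches should place the first three points in one cluster and the last three in another, although the numeric labels they assign need not match.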

Vector quantization and k-means

The scipy.cluster.vq module offers two routines to divide data into clusters using the k-means technique: kmeans and kmeans2. They correspond to two different implementations. The former has a very simple syntax:

kmeans(obs, k_or_guess, iter=20, thresh=1e-05)

The obs parameter is an ndarray with the data we wish to cluster. If the array has dimensions m × n, the algorithm interprets the data as m points in n-dimensional Euclidean space. If we know the number of clusters ...
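
A minimal sketch of a kmeans call on an m × n array may help here. The synthetic data, the two assumed centers, and the variable names are illustrative assumptions. Passing an integer as k_or_guess asks the routine for that many centroids.

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Assumed toy data: m = 100 points in n = 2 dimensions, drawn around
# two hypothetical centers, (0, 0) and (5, 5).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
obs = np.vstack([blob_a, blob_b])  # shape (m, n) = (100, 2)

# An integer k_or_guess requests that many centroids.
codebook, distortion = kmeans(obs, 2, iter=20, thresh=1e-05)

# codebook holds one centroid per row; distortion is the mean
# Euclidean distance from each point to its nearest centroid.
print(codebook.shape)
print(distortion)
```

With groups this well separated, the two rows of codebook should land close to the assumed centers, in no guaranteed order.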
