K-Means

K-means is a clustering algorithm implemented in the cxcore because it was written long before the ML library. K-means attempts to find the natural clusters or "clumps" in the data. The user sets the desired number of clusters and then K-means rapidly finds a good placement for those cluster centers, where "good" means that the cluster centers tend to end up located in the middle of the natural clumps of data. It is one of the most used clustering techniques and has strong similarities to the expectation maximization algorithm for Gaussian mixture (implemented as CvEM() in the ML library) as well as some similarities to the mean-shift algorithm discussed in Chapter 9 (implemented as cvMeanShift() in the CV library). K-means is an iterative algorithm and, as implemented in OpenCV, is also known as Lloyd's algorithm[242] or (equivalently) "Voronoi iteration". The algorithm runs as follows.

  1. Take as input (a) a data set and (b) desired number of clusters K (chosen by the user).

  2. Randomly assign cluster center locations.

  3. Associate each data point with its nearest cluster center.

  4. Move cluster centers to the centroid of their data points.

  5. Return to step 3 until convergence (centroid does not move).

Figure 13-5 diagrams K-means in action; in this case, it takes just two iterations to converge. In real cases the algorithm often converges rapidly, but it can sometimes require a large number of iterations.

Problems and Solutions

K-means is an extremely effective clustering algorithm, but it ...

Get Learning OpenCV now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.