Optimizing K-means cluster solutions

K-means clustering is a well-established technique for grouping entities together based on overall similarity. It has many applications including customer segmentation, anomaly detection (finding records that don't fit into existing clusters), and variable reduction (converting many input variables into fewer composite variables).

For all its power and popularity, the K-means algorithm has a number of known limitations. First, it is iterative and can arrive at many different solutions depending on the data and the initial algorithm parameters. Some solutions are better than others, and the final solution generally depends on where the initial cluster centers are placed. ...
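
To make the dependence on the initial cluster centers concrete, here is a minimal sketch in plain Python. It is not the IBM SPSS Modeler procedure; it is a generic textbook K-means (Lloyd's algorithm) run on synthetic data. The helper names (`sq_dist`, `mean`, `kmeans`), the three synthetic blobs, and the choice of ten seeds are all assumptions made for illustration. Each run starts from a different random choice of initial centers, and the runs are compared by their within-cluster sum of squared distances, where lower means tighter clusters.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coord) / n for coord in zip(*cluster))

def kmeans(points, k, seed, max_iter=100):
    """Basic Lloyd's algorithm. Returns (centers, within-cluster sum of squares)."""
    rng = random.Random(seed)
    # Initial centers: k data points picked at random (depends on the seed).
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its points.
        # An empty cluster keeps its previous center.
        new_centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse

# Synthetic data (an assumption for this sketch): three 2-D blobs of 30 points each.
data_rng = random.Random(0)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [
    (data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
    for cx, cy in blob_centers
    for _ in range(30)
]

# Run K-means from several different starting points and keep the best run.
results = [(seed, kmeans(points, 3, seed)[1]) for seed in range(10)]
best_seed, best_sse = min(results, key=lambda r: r[1])

for seed, sse in results:
    print(f"seed={seed}  within-cluster SSE={sse:.2f}")
print(f"best run: seed={best_seed}, SSE={best_sse:.2f}")
```

Running the algorithm from several starting points and keeping the solution with the lowest sum of squares is one common way to reduce the risk of settling on a poor solution, although it does not guarantee finding the best possible one.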
