July 2017
Intermediate to advanced
796 pages
18h 55m
English
Bisecting K-means can often be much faster than regular K-means, but it will generally produce a different clustering. The bisecting K-means algorithm is based on the paper "A Comparison of Document Clustering Techniques" by Steinbach, Karypis, and Kumar, with modifications to fit Spark MLlib.
Bisecting K-means is a divisive algorithm: it starts from a single cluster that contains all the data points. It then iteratively finds all the divisible clusters on the bottom level and bisects each of them using K-means, until there are K leaf clusters in total or no leaf clusters are divisible. The bisecting steps of clusters on the same level are grouped together to increase parallelism. In other words, bisecting ...
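The divisive procedure described above can be illustrated with a minimal, single-machine sketch in plain Python. This is not the Spark MLlib implementation. The function names (`two_means`, `bisecting_kmeans`) are hypothetical and exist only for this illustration, and two choices are assumptions: a cluster counts as divisible when it holds at least two distinct points, and when splitting every divisible cluster would overshoot K, the larger clusters are split first.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def two_means(points, rng, iters=20):
    """Bisect one cluster with plain K-means, K = 2.

    Assumes `points` contains at least two distinct tuples.
    """
    # Start from two distinct points so both halves are non-empty at first.
    centers = rng.sample(sorted(set(points)), 2)
    groups = None
    for _ in range(iters):
        new_groups = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            new_groups[nearer].append(p)
        if not new_groups[0] or not new_groups[1]:
            break  # keep the last split that had two non-empty halves
        groups = new_groups
        new_centers = [mean(g) for g in groups]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return list(groups)


def bisecting_kmeans(points, k, seed=0):
    """Return up to k leaf clusters, built by repeated bisection."""
    rng = random.Random(seed)
    leaves = [list(points)]  # a single cluster holding every point
    while len(leaves) < k:
        # Assumption: "divisible" means at least two distinct points.
        divisible = [c for c in leaves if len(set(c)) > 1]
        if not divisible:
            break  # no leaf cluster can be split further
        # Assumption: larger clusters get priority when the budget is short.
        budget = k - len(leaves)
        chosen = sorted(divisible, key=len, reverse=True)[:budget]
        chosen_ids = {id(c) for c in chosen}
        next_leaves = []
        for c in leaves:
            if id(c) in chosen_ids:
                next_leaves.extend(two_means(c, rng))  # one leaf becomes two
            else:
                next_leaves.append(c)
        leaves = next_leaves
    return leaves


if __name__ == "__main__":
    data = [(0.0,), (1.0,), (10.0,), (11.0,)]
    for cluster in bisecting_kmeans(data, k=2):
        print(cluster)
```

Each pass of the `while` loop corresponds to one level of the cluster tree. The splits chosen within a pass are independent of one another, which is the property a distributed implementation can exploit by running them together.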