In this chapter, we discussed K-means clustering. We covered how the K-means algorithm works and used the Mahout implementation of K-means on a text dataset: we downloaded the data and converted it to a vector format that Mahout can read.
We discussed how to inspect and interpret the resulting clusters using the clusterdumper utility. We also looked at one of Mahout's example classes, which visualizes the clusters.
In the next chapter, we will discuss Canopy clustering. It is a useful technique in its own right, and it can also be used to estimate the value of K (the number of clusters) for K-means clustering.