Chapter 10. Evaluating and improving clustering quality

This chapter covers

Inspecting clustering output
Evaluating the quality of clustering
Improving clustering quality

We saw many types of clustering algorithms in the last chapter: k-means, canopy, fuzzy k-means, Dirichlet, and latent Dirichlet allocation (LDA). Each performed well on certain types of data and sometimes poorly on others. The most natural question after every clustering job is, “How well did the algorithm perform on the data?”
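One simple way to put a number on that question is to compare how tight the clusters are with how far apart they are. The plain-Java sketch below illustrates the idea under our own assumptions. Points are `double[]` arrays, distance is Euclidean, and each point already has a cluster assignment and a centroid. The class and method names are hypothetical and are not part of Mahout's API. A lower mean intra-cluster distance suggests tighter clusters, and a higher mean inter-cluster distance suggests better-separated ones.

```java
import java.util.List;

// Hypothetical helper class: a sketch of two simple clustering-quality measures,
// not Mahout's own evaluation code.
public class ClusterQualitySketch {

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Mean distance from each point to the centroid of its assigned cluster.
    // Lower values indicate tighter clusters.
    static double meanIntraClusterDistance(List<double[]> points, int[] assignments,
                                           List<double[]> centroids) {
        double total = 0.0;
        for (int i = 0; i < points.size(); i++) {
            total += distance(points.get(i), centroids.get(assignments[i]));
        }
        return total / points.size();
    }

    // Mean pairwise distance between cluster centroids.
    // Higher values indicate better-separated clusters.
    static double meanInterClusterDistance(List<double[]> centroids) {
        double total = 0.0;
        int pairs = 0;
        for (int i = 0; i < centroids.size(); i++) {
            for (int j = i + 1; j < centroids.size(); j++) {
                total += distance(centroids.get(i), centroids.get(j));
                pairs++;
            }
        }
        return pairs == 0 ? 0.0 : total / pairs;
    }

    public static void main(String[] args) {
        // Two well-separated groups of two points each.
        List<double[]> points = List.of(
            new double[]{0.0, 0.0}, new double[]{0.0, 2.0},
            new double[]{10.0, 0.0}, new double[]{10.0, 2.0});
        int[] assignments = {0, 0, 1, 1};
        List<double[]> centroids = List.of(
            new double[]{0.0, 1.0}, new double[]{10.0, 1.0});

        System.out.println("intra = " + meanIntraClusterDistance(points, assignments, centroids)); // intra = 1.0
        System.out.println("inter = " + meanInterClusterDistance(centroids)); // inter = 10.0
    }
}
```

In this toy example every point lies 1.0 from its centroid and the two centroids are 10.0 apart. That ratio describes compact, well-separated clusters. On real data, both numbers depend on the distance measure and the scale of the features, so they are most useful for comparing runs on the same data.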

Analyzing the output of clustering is an important exercise. It can be done with simple command-line tools or richer GUI-based visualizations. Once the clusters are visualized and problem areas are identified, these results ...


Mahout in Action by Sean Owen, B. Ellen Friedman, Robin Anil, Ted Dunning
