Chapter 7. Clustering with Mahout

In this chapter, we will discuss clustering, one of the major application areas of machine learning. Cluster analysis is applied widely, for example in customer segmentation, news grouping, and grouping users based on their behavior.

We will also look at the internals of a few important clustering algorithms and then discuss their implementation in Mahout. This chapter covers the following topics:

Data preprocessing
k-means
Canopy clustering
Fuzzy k-means
Streaming k-means

k-means

k-means is one of the simplest and most widely used clustering algorithms. Given the number of clusters to look for, K, k-means partitions the data into K clusters and assigns each data point to a cluster, depending ...
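To make the idea concrete, here is a minimal sketch of the standard k-means iteration (often called Lloyd's algorithm) in plain Python. It is not Mahout's implementation. The function name `kmeans`, the choice of Euclidean distance, and seeding the centroids with the first K points are all assumptions made for illustration.

```python
import math


def kmeans(points, k, max_iter=100):
    """Illustrative k-means on a list of equal-length numeric tuples.

    Returns (centroids, assignments), where assignments[i] is the index
    of the cluster that points[i] was placed in.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Assumption: seed the centroids with the first k points so the
    # sketch is deterministic. Real implementations usually seed randomly
    # or with a dedicated initialization scheme.
    centroids = [tuple(p) for p in points[:k]]
    assignments = [None] * len(points)

    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        # (Euclidean distance is an assumption of this sketch).
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))

    return centroids, assignments


# Hypothetical sample data: two visually separate groups of 2-D points.
sample = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, labels = kmeans(sample, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

The loop alternates between assigning points to their nearest centroid and recomputing each centroid as the mean of its members, and it stops when the assignments no longer change. With a different seeding strategy, the same data can converge to different cluster labels.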


Learning Apache Mahout by Chandramani Tiwary
