Chapter 7. Clustering with Mahout
In this chapter, we will discuss clustering, one of the major application areas of machine learning. Cluster analysis is widely applied, for example in customer segmentation, news grouping, and grouping users based on their behavior.
We will also look at the internals of a few important clustering algorithms and then discuss their implementation in Mahout. This chapter covers the following topics:
- Data preprocessing
- k-means
- Canopy clustering
- Fuzzy k-means
- Streaming k-means
k-means
k-means is one of the simplest and most widely used clustering algorithms. Given K, the number of clusters to look for, k-means produces K clusters, each with the data points assigned to it, depending ...
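To make the idea concrete, here is a minimal sketch of the commonly used iterative form of k-means, written in plain Python rather than with Mahout. It assumes Euclidean distance and random initial centroids drawn from the input points. The function name `kmeans`, its parameters, and the sample data are hypothetical and chosen only for illustration; this is not Mahout's implementation.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into `k` groups.

    Illustrative sketch only: Euclidean distance and random initial
    centroids are assumptions, not Mahout's actual behavior.
    """
    rng = random.Random(seed)
    # Pick k distinct input points as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # No point changed cluster, so the result is stable.
        assignments = new_assignments
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # Keep the old centroid if a cluster is empty.
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, assignments


# Hypothetical sample data: two visibly separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
```

Here `assignments[i]` is the index of the cluster that `points[i]` belongs to, and `centroids` holds the final center of each of the K clusters. The loop stops as soon as an iteration leaves every assignment unchanged, or after `max_iter` passes.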