Learning Apache Mahout by Chandramani Tiwary

Chapter 7. Clustering with Mahout

In this chapter, we will discuss clustering, one of the major application areas of machine learning. Cluster analysis is widely applied to tasks such as customer segmentation, news grouping, and grouping users based on their behavior.

We will also look at the internals of a few important clustering algorithms and then discuss their implementation in Mahout. This chapter covers the following topics:

  • Data preprocessing
  • k-means
  • Canopy clustering
  • Fuzzy k-means
  • Streaming k-means

k-means

k-means is one of the simplest and most widely used clustering algorithms. Given K, the number of clusters to look for, k-means partitions the data points into K clusters, with each point belonging to one cluster, depending ...
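
To make the idea concrete, here is a minimal sketch of the standard k-means iteration (Lloyd's algorithm) in plain Python. It is an illustration only and not Mahout's implementation: the function names `squared_distance` and `k_means`, the `max_iterations` and `seed` parameters, the random choice of initial centroids, and the use of squared Euclidean distance are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iterations=100, seed=0):
    """Cluster `points` (a list of tuples) into `k` groups.

    Returns (centroids, assignments), where assignments[i] is the index
    of the centroid closest to points[i].
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    assignments = None

    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )

    return centroids, assignments
```

For example, calling `k_means(points, 2)` on two well-separated groups of 2D points should place each group in its own cluster. The result can depend on the initial centroids, which is one reason initialization strategies matter in practice.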