Apache Mahout Clustering Designs by Ashish Gupta

Chapter 3. Understanding Canopy Clustering

In the previous chapter, we discussed K-means clustering and used Mahout to run K-means clustering on the text dataset. There, we saw that one of the main challenges is choosing the initial number of clusters, and we looked at different techniques for estimating the number of clusters in a dataset. One such technique is Canopy clustering, which is also called a pre-clustering algorithm. In this chapter, we will discuss Canopy clustering in detail. We will cover the following topics:

  • Learning Canopy clustering
  • Using Mahout to execute Canopy clustering
  • Visualizing Canopy clusters using Mahout
  • Working with CSV files
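
To make the pre-clustering idea concrete, here is a minimal Java sketch of the classic canopy procedure. It is not Mahout's own implementation or API. It assumes Euclidean distance and the two distance thresholds conventionally called T1 (loose) and T2 (tight), with T1 > T2. It also assumes that each new center is simply the first remaining candidate (the classic description picks one at random). The class name CanopySketch and the sample points are hypothetical. Every candidate within T1 of a center joins that canopy, and every candidate within T2 is removed from further consideration. The number of canopies produced can then serve as a rough starting value of k for K-means.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of canopy construction; not Mahout's API.
public class CanopySketch {

    // One canopy: the center that seeded it plus every point within T1 of that center.
    record Canopy(double[] center, List<double[]> members) {}

    // Assumed distance measure: plain Euclidean distance.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Builds canopies using a loose threshold t1 and a tight threshold t2 (t1 > t2).
    static List<Canopy> canopies(List<double[]> points, double t1, double t2) {
        if (t1 <= t2) {
            throw new IllegalArgumentException("T1 must be greater than T2");
        }
        List<double[]> candidates = new ArrayList<>(points);
        List<Canopy> result = new ArrayList<>();
        while (!candidates.isEmpty()) {
            // Assumption: take the first remaining point as the new center (deterministic).
            double[] center = candidates.remove(0);
            List<double[]> members = new ArrayList<>();
            members.add(center);
            List<double[]> stillCandidates = new ArrayList<>();
            for (double[] p : candidates) {
                double d = euclidean(center, p);
                if (d < t1) {
                    members.add(p);          // loosely close: joins this canopy
                }
                if (d >= t2) {
                    stillCandidates.add(p);  // not tightly close: may seed or join a later canopy
                }
            }
            candidates = stillCandidates;
            result.add(new Canopy(center, members));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical 2D points forming two well-separated groups.
        List<double[]> points = List.of(
                new double[] {1.0, 1.0}, new double[] {1.5, 2.0}, new double[] {1.2, 1.4},
                new double[] {8.0, 8.0}, new double[] {8.5, 9.0}, new double[] {9.0, 8.2});
        List<Canopy> result = canopies(points, 3.0, 1.5);
        System.out.println("canopies (candidate k): " + result.size());
        for (Canopy c : result) {
            System.out.println(Arrays.toString(c.center()) + " -> " + c.members().size() + " points");
        }
    }
}
```

With T1 = 3.0 and T2 = 1.5, the six sample points fall into two canopies of three points each, which would suggest k = 2. Different thresholds give a different count, so T1 and T2 still have to be tuned for real data.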

Canopy clustering, which is a pre-clustering algorithm, is ...