O'Reilly logo

Apache Mahout Clustering Designs by Ashish Gupta

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Understanding distance measures

In each clustering problem, such as document clustering, protein clustering, genome sequence, galaxy image grouping, and so on, we need to calculate the distance between points. Let's start this problem with the points in a two-dimensional XY plane, and later in the section of preparing data for Mahout, we will discuss how to handle other types of documents, such as text.

Suppose that we have a set of points and know the points' coordinates (x and y position). We want to group the points into different clusters. How do we achieve this? For this type of problem, we will calculate the distance between points, the points close to each other will be part of one cluster, and the points; that are away from one group will ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required