In every clustering problem, such as document clustering, protein clustering, genome sequence clustering, or galaxy image grouping, we need to calculate the distance between points. Let's start with points in a two-dimensional *XY* plane; later, in the section on preparing data for Mahout, we will discuss how to handle other types of data, such as text documents.

Suppose that we have a set of points and know each point's coordinates (its *x* and *y* positions). We want to group the points into different clusters. How do we achieve this? For this type of problem, we calculate the distance between points: the points close to each other will be part of one cluster, and the points that are away from one group will ...
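
To make the idea of "distance between points" concrete, here is a minimal Python sketch. It assumes the Euclidean (straight-line) distance, which is only one possible measure and is not named in the text above; the sample coordinates and the function names `euclidean_distance` and `pairwise_distances` are hypothetical and chosen purely for illustration.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two (x, y) points.

    Assumption: Euclidean distance is used here only as an example measure.
    """
    return math.hypot(p[0] - q[0], p[1] - q[1])


def pairwise_distances(pts):
    """Map each index pair (i, j) with i < j to the distance between pts[i] and pts[j]."""
    return {
        (i, j): euclidean_distance(pts[i], pts[j])
        for i in range(len(pts))
        for j in range(i + 1, len(pts))
    }


# Hypothetical sample data: two points near (1, 1) and two near (8, 8).
points = [(1.0, 1.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0)]
distances = pairwise_distances(points)

# Print every pair with its distance so near and far pairs can be compared.
for (i, j), d in sorted(distances.items()):
    print(f"points {i} and {j}: distance {d:.2f}")
```

In this sample, the pairs (0, 1) and (2, 3) have much smaller distances than any pair that mixes the two groups, which is the kind of signal a clustering algorithm relies on when deciding which points belong together.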
