
Apache Mahout Clustering Designs by Ashish Gupta

Using DistanceMeasure interface

Cluster quality usually depends on the chosen distance measure and on the weights of the features in each vector (document). A suitable distance measure brings similar items together. Mahout lets us write custom distance measures through the DistanceMeasure interface in the org.apache.mahout.common.distance package. The main method to override is double distance(Vector v1, Vector v2).

Let's take a look at a small implementation of this interface in the following code snippet (source: Mahout in Action):

public double distance(Vector vector1, Vector vector2) {
    if (vector1.size() != vector2.size()) {
        throw new CardinalityException(vector1.size(), vector2.size());
    }
    double lengthSquaredv1 ...
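Since the snippet above is cut off, it may help to see the overall shape of a distance measure in a form that runs without Mahout on the classpath. The following is a minimal sketch under stated assumptions: SimpleDistanceMeasure is a hypothetical stand-in interface (not Mahout's DistanceMeasure), vectors are plain double[] arrays rather than Mahout Vector objects, and cosine distance is chosen purely for illustration; it is not necessarily what the truncated snippet computes.

```java
// Hypothetical stand-in for illustration only; this is NOT Mahout's
// org.apache.mahout.common.distance.DistanceMeasure interface.
interface SimpleDistanceMeasure {
    double distance(double[] v1, double[] v2);
}

// Cosine distance (1 - cosine similarity), picked as an example measure.
// Assumption: plain arrays stand in for Mahout's Vector type.
class CosineDistanceSketch implements SimpleDistanceMeasure {
    @Override
    public double distance(double[] v1, double[] v2) {
        // Same guard as in the snippet above: both vectors must have equal size.
        if (v1.length != v2.length) {
            throw new IllegalArgumentException(
                "Size mismatch: " + v1.length + " vs " + v2.length);
        }
        double dot = 0.0;
        double lengthSquared1 = 0.0;
        double lengthSquared2 = 0.0;
        for (int i = 0; i < v1.length; i++) {
            dot += v1[i] * v2[i];
            lengthSquared1 += v1[i] * v1[i];
            lengthSquared2 += v2[i] * v2[i];
        }
        double denominator = Math.sqrt(lengthSquared1) * Math.sqrt(lengthSquared2);
        if (denominator == 0.0) {
            // Convention chosen for this sketch: two zero vectors are identical,
            // a zero vector and a non-zero vector are maximally dissimilar.
            return (lengthSquared1 == 0.0 && lengthSquared2 == 0.0) ? 0.0 : 1.0;
        }
        // Clamp at 0 so rounding error never yields a tiny negative distance.
        return Math.max(0.0, 1.0 - dot / denominator);
    }
}

class DistanceDemo {
    public static void main(String[] args) {
        SimpleDistanceMeasure measure = new CosineDistanceSketch();
        double[] a = {1.0, 0.0};
        double[] b = {0.0, 1.0};
        double[] c = {2.0, 0.0};
        System.out.println("orthogonal vectors: " + measure.distance(a, b));
        System.out.println("same direction:     " + measure.distance(a, c));
    }
}
```

In a real Mahout project, the same logic would go into a class implementing DistanceMeasure with Vector arguments; that interface may require additional methods beyond distance(Vector, Vector), so check the Javadoc for your Mahout version.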
