Apache Mahout Essentials by Jayani Withanawasam

K-Means clustering

K-Means clustering is a simple and fast clustering algorithm that has been widely adopted in many problem domains. This chapter explains the K-Means algorithm in detail because it provides the basis for other algorithms. K-Means clustering assigns each data point to one of k clusters, each represented by a cluster centroid, by minimizing the distance from the data points to their cluster centroids.
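To make the assign-and-update idea concrete, here is a minimal sketch of the standard K-Means iteration in plain Python. It is not Mahout's implementation: the function names (`assign`, `update`, `kmeans`) are illustrative, Euclidean distance is assumed as the distance measure, and the initial centroids are assumed to be supplied by the caller rather than chosen by the algorithm.

```python
import math

def assign(points, centroids):
    """For each point, return the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]

def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            # Coordinate-wise mean of the cluster members.
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
    return new_centroids

def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Recompute labels so they always match the returned centroids.
    return centroids, assign(points, centroids)

# Hypothetical sample data: two obvious groups in two dimensions.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)  # [(1.0, 1.5), (9.0, 9.5)]
print(labels)     # [0, 0, 1, 1]
```

The result depends on the starting centroids, so a different initialization can produce a different clustering of the same data.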

Let's consider a simple scenario in which we need to cluster people by size, using height and weight as the selected attributes, with each cluster shown in a different color:

[Figure: K-Means clustering]

We can plot this problem in two-dimensional space, as shown in the following ...