O'Reilly logo

Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

k-means clustering

k-means attempts to partition a set of data points into K distinct clusters (where K is an input parameter for the model).

More formally, k-means tries to find clusters so as to minimize the sum of squared errors (or distances) within each cluster. This objective function is known as the within cluster sum of squared errors (WCSS).

It is the sum, over each cluster, of the squared errors between each point and the cluster center.

Starting with a set of K initial cluster centers (which are computed as the mean vector for all data points in the cluster), the standard method for K-means iterates between two steps:

  1. Assign each ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required