September 2017
Beginner to intermediate
304 pages
7h 2m
English
Let's say that we have a bunch of data points defined by two variables, x1 and x2. These data points naturally exhibit some grouping into clusters, as shown in the following figure:

To automatically cluster these points using k-means, we would first need to choose how many clusters the algorithm should produce. This is the parameter k, which gives k-means its name. In this case, let's use k = 3.
We would then randomly choose the x1 and x2 locations of k centroids. These random centroids will serve as the starting point for the algorithm. They are marked with Xs in the following figure:
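The initialization step described above can be sketched in a few lines. This is only an illustrative sketch, not the book's implementation: the function name init_centroids, the sample points, and the choice to draw centroids uniformly from the bounding box of the data are all assumptions made for the example.

```python
import random


def init_centroids(points, k, seed=None):
    """Pick k random (x1, x2) centroid locations inside the data's bounding box.

    Drawing from the bounding box is an assumption of this sketch; other
    schemes (for example, picking k of the data points) are also common.
    """
    rng = random.Random(seed)
    x1_vals = [p[0] for p in points]
    x2_vals = [p[1] for p in points]
    return [
        (
            rng.uniform(min(x1_vals), max(x1_vals)),
            rng.uniform(min(x2_vals), max(x2_vals)),
        )
        for _ in range(k)
    ]


# Made-up (x1, x2) data points forming three loose groups.
points = [(1.0, 1.2), (0.8, 0.9), (5.1, 4.8), (4.9, 5.3), (9.0, 1.1), (8.7, 0.7)]

# k = 3, as in the text; the seed only makes the run repeatable.
centroids = init_centroids(points, k=3, seed=0)
print(centroids)
```

Because the starting centroids are random, different seeds give different starting points, which is why k-means results can vary from run to run.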
To optimize ...