July 2017
Beginner to intermediate
715 pages
17h 3m
English
Clustering can be seen as a special kind of dimensionality reduction. For example, if you group your data into K clusters, then you can compress it into K centroids. Once we have done it, each data point can be represented as a vector of distances to each of those centroids. If K is smaller than the dimensionality of your data, it can be seen as a way of reducing the dimensionality.
Let's implement this. First, let's run a K-means on some data. We can use the performance dataset we used previously.
We will use Smile again, and we already know how to run K-means. Here is the code:
double[][] X = ...; // data int k = 60; int maxIter = 10; int runs = 1; KMeans km = new KMeans(X, k, maxIter, runs);