July 2017
Intermediate to advanced
360 pages
8h 26m
English
The first method is based on the assumption that an appropriate number of clusters must produce a small inertia. However, the inertia reaches its minimum (0.0) when the number of clusters equals the number of samples, because every sample then coincides with its own centroid. Therefore, we can't simply look for the minimum; instead, we look for a value that is a trade-off between the inertia and the number of clusters.
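As a quick check of this limiting behavior, the following minimal sketch fits k-means on a tiny toy dataset. The array X_small, the seed, and the cluster counts are illustrative assumptions, not values from the text. It records the inertia for a few cluster counts, including one equal to the number of samples:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data (an assumption for illustration): 12 distinct 2D points
rng = np.random.RandomState(1000)
X_small = rng.uniform(-5.0, 5.0, size=(12, 2))

# Fit k-means with few clusters, some clusters, and one cluster per sample
small_inertias = {}
for n in (2, 6, 12):
    km = KMeans(n_clusters=n, n_init=10, random_state=1000)
    km.fit(X_small)
    small_inertias[n] = km.inertia_

for n, inertia in small_inertias.items():
    print(f"{n:2d} clusters: inertia = {inertia:.4f}")
```

With one cluster per sample, the printed inertia should be zero up to floating-point error, which is why the raw minimum says nothing about a sensible number of clusters.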
Let's suppose we have a dataset of 1,000 elements. We can compute and collect the inertias (after fitting, scikit-learn stores the value in the instance variable inertia_) for different numbers of clusters:
>>> from sklearn.cluster import KMeans
>>> nb_clusters = [2, 3, 5, 6, 7, 8, 9, 10]
>>> inertias = []
>>> for n in nb_clusters:
...     km = KMeans(n_clusters=n)
...     km.fit(X)  # X is the dataset of 1,000 elements
...     inertias.append(km.inertia_)
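The loop above assumes that X already exists. A self-contained sketch, assuming a synthetic dataset generated with make_blobs as a stand-in for the 1,000 elements, could look like this. The number of blobs, the cluster_std value, the n_init setting, and the random seed are all illustrative choices, not taken from the text:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: synthetic stand-in for the 1,000-element dataset X
X, _ = make_blobs(n_samples=1000, n_features=2, centers=3,
                  cluster_std=1.5, random_state=1000)

nb_clusters = [2, 3, 5, 6, 7, 8, 9, 10]
inertias = []

for n in nb_clusters:
    # n_init and random_state are fixed only to make the run reproducible
    km = KMeans(n_clusters=n, n_init=10, random_state=1000)
    km.fit(X)
    # inertia_ holds the sum of squared distances to the closest centroid
    inertias.append(km.inertia_)

for n, inertia in zip(nb_clusters, inertias):
    print(f"{n:2d} clusters: inertia = {inertia:.1f}")
```

The printed values give the raw material for the trade-off: the inertia keeps shrinking as clusters are added, so what matters is where the decrease stops being substantial rather than where it is smallest.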
Plotting the values, ...