July 2017
Intermediate to advanced
382 pages
9h 13m
English
Another potential limitation is that k-means cannot learn the number of clusters from the data. Instead, we must tell it how many clusters we expect beforehand. You can see how this could be problematic for complicated real-world data that you don't fully understand yet.
From the viewpoint of k-means, there is no wrong or nonsensical number of clusters. For example, if we ask the algorithm to identify six clusters in the dataset generated in the preceding section, it will happily proceed and find the best six clusters:
In [10]: criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER,
    ...              10, 1.0)
    ...  flags = cv2.KMEANS_RANDOM_CENTERS
    ...  compactness, labels, centers = cv2.kmeans(X.astype(np.float32), ...