k-means is a simple algorithm, but it makes several underlying assumptions about your data that are easy to overlook:
- Spherical or spatially grouped clusters: k-means essentially partitions the feature space into roughly spherical, spatially compact regions and treats each one as a cluster. For non-spherical clusters (essentially, clusters that do not look like grouped blobs in feature space), k-means is therefore likely to fail. To make this idea more concrete, non-spherical clusters on which k-means will likely perform poorly might look like the following:
- Similar size: k-means also assumes ...
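
The spherical-cluster assumption from the list above can be checked with a small experiment. The following is a minimal sketch, not a reference implementation: it builds two idealized, noise-free concentric rings (a dataset chosen here purely for illustration), runs a plain Lloyd's-style k-means written with the standard library only, and measures how well the result matches the true rings. The helper names `make_rings`, `assign`, `kmeans`, and `agreement`, as well as the radii, point counts, and seed, are assumptions made for this example.

```python
import math
import random


def make_rings(n_per_ring=100, radii=(1.0, 5.0)):
    """Return (points, labels) for concentric rings centered at the origin."""
    points, labels = [], []
    for label, radius in enumerate(radii):
        for i in range(n_per_ring):
            angle = 2 * math.pi * i / n_per_ring
            points.append((radius * math.cos(angle), radius * math.sin(angle)))
            labels.append(label)
    return points, labels


def assign(points, centroids):
    """Assign each point to the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k=2, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with a seeded random initialization."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        clusters = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, clusters) if a == c]
            if members:
                new_centroids.append(
                    (sum(x for x, _ in members) / len(members),
                     sum(y for _, y in members) / len(members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return assign(points, centroids)


def agreement(true_labels, clusters):
    """Fraction of points matching the ring labels, taking the better
    of the two possible cluster-to-ring label mappings (valid for k=2)."""
    same = sum(t == c for t, c in zip(true_labels, clusters)) / len(true_labels)
    return max(same, 1 - same)


points, ring_labels = make_rings()
clusters = kmeans(points)
score = agreement(ring_labels, clusters)
print(f"agreement with the true rings: {score:.2f}")
```

With two centroids, every point goes to the nearer one, so the boundary between the two clusters is always a straight line. No straight line can separate an inner ring from the ring surrounding it, so the agreement score stays well below 1.0 no matter how the centroids are initialized; a score of 0.5 is what random guessing would achieve.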