Challenges in DC algorithm
A distribution-based clustering algorithm such as the Gaussian mixture model (GMM) is fitted with the expectation-maximization (EM) algorithm. To avoid overfitting, GMM usually models the dataset with a fixed number of Gaussian distributions. The distributions are initialized randomly, and their parameters are then iteratively optimized to fit the model better to the training dataset. This iterative optimization is the most robust feature of GMM and helps the model converge toward a local optimum. However, because that optimum is only local and the starting point is random, multiple runs of the algorithm may produce different results.
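The random initialization and iterative refinement described above can be sketched in a few lines. The following is a minimal, illustrative one-dimensional EM loop written with the Python standard library only; it is not the implementation used by any particular library. The function names (`gaussian_pdf`, `fit_gmm_1d`), the synthetic data, the variance floor `min_var`, and the fixed iteration count are all assumptions made for this example.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, seed, n_iter=50, min_var=1e-6):
    """Fit a k-component 1-D GMM with EM.

    Returns (weights, means, variances, log_likelihood_history).
    """
    rng = random.Random(seed)
    n = len(data)
    # Random initialization: each mean starts at a randomly chosen data point.
    means = rng.sample(data, k)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    history = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        log_likelihood = 0.0
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            log_likelihood += math.log(total)
            resp.append([j / total for j in joint])
        history.append(log_likelihood)
        # M-step: re-estimate weights, means, and variances from responsibilities.
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            weights[c] = n_c / n
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / n_c
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / n_c
            variances[c] = max(var, min_var)  # floor guards against collapse
    return weights, means, variances, history


# Synthetic data (an assumption for this sketch): two groups near 0 and 10.
data_rng = random.Random(0)
data = ([data_rng.gauss(0.0, 1.0) for _ in range(100)]
        + [data_rng.gauss(10.0, 1.0) for _ in range(100)])

# Different seeds give different starting points; the fitted components
# may come out in a different order or settle in a different local optimum.
for seed in (1, 2, 3):
    weights, means, variances, history = fit_gmm_1d(data, k=2, seed=seed)
    print(seed, [round(m, 2) for m in means], round(history[-1], 2))
```

Comparing the final log-likelihood across seeds is one simple way to pick among runs; keeping the run with the highest value is a common heuristic, though it still does not guarantee a global optimum.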
In other words, unlike the bisecting K-means algorithm, which produces a hard clustering, GMM naturally produces a soft clustering, and to obtain a hard clustering, objects are often assigned to the Gaussian ...
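To make the difference between soft and hard assignment concrete, the sketch below computes the soft membership of a point in each Gaussian and then reduces it to a single label. The reduction rule used here, picking the component with the highest posterior probability, is a common convention and an assumption of this example. The helper names and the parameter values are hypothetical, chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Soft assignment: posterior probability of each component given x."""
    joint = [w * gaussian_pdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def hard_assign(x, weights, means, variances):
    """Hard assignment: index of the component with the largest responsibility."""
    resp = responsibilities(x, weights, means, variances)
    return max(range(len(resp)), key=resp.__getitem__)


# Hypothetical fitted parameters for a two-component mixture.
weights = [0.5, 0.5]
means = [0.0, 10.0]
variances = [1.0, 1.0]

for x in (0.3, 4.9, 9.1):
    soft = responsibilities(x, weights, means, variances)
    label = hard_assign(x, weights, means, variances)
    print(x, [round(p, 3) for p in soft], label)
```

A point near the boundary between the two components, such as 4.9 here, still receives a single hard label even though its soft membership is split, which is the information a hard clustering discards.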