April 2018
Beginner to intermediate
282 pages
6h 52m
English
There are two types of commonly used clustering algorithms: distance-based and probabilistic models. For example, k-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) are distance-based algorithms, whereas the Gaussian mixture model is probabilistic.
Distance-based algorithms may use a variety of distance measures where Euclidean distance metrics are usually used.
Probabilistic algorithms will assume that there is a generative process with a mixture of probability distributions with unknown parameters and the goal is to calculate these parameters from the data.
Since there are many clustering algorithms, picking the right one depends on the characteristics of your data. For example, ...