For text, instead, a popular unsupervised algorithm that can be used to understand a common set of words in a collection of documents is Latent Dirichlet Allocation or LDA.
LDA aims to extract sets of homogeneous words, or topics, out of a collection of documents. The math behind the algorithm is very advanced; here we will see just a practical notion of it.
Let's start with an example to explain why LDA is popular and why other unsupervised methods aren't good enough when dealing with text. K-means and DBSCAN, for example, provide a hard decision for each sample, putting ...