In the previous method, we made no assumptions about the topic distribution prior to training, and this can be a limitation because the algorithm isn't driven by any real-world intuition. LDA, instead, is based on the idea that a topic is characterized by a small ensemble of important words, and that a document normally doesn't cover many topics. For this reason, the main assumption is that the prior topic distribution is a symmetric Dirichlet one. For K topics and concentration parameter α, its probability density function (defined over the probability simplex, where Σ x_i = 1) is:

f(x; α) = [Γ(Kα) / Γ(α)^K] · ∏_{i=1}^{K} x_i^(α−1)
If the concentration parameter α is below 1.0, the distribution will be sparse, as desired. This allows us to model ...
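The effect of the concentration parameter can be checked empirically. The following sketch (the number of topics K and the α values are illustrative choices, not from the text) samples from a symmetric Dirichlet prior with NumPy and compares a sparse prior (α < 1) against a smooth one (α > 1):

```python
import numpy as np

# Sketch: sampling topic distributions from a symmetric Dirichlet prior.
# K and the alpha values below are illustrative, not prescribed by the text.
rng = np.random.default_rng(42)
K = 10            # number of topics
n_samples = 2000  # number of simulated documents

# alpha = 0.1 -> sparse: most mass concentrates on a few topics per document
sparse = rng.dirichlet([0.1] * K, size=n_samples)
# alpha = 10  -> smooth: mass spreads almost uniformly over all topics
dense = rng.dirichlet([10.0] * K, size=n_samples)

# Every sample is a valid probability distribution over the K topics
assert np.allclose(sparse.sum(axis=1), 1.0)

# The dominant topic weight is, on average, far larger in the sparse case,
# matching the intuition that a document covers only a few topics
print("mean max weight, alpha=0.1:", sparse.max(axis=1).mean())
print("mean max weight, alpha=10 :", dense.max(axis=1).mean())
```

With α = 0.1, a typical sample assigns most of its probability to one or two topics, which is exactly the "few topics per document" behavior LDA assumes.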