May 2019
Intermediate to advanced
456 pages
11h 38m
English
A final, very useful application of word counting is topic modeling. Given a set of texts, are we able to find clusters of topics? The method to do this is called Latent Dirichlet Allocation (LDA).
Note: The code and data for this section can be found on Kaggle at https://www.kaggle.com/jannesklaas/topic-modeling-with-lda.
While the name is quite a mouth full, the algorithm is a very useful one, so we will look at it step by step. LDA makes the following assumption about how texts are written: