In this chapter, we explored the use of topic modeling to gain insights into the content of a large collection of documents. We covered latent semantic analysis (LSA), which uses dimensionality reduction of the document-term matrix (DTM) to project documents into a latent topic space. While effective in addressing the curse of dimensionality caused by high-dimensional word vectors, it does not capture much semantic information. Probabilistic models make explicit assumptions about the interplay of documents, topics, and words, which allow algorithms to reverse engineer the document generation process and evaluate the model fit on new documents. We saw that latent Dirichlet allocation (LDA) is capable of extracting plausible topics that allow us to gain a high-level understanding of large amounts ...
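As a minimal sketch of the two approaches discussed above (not the chapter's exact code, and using a made-up toy corpus), the following contrasts LSA, implemented as a truncated SVD of the TF-IDF document-term matrix, with LDA fit on raw term counts, assuming scikit-learn is available:

```python
# Minimal sketch: LSA via truncated SVD vs. LDA on a toy corpus.
# Assumes scikit-learn >= 1.0 is installed; corpus and topic counts are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation

docs = [
    "stocks rallied as earnings beat expectations",
    "bond yields fell after the central bank meeting",
    "earnings season lifted tech stocks to new highs",
    "the central bank signaled further rate cuts",
]

# LSA: reduce the TF-IDF document-term matrix to a low-dimensional latent topic space
tfidf = TfidfVectorizer()
dtm_tfidf = tfidf.fit_transform(docs)
lsa = TruncatedSVD(n_components=2, random_state=42)
doc_topics_lsa = lsa.fit_transform(dtm_tfidf)   # documents projected onto 2 latent components

# LDA: probabilistic generative model fit on term counts
counts = CountVectorizer()
dtm_counts = counts.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=42)
doc_topics_lda = lda.fit_transform(dtm_counts)  # per-document topic mixtures

# Inspect the highest-weighted words per LDA topic
terms = counts.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top_words = [terms[j] for j in weights.argsort()[::-1][:3]]
    print(f"Topic {i}: {top_words}")
```

The contrast mirrors the summary: the LSA components are linear directions in the DTM that need not be interpretable, while the LDA topic-word weights and document-topic mixtures follow directly from the assumed generative process and can be evaluated on held-out documents.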