5. Topic Modeling


This chapter introduces topic modeling: the use of unsupervised machine learning to discover "topics" in a collection of documents. You will explore the most common approaches to topic modeling, namely Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and the Hierarchical Dirichlet Process (HDP), and learn how they differ. You will then practice implementing these approaches in Python and review the practical challenges that commonly arise in topic modeling. By the end of this chapter, you will be able to create topic models from a given dataset.


In the previous chapter, we learned about different ways to collect data from local files and online resources. In this chapter, ...
