5. Topic Modeling


This chapter introduces topic modeling: the use of unsupervised machine learning to discover "topics" in a given set of documents. You will explore the most common approaches to topic modeling, namely Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and the Hierarchical Dirichlet Process (HDP), and learn how they differ. You will then practice implementing these approaches in Python and review the practical challenges that commonly arise in topic modeling. By the end of this chapter, you will be able to create topic models from a given dataset.


In the previous chapter, we learned about different ways to collect data from local files and online resources. In this chapter, ...
