Summary
In this chapter, we introduced topic modeling. We discussed latent semantic analysis (LSA) based on truncated SVD; probabilistic latent semantic analysis (PLSA), which aims to build a model without assumptions about the prior probabilities of the latent factors; and latent Dirichlet allocation (LDA), which outperformed PLSA and assumes that the latent factors have a sparse Dirichlet prior. This sparsity means that a document normally covers only a limited number of topics and that a topic is characterized by only a few important words.
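To make the sparsity assumption concrete, the following minimal sketch draws document-topic mixtures from a symmetric Dirichlet prior and counts how many topics receive a non-negligible weight. It is only an illustration of the prior, not the LDA implementation used in the chapter: the helper names (`sample_dirichlet`, `mean_active_topics`), the number of topics, the 0.05 threshold, and the two concentration values are all assumptions chosen for the example.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k topics."""
    # Normalizing independent Gamma(alpha, 1) draws yields a Dirichlet sample.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_active_topics(alpha, k=10, n_docs=2000, threshold=0.05, seed=0):
    """Average number of topics whose weight exceeds the (assumed) threshold."""
    rng = random.Random(seed)
    active = 0
    for _ in range(n_docs):
        theta = sample_dirichlet(alpha, k, rng)
        active += sum(1 for p in theta if p > threshold)
    return active / n_docs


# A concentration below 1 pushes most of the mass onto a few topics (sparse),
# while a large concentration spreads it almost evenly (dense).
sparse = mean_active_topics(alpha=0.1)
dense = mean_active_topics(alpha=10.0)
print(f"alpha=0.1  -> mean active topics per document: {sparse:.2f}")
print(f"alpha=10.0 -> mean active topics per document: {dense:.2f}")
```

With a small concentration parameter, each simulated document is dominated by a handful of topics, which is the behavior the sparse prior is meant to encode. The same reasoning applies to the topic-word distributions, where a sparse prior favors topics described by a few important words.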
In the last section, we discussed the basics of Word2vec and the sentiment analysis of documents, which is aimed at determining whether a piece of text expresses a positive or negative feeling. To show a feasible solution, we built a classifier based on an ...