Natural Language Processing and Computational Linguistics by Bhargav Srinivasa-Desikan

Topic models in scikit-learn

Gensim isn't the only package that lets us build topic models: scikit-learn, while not dedicated to text, still offers fast implementations of Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), both of which can help us identify topics.

We already discussed how LDA works, and the only differences between the Gensim and scikit-learn implementations are as follows:

  1. The perplexity bounds are not expected to agree exactly, because Gensim and scikit-learn calculate the bound differently. These bounds are one way to measure how a topic model converges during training.
  2. scikit-learn uses Cython, which introduces numerical differences around the sixth decimal place.
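The bound mentioned in the first point can be inspected directly on the scikit-learn side. The sketch below is only an illustration on a made-up corpus: it reads the approximate log-likelihood bound from `score` and the corresponding value from `perplexity`, and assumes that scikit-learn derives the perplexity as the exponential of the negative per-word bound. It does not reproduce Gensim's calculation, so the numbers are not comparable across the two libraries.

```python
import math

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, made up purely for illustration.
docs = [
    "the bank approved the loan and the interest rate",
    "the river bank flooded after heavy rain",
    "interest rates rise as the bank tightens loans",
    "rain and flood water filled the river valley",
]

counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Approximate log-likelihood of the data (a variational bound).
bound = lda.score(counts)

# Perplexity as reported by scikit-learn; lower is better.
perplexity = lda.perplexity(counts)

# Assumed relationship: perplexity = exp(-bound / total number of words).
total_words = counts.sum()
by_hand = math.exp(-bound / total_words)

print(f"bound={bound:.4f} perplexity={perplexity:.4f} by_hand={by_hand:.4f}")
```

When comparing such values between runs or libraries, a tolerance-based check such as `math.isclose` is safer than exact equality, given the small numerical differences noted in the second point.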

Non-negative matrix factorization (NMF) [15], unlike LDA, is not ...
