Topic models in scikit-learn

Gensim isn't the only package that lets us build topic models: scikit-learn, while not dedicated to text, still offers fast implementations of LDA and Non-negative Matrix Factorization (NMF), both of which can help us identify topics.

We have already discussed how LDA works; the main differences between the Gensim and scikit-learn implementations are as follows:

  1. The perplexity bounds are not expected to agree exactly, because Gensim and scikit-learn calculate the bound differently. These bounds are what we use to gauge how well a topic model has converged.
  2. Scikit-learn uses Cython, which introduces numerical differences around the sixth decimal place.
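To make the first point concrete, here is a minimal sketch of how the bound and the perplexity can be read from a fitted scikit-learn model. The corpus and the two-topic setting are assumptions for illustration. `score` and `perplexity` are methods of `LatentDirichletAllocation`; the relationship checked at the end reflects scikit-learn's documented definition of perplexity as the exponential of the negative per-word log-likelihood, and should not be expected to match the numbers Gensim reports for the same data.

```python
import math

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, invented for this example.
docs = [
    "the cat sat on the mat with another cat",
    "dogs and cats are popular pets",
    "stock markets fell as investors sold shares",
    "investors bought shares when the market rallied",
]

counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Approximate log-likelihood of the data (a variational bound).
bound = lda.score(counts)

# Perplexity as reported by scikit-learn; lower is better.
perplexity = lda.perplexity(counts)

# scikit-learn's perplexity is exp(-bound / total number of word tokens).
per_word_bound = bound / counts.sum()
print(f"bound: {bound:.4f}")
print(f"per-word bound: {per_word_bound:.4f}")
print(f"perplexity: {perplexity:.4f}")
print(f"exp(-per-word bound): {math.exp(-per_word_bound):.4f}")
```

When comparing libraries, it is safer to compare the topics themselves or a shared downstream metric than to compare these bound values directly.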

Non-negative matrix factorization (NMF) [15], unlike LDA, is not ...
