Topic modeling using NMF

Non-negative matrix factorization (NMF) relies heavily on linear algebra. It factorizes an input matrix, V, into a product of two smaller matrices, W and H, in such a way that these three matrices have no negative values. In the context of NLP, these three matrices have the following meanings:

  • The input matrix V is the term counts or tf-idf matrix of size n * m, where n is the number of documents or samples, and m is the number of terms.
  • The first decomposition output matrix W is the feature matrix of size t * m, where t is the number of topics specified. Each row of W represents a topic with each element in the row representing the rank of a term in the topic.
  • The second decomposition output matrix H is the coefficient ...

Get Python Machine Learning By Example - Second Edition now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.