Topic modeling using NMF

Non-negative matrix factorization (NMF) relies heavily on linear algebra. It factorizes an input matrix, V, into a product of two smaller matrices, W and H, in such a way that these three matrices have no negative values. In the context of NLP, these three matrices have the following meanings:

  • The input matrix V is the term counts or tf-idf matrix of size n * m, where n is the number of documents or samples, and m is the number of terms.
  • The first decomposition output matrix W is the feature matrix of size t * m, where t is the number of topics specified. Each row of W represents a topic with each element in the row representing the rank of a term in the topic.
  • The second decomposition output matrix H is the coefficient ...

Get Python Machine Learning By Example - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.