Topic modeling

When we have a collection of documents for which we do not clearly know the categories, topic models help us to roughly find the categorization. The model treats each document as a mixture of topics, probably with one dominating topic.

For example, let's suppose we have the following sentences:

  • Eating fruits as snacks is a healthy habit
  • Exercising regularly is an important part of a healthy lifestyle
  • Grapefruit and oranges are citrus fruits

A topic model of these sentences may output the following:

  • Topic A: 40% healthy, 20% fruits, 10% snacks
  • Topic B: 20% Grapefruit, 20% oranges, 10% citrus
  • Sentence 1 and 2: 80% Topic A, 20% Topic B
  • Sentence 3: 100% Topic B

From the output of the model, we can guess that Topic A is about ...

Get Hands-On Natural Language Processing with Python now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.