O'Reilly logo

Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Topic modeling 

When we have a collection of documents for which we do not clearly know the categories, topic models help us to roughly find the categorization. The model treats each document as a mixture of topics, probably with one dominating topic.

For example, let's suppose we have the following sentences:

  • Eating fruits as snacks is a healthy habit
  • Exercising regularly is an important part of a healthy lifestyle
  • Grapefruit and oranges are citrus fruits

A topic model of these sentences may output the following:

  • Topic A: 40% healthy, 20% fruits, 10% snacks
  • Topic B: 20% Grapefruit, 20% oranges, 10% citrus
  • Sentence 1 and 2: 80% Topic A, 20% Topic B
  • Sentence 3: 100% Topic B

From the output of the model, we can guess that Topic A is about ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required