
Hadoop MapReduce v2 Cookbook - Second Edition by Thilina Gunarathne

Topic discovery using Latent Dirichlet Allocation (LDA)

We can use Latent Dirichlet Allocation (LDA) to cluster a given set of words into topics and a set of documents into combinations of topics. LDA is useful for identifying the meaning of a document or a word from its context, rather than depending solely on word counts or exact word matches. It is a step away from raw text matching and towards semantic analysis. In a system such as a search engine, LDA can be used to identify intent and to resolve ambiguous words. Other example use cases include identifying influential Twitter users for particular topics; the Twahpic application (http://twahpic.cloudapp.net), for instance, uses LDA to identify topics used on Twitter.
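To make the idea of grouping words into topics and documents into topic mixtures more concrete, the following is a minimal sketch of LDA trained with collapsed Gibbs sampling in plain Python. It is only an illustration and not necessarily the implementation this recipe relies on; the function name `lda_gibbs`, the hyperparameter values `alpha` and `beta`, the iteration count, and the toy documents are all assumptions made for this example.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents (illustrative only)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]               # tokens in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                              # total tokens assigned to topic k

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Conditional weight of each topic, proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed topic mixture for each document (each row sums to 1).
    doc_mix = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    # The three most frequently assigned words for each topic.
    top_words = [
        [vocab[i] for i in sorted(range(vocab_size), key=lambda i: -topic_word[t][i])[:3]]
        for t in range(num_topics)
    ]
    return doc_mix, top_words


# Hypothetical toy corpus in which "bank" is ambiguous (finance vs. river).
docs = [
    "bank loan money interest bank".split(),
    "money loan bank credit interest".split(),
    "river bank water fish river".split(),
    "water fish river stream bank".split(),
]
doc_mix, top_words = lda_gibbs(docs, num_topics=2)
for t, words in enumerate(top_words):
    print("topic", t, words)
for d, mix in enumerate(doc_mix):
    print("doc", d, [round(p, 2) for p in mix])
```

Each row of `doc_mix` describes a document as a combination of topics, and `top_words` lists the words most strongly associated with each topic. On a corpus this small the result depends heavily on the seed and hyperparameters, so treat the output as a demonstration of the data shapes rather than a meaningful model.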

LDA uses the TF vector ...
