December 2018
Topic coherence measures the semantic consistency of the topic model results, that is, whether humans would perceive the words and their probabilities associated with topics as meaningful.
To this end, it scores each topic by measuring the degree of semantic similarity between the words most relevant to the topic. More specifically, coherence measures are based on the probability of observing together the words in the set W that defines a topic.
We use two measures of coherence that have been designed for LDA and shown to align with human judgment of topic quality, namely the UMass and the UCI measures.
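As a rough illustration of the two measures just named, the following minimal sketch computes both from document co-occurrence counts. It rests on several assumptions that are not from the text: the toy corpus `docs`, the helper names `doc_freq`, `uci_coherence`, and `umass_coherence`, and the smoothing value `eps` are all hypothetical. The UCI measure is usually estimated with a sliding window over an external reference corpus, whereas this sketch uses whole-document co-occurrence for simplicity. It also assumes that every top word occurs in at least one document.

```python
import math
from itertools import combinations

# Toy corpus (hypothetical): each document is represented as a set of tokens.
docs = [
    {"stock", "market", "price"},
    {"stock", "market", "trade"},
    {"market", "price", "trade"},
    {"stock", "price"},
]


def doc_freq(words, docs):
    """Count the documents that contain every word in `words`."""
    return sum(1 for d in docs if all(w in d for w in words))


def uci_coherence(top_words, docs, eps=1e-12):
    """Mean PMI over all distinct pairs of top words.

    Probabilities are estimated from document-level co-occurrence,
    a simplification of the usual sliding-window estimate.
    """
    n = len(docs)
    scores = []
    for wi, wj in combinations(top_words, 2):
        p_i = doc_freq([wi], docs) / n
        p_j = doc_freq([wj], docs) / n
        p_ij = doc_freq([wi, wj], docs) / n
        # eps keeps the logarithm finite when a pair never co-occurs.
        scores.append(math.log((p_ij + eps) / (p_i * p_j)))
    return sum(scores) / len(scores)


def umass_coherence(top_words, docs):
    """Sum of log((D(w_m, w_l) + 1) / D(w_l)) over ordered pairs.

    `top_words` must be sorted by descending topic relevance, so that
    w_l is always ranked above w_m.
    """
    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            co_count = doc_freq([top_words[m], top_words[l]], docs)
            score += math.log((co_count + 1) / doc_freq([top_words[l]], docs))
    return score


topic = ["stock", "market", "price"]  # hypothetical top words of one topic
uci = uci_coherence(topic, docs)
umass = umass_coherence(topic, docs)
print(f"UCI: {uci:.4f}, UMass: {umass:.4f}")
```

For both scores, higher values indicate that the top words co-occur more often than the individual word frequencies alone would suggest. In practice a library implementation, such as the `CoherenceModel` class in gensim, would normally be preferred over hand-rolled counts.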
The UCI metric defines a word pair's score to be the sum of the Pointwise Mutual Information (PMI) between two distinct pairs ...