A high information word is a word that is strongly biased towards a single classification label. These are the kinds of words we saw when we called the show_most_informative_features() method on both the NaiveBayesClassifier and the MaxentClassifier. Somewhat surprisingly, the top words differ between the two classifiers. This discrepancy arises because each classifier calculates the significance of a feature differently. Having these different methods is actually beneficial, because they can be combined to improve accuracy, as we will see in the next recipe, Combining classifiers with voting.
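To make the idea of label bias concrete, here is a minimal sketch in plain Python. It is not the significance measure used by either classifier. It simply scores each word by the share of its occurrences that fall under its most frequent label. The function names (label_shares, high_information_words), the thresholds (min_share, min_count), and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def label_shares(labeled_words):
    """Map each word to (dominant_label, share, total_count).

    labeled_words is an iterable of (label, list_of_words) pairs.
    share is the fraction of the word's occurrences under its dominant label.
    """
    word_totals = Counter()
    per_label = defaultdict(Counter)
    for label, words in labeled_words:
        word_totals.update(words)
        per_label[label].update(words)

    shares = {}
    for word, total in word_totals.items():
        # Pick the label under which this word occurs most often.
        label, count = max(
            ((lbl, counts[word]) for lbl, counts in per_label.items()),
            key=lambda pair: pair[1],
        )
        shares[word] = (label, count / total, total)
    return shares


def high_information_words(labeled_words, min_share=0.9, min_count=2):
    """Return words that are frequent enough and biased towards one label.

    min_share and min_count are illustrative thresholds, not recommended values.
    """
    return {
        word
        for word, (_, share, total) in label_shares(labeled_words).items()
        if share >= min_share and total >= min_count
    }


# Toy data: (label, words) pairs standing in for a labeled corpus.
sample = [
    ("pos", ["great", "movie", "great", "the", "fun"]),
    ("neg", ["awful", "movie", "the", "awful", "boring"]),
]

print(sorted(high_information_words(sample)))  # ['awful', 'great']
```

In this toy data, "great" and "awful" each occur only under one label, so they pass the threshold. Words such as "movie" and "the" are split evenly between the labels, so their share is 0.5 and they are filtered out. The min_count threshold keeps rare words such as "fun" from qualifying merely because they appear once.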
Low information words are words that are common to all labels. It may be counter-intuitive, but eliminating these words from ...