Training a Naive Bayes classifier
Now that we can extract features from text, we can train a classifier. The easiest classifier to get started with is the NaiveBayesClassifier class. It uses Bayes' theorem to predict the probability that a given feature set belongs to a particular label. The formula is:
P(label | features) = P(label) * P(features | label) / P(features)
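To make the arithmetic concrete, here is a small hand-worked instance of the formula; the probabilities are made up purely for illustration:

```python
# Toy numbers plugged into Bayes' theorem; all values are illustrative.
p_label = 0.6                  # P(label): 60 of 100 training instances have the label
p_features_given_label = 0.05  # P(features | label): likelihood of the feature set under the label
p_features = 0.04              # P(features): overall probability of the feature set

posterior = p_label * p_features_given_label / p_features
print(round(posterior, 2))  # 0.75 -- P(label | features), the probability we want
```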
The following list describes each term in the formula:
P(label)
: This is the prior probability of the label occurring, which is the likelihood that a random feature set will have the label. It is based on the number of training instances with the label compared to the total number of training instances. For example, if 60 out of 100 training instances have the label, the prior probability of that label is 60 percent.

P(features | label)
: This is the likelihood of the feature set occurring given the label, estimated from how often the features appear with that label in the training instances.

P(features)
: This is the prior probability of the feature set occurring on its own. Because it is the same for every label, it only scales the result and does not change which label comes out on top.
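Putting this together, here is a minimal sketch of training and querying NLTK's NaiveBayesClassifier on a toy dataset; the feature names and labels are invented for illustration. The train() class method expects a list of (feature dictionary, label) pairs, such as those produced by the feature extraction functions from the previous section:

```python
from nltk.classify import NaiveBayesClassifier

# Each training instance is a (feature dict, label) pair; the
# 'contains(...)' feature names here are hypothetical.
train_data = [
    ({'contains(great)': True, 'contains(awful)': False}, 'pos'),
    ({'contains(great)': True, 'contains(awful)': False}, 'pos'),
    ({'contains(great)': False, 'contains(awful)': True}, 'neg'),
    ({'contains(great)': False, 'contains(awful)': True}, 'neg'),
]

# train() estimates P(label) and P(feature | label) from counts
# in the training data.
classifier = NaiveBayesClassifier.train(train_data)

test_features = {'contains(great)': True, 'contains(awful)': False}

# classify() returns the single most probable label...
print(classifier.classify(test_features))  # 'pos'

# ...while prob_classify() exposes the full distribution over labels.
dist = classifier.prob_classify(test_features)
for label in dist.samples():
    print(label, round(dist.prob(label), 4))
```

Because P(features) is the same for every label, the classifier only needs to compare P(label) * P(features | label) for each label and pick the largest.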