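cat_totals = self.totals
# Start each category at its prior: its share of the overall total.
aggregates = {cat: cat_totals[cat] / cat_totals['_all'] for cat in self.categories}
for token in Tokenizer.unique_tokenizer(email.body()):
    for cat in self.categories:
        value = self.training[cat][token]
        # Add-one smoothing keeps an unseen token from zeroing the score.
        r = (value + 1) / (cat_totals[cat] + 1)
        aggregates[cat] *= r
return aggregates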
This method does the following:
• First, it trains the model if it’s not already trained (the train method handles this).
• For each unique token in the body of the email, it iterates through all categories and multiplies each category’s running score by the smoothed probability of that token appearing in that category. The result is the Naive Bayesian score of each category, without dividing by the normalizing constant Z.
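The steps above can be tried in isolation with a minimal, self-contained sketch. The standalone `score` function, the sample counts, and the category names below are hypothetical stand-ins for the class attributes used in the method, and `defaultdict(int)` is an assumption about how unseen tokens are handled (they count as zero).

```python
from collections import defaultdict


def score(tokens, training, totals, categories):
    """Return the unnormalized Naive Bayes score for each category."""
    # Prior: each category's share of the overall total.
    aggregates = {cat: totals[cat] / totals["_all"] for cat in categories}
    for token in set(tokens):  # each unique token is counted once
        for cat in categories:
            count = training[cat][token]
            # Add-one smoothing: an unseen token shrinks the score
            # instead of multiplying it by zero.
            aggregates[cat] *= (count + 1) / (totals[cat] + 1)
    return aggregates


# Hypothetical training counts, for illustration only.
training = {
    "spam": defaultdict(int, {"buy": 3, "now": 1}),
    "ham": defaultdict(int, {"meeting": 2, "now": 2}),
}
totals = {"spam": 4, "ham": 4, "_all": 8}

scores = score(["buy", "now"], training, totals, ["spam", "ham"])

# Dividing by Z (the sum of the unnormalized scores) rescales them into
# probabilities but does not change which category scores highest.
z = sum(scores.values())
normalized = {cat: value / z for cat, value in scores.items()}
```

With these counts the unnormalized scores come out to roughly 0.16 for spam and 0.06 for ham, so spam ranks first whether or not the scores are divided by Z.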
Now that we ...