Advanced Techniques: Cleverer Statistics
Several spam researchers recognized a problem with the raw Bayesian approach: a word that has been seen only once, and only in a spam message, is assigned a “spamminess” probability of 100%. Intuitively, this does not feel right, because a random word might appear in any email, and a single sighting is too little evidence to support such a confident score.
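The problem can be illustrated with a minimal sketch. It assumes the "raw" estimate is simply the share of messages containing a word that are spam; the function name raw_spam_probability and the tiny corpora are hypothetical and serve only as an illustration.

```python
def raw_spam_probability(word, spam_docs, ham_docs):
    """Naive estimate: fraction of messages containing `word` that are spam.

    Assumption for illustration: each message is a whitespace-separated
    string, and a word counts at most once per message.
    """
    spam_count = sum(word in doc.split() for doc in spam_docs)
    ham_count = sum(word in doc.split() for doc in ham_docs)
    total = spam_count + ham_count
    if total == 0:
        return None  # word never seen: the raw estimate is undefined
    return spam_count / total


# Hypothetical training data.
spam = ["cheap pills zebra", "cheap loans now"]
ham = ["meeting notes attached", "cheap lunch at noon"]

# "zebra" was seen once, in spam only, so the raw estimate is fully confident.
print(raw_spam_probability("zebra", spam, ham))
# "cheap" was seen in both classes, so its estimate is less extreme.
print(raw_spam_probability("cheap", spam, ham))
```

A single accidental occurrence is enough to push the estimate to its extreme value, which is the behavior the researchers objected to.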
Gary Robinson made several extremely useful suggestions in his Linux Journal article on spam. First, he defined p(w) as the probability that an email containing the word “w” is spam:
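As best recalled from Robinson's article, p(w) is built from two per-word ratios. The symbols b(w) and g(w) below follow that article rather than the text above, so treat this as a reconstruction under that assumption:

```latex
b(w) = \frac{\text{number of spam emails containing } w}{\text{total number of spam emails}}, \qquad
g(w) = \frac{\text{number of ham emails containing } w}{\text{total number of ham emails}}

p(w) = \frac{b(w)}{b(w) + g(w)}
```

Under this definition, a word seen only in spam has g(w) = 0 and therefore p(w) = 1, however few times it has been seen.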
When a ...