Bayesian email filters require training: a set of identified spam and non-spam messages from which to derive initial probabilities for the words and phrases that appear in them. The community disagrees about how much training is necessary. One author’s mailbox receives 400 spam emails per day, so spam material is easy to come by. A pruned inbox holding more than 100 non-spam messages can serve as a source of non-spam material. Training with a megabyte each of spam and non-spam seems quite sufficient. (Note that this refers to message text, not Word documents or other attachments.)
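To make the training step concrete, here is a minimal sketch of how initial per-word probabilities might be derived from two labeled sets of messages. It is an illustration under stated assumptions, not the method of any particular filter: the function names `tokenize` and `train`, the token pattern, and the clamping bounds of 0.01 and 0.99 are all hypothetical choices, and the formula is a simple ratio of per-class message frequencies.

```python
import re
from collections import Counter

# Assumed token pattern: runs of letters, digits, and a few punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text):
    """Split message text into lowercase tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def train(spam_messages, ham_messages):
    """Return a dict mapping each word to an estimated spam probability.

    Each word is counted once per message, so the counts are the number
    of messages in each class that contain the word.
    """
    spam_counts = Counter()
    ham_counts = Counter()
    for msg in spam_messages:
        spam_counts.update(set(tokenize(msg)))
    for msg in ham_messages:
        ham_counts.update(set(tokenize(msg)))

    # Guard against division by zero when one training set is empty.
    n_spam = max(len(spam_messages), 1)
    n_ham = max(len(ham_messages), 1)

    probs = {}
    for word in set(spam_counts) | set(ham_counts):
        freq_spam = spam_counts[word] / n_spam  # fraction of spam containing word
        freq_ham = ham_counts[word] / n_ham     # fraction of non-spam containing word
        p = freq_spam / (freq_spam + freq_ham)
        # Clamp so that no single word is treated as absolute proof (assumed bounds).
        probs[word] = min(0.99, max(0.01, p))
    return probs
```

With only a handful of training messages, most words land on the clamped extremes, which is one way to see why a larger body of training text helps: the estimates become meaningful only once words have been seen in many messages of both kinds.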

A general weakness of Bayesian filters is that they seem to be much stronger when trained for each individual mail user rather than for a larger ...
