Training

Bayesian email filters require training: a set of messages already identified as spam and non-spam, from which the filter derives initial probabilities for the words and phrases that appear in them. The community seems to disagree about how much training is necessary. Spam material is easy to come by; one author’s mailbox receives 400 spam emails per day. For non-spam material, a pruned inbox holding more than 100 legitimate messages is a good source. Training with about a megabyte each of spam and non-spam seems quite sufficient. (Note that this means a megabyte of message text, not Word documents or other attachments.)

Bayesian filters have a general drawback: they seem to be much stronger when trained for each individual mail user rather than for a larger ...

Get Slamming Spam: A Guide for System Administrators now with the O’Reilly learning platform.
