Slamming Spam: A Guide for System Administrators

Word Analysis

Bayes' rule enables calculation of the probability that a message is spam, given the observed probabilities that the words it contains indicated spam (or non-spam) in the past. One of the drawbacks of non-Bayesian filtering is that it lacks a “big picture” view of the message, because it looks only for certain keywords, addresses, or other patterns. Initial Bayesian spam filters chose only 30 words to examine [1, 2]. Newer filters [4] examine far more of the message.
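The calculation described above can be illustrated with a minimal sketch. It is not the code of any filter cited here. It assumes that each word already has a spam probability estimated from past mail, that spam and non-spam are equally likely beforehand, and that words are treated as independent (the usual "naive" simplification). The function name `combine_spam_probability`, the `top_n` parameter, and the clamping bounds are hypothetical choices made for illustration.

```python
from math import prod


def combine_spam_probability(word_probs, top_n=30):
    """Combine per-word spam probabilities into one score for a message.

    Assumptions (not taken from the cited filters): equal prior odds of
    spam and non-spam, and independence between words.
    """
    # Clamp each value so a single 0.0 or 1.0 cannot force the result
    # or cause a division by zero.
    clamped = [min(max(p, 0.01), 0.99) for p in word_probs]

    # Keep only the top_n words whose probabilities lie farthest from
    # the neutral value 0.5, since those carry the most evidence.
    most_telling = sorted(clamped, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]

    spam_evidence = prod(most_telling)
    ham_evidence = prod(1.0 - p for p in most_telling)
    return spam_evidence / (spam_evidence + ham_evidence)
```

With two words scored 0.9 and 0.8, the combined score is 0.72 / (0.72 + 0.02), or roughly 0.97, which is higher than either word alone. A message with no scored words comes out at the neutral 0.5.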

One author [5] carefully determined word stems (for example, reducing “mails” and “mailing” to “mail”). Graham [1, 2] took care to extend his analysis to message headers, which is intuitive because certain sources of email issue only spam.
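Both ideas in the preceding paragraph, stemming and header analysis, can be sketched in a few lines. The sketch below is only an illustration and is not the method of either cited author: `crude_stem` strips two hard-coded suffixes and is far simpler than a real stemming algorithm, and the `header*word` token format is a hypothetical convention for keeping header words distinct from body words.

```python
import re

# Illustrative suffix list only; a real stemmer handles many more cases.
SUFFIXES = ("ing", "s")


def crude_stem(word):
    """Strip one known suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(headers, body):
    """Return tokens from a message's headers and body.

    Header words are prefixed with the header name so that, for example,
    "free" in a Subject line is counted separately from "free" in the body.
    Body words are reduced with crude_stem.
    """
    tokens = []
    for name, value in headers.items():
        for word in re.findall(r"[a-z0-9]+", value.lower()):
            tokens.append(f"{name.lower()}*{word}")
    for word in re.findall(r"[a-z0-9]+", body.lower()):
        tokens.append(crude_stem(word))
    return tokens
```

For instance, `tokenize({"Subject": "Free offer"}, "Mailing mails")` yields `["subject*free", "subject*offer", "mail", "mail"]`, so both body words count toward the same stem while the subject words remain separate tokens.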

Bill Yerazunis, author of the spam-filtering ...

Slamming Spam: A Guide for System Administrators by Dale Nielsen, Robert Haskins
