Filtering spam and viruses out of incoming mail is an unfortunate necessity on today’s Internet. It would be easy to write a book on spam filtering techniques, but this chapter is designed to present techniques and examples rather than a complete filtering strategy. (Even if it did have a complete strategy, by the time you read it, the character of spam would have changed enough that you’d have to change your filters anyway.)
Spam and virus filters can act on any of a wide range of message characteristics, including:
The IP address from which the message is received
The information sent in commands in the SMTP session, including the argument to the HELO or EHLO command, the envelope sender in MAIL FROM, and the envelope recipients in RCPT TO
The contents of message headers, including From:, To:, Subject:, and Received:
The contents of the message body
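As a rough illustration of where each of these characteristics comes from, the following Python sketch gathers them into a single record. The class name `SmtpTransaction` and its field names are hypothetical, invented here for illustration; they are not part of any particular mail server or filter. The sketch assumes a simple, non-multipart message.

```python
from dataclasses import dataclass
from email import message_from_string
from email.message import Message
from typing import List


@dataclass
class SmtpTransaction:
    """Hypothetical record of everything a filter can inspect for one message."""
    client_ip: str       # IP address from which the message is received
    helo: str            # argument to the HELO or EHLO command
    mail_from: str       # envelope sender (MAIL FROM)
    rcpt_to: List[str]   # envelope recipients (RCPT TO)
    raw_message: str     # headers and body exactly as sent after DATA

    def parsed(self) -> Message:
        # Split the raw text into headers and body.
        return message_from_string(self.raw_message)

    def header(self, name: str) -> List[str]:
        # A header such as Received: can appear many times, so return a list.
        return self.parsed().get_all(name, [])

    def body(self) -> str:
        # Assumption: non-multipart message, so the payload is a plain string.
        payload = self.parsed().get_payload()
        return payload if isinstance(payload, str) else ""


example = SmtpTransaction(
    client_ip="192.0.2.1",
    helo="mail.example.com",
    mail_from="sender@example.com",
    rcpt_to=["user@example.net"],
    raw_message=(
        "Received: from relay2.example.com\n"
        "Received: from relay1.example.com\n"
        "From: sender@example.com\n"
        "To: user@example.net\n"
        "Subject: Hello\n"
        "\n"
        "Just a test.\n"
    ),
)
```

Note that the envelope values and the header values are independent of each other: nothing forces the MAIL FROM address to match the From: header, which is why filters tend to examine both.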
It’s also possible and often useful to make filtering decisions based on combinations of messages, such as the number of messages received per minute from a particular IP address, or “bulkiness” scores based on the number of messages seen with similar or identical contents.
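A minimal sketch of both ideas follows, assuming a single long-running filter process that sees every message. The names `RateTracker` and `BulkCounter`, the 60-second window, and the exact-match hashing are assumptions made for this example. Real bulk detectors generally use fuzzier matching, so that similar messages also count, not just identical ones.

```python
import hashlib
from collections import Counter, defaultdict, deque


class RateTracker:
    """Counts messages per IP address within a sliding time window."""

    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        self._seen = defaultdict(deque)  # ip -> arrival times, oldest first

    def record(self, ip: str, now: float) -> int:
        """Note a message from ip at time now; return the count in the window."""
        times = self._seen[ip]
        times.append(now)
        # Discard arrivals that have aged out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        return len(times)


class BulkCounter:
    """Counts how often an identical (after normalization) body has been seen."""

    def __init__(self):
        self._counts = Counter()

    def record(self, body: str) -> int:
        # Normalize case and whitespace so trivial variations hash the same.
        normalized = " ".join(body.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        self._counts[digest] += 1
        return self._counts[digest]
```

A filter could call `record()` once per message and compare the returned count against a locally chosen threshold. What threshold is appropriate depends on the site's normal traffic.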
Filtering can be applied at several places in the receipt and delivery process. The earlier a filter is applied, the sooner an unwanted message is disposed of and the less work the mail system spends on it. Filtering points include:
At connection time, for IP address and rDNS-based filters
During the SMTP session, before ...