Case Study 2: Spam Networks
The aim of the second study was to see where some of the spam that I receive comes from. The consensus view is that most spam is being sent via computers infected with viruses that set up email relays without the owners’ knowledge.
I wanted to collect the IP addresses of the machines that relayed the messages to my server and look for any correlations between those and the specific types of spam that they handled. I had no shortage of data. At the time of this analysis, I had 29,041 messages in my Junk folder, which originated from 22,429 different IP addresses. The vast majority of these (92% of the total) were the source of only a single message. Figure 11-2 shows how few addresses were involved in sending multiple emails. Note that the Y-axis is logarithmic.
Several alternative conclusions can be drawn from this distribution. The spam domain blacklist from Spamhaus (http://www.spamhaus.org/sbl/) that I use to reject known spam sources could be so efficient that most source machines can only get one message through to me before being blocked. I doubt that this is the case.
Figure 11-2. Number of messages originating from each IP address
It could be that the owners of these machines, or their ISPs, realize that they are acting as mail relays as soon as they send out the first batch of spam and either remove the relay or shut down that machine. This is possible ...
Get Internet Forensics now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.