Internet Forensics

By Robert Jones
Price: $39.95 USD
£28.50 GBP



Table of Contents

Chapter 1: Introduction
What Is Internet Forensics?
Forensics is the application of scientific methods in criminal investigations. It is a unique field of study that draws from all areas of science, from entomology to genetics, from geology to mathematics, with the single goal of solving a mystery. It holds a great fascination for the general public. Thanks to television dramas, millions of us are familiar with how rifling marks on a bullet can identify a murder weapon and how luminol is used to reveal bloodstains in the bath.
Computer forensics studies how computers are involved in the commission of crimes. In cases ranging from accounting fraud to blackmail, identity theft, and child pornography, the contents of a hard drive can contain critical evidence of a crime. The analysis of disks and the tracking of emails between individuals have become commonplace tools for law enforcement around the world.
Internet forensics shifts that focus from an individual machine to the Internet at large. With a single massive network that spans the globe, the challenge of identifying criminal activity and the people behind it becomes immense. A con artist in the United States can use a web server in Korea to steal the credit card number of a victim in Germany.
Unfortunately, the underlying protocols that handle Internet traffic were not designed to address the problems of spam, viruses, and so forth. It can be difficult, often impossible, to verify the source of a message or the operator of a web site. In cases like this the minor details become important. The layout of files on a web site or the way that email headers are forged can play the same role as a fingerprint at a physical crime scene.
This book shows you some of the ways in which the bad guys try to conceal their identities. I show you how simple techniques, a knowledge of how the Internet works, and an inquisitive mind can reveal a lot more about these people than they would like.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The Seamy Underbelly of the Internet
History shows us that any situation that involves people and money will quickly attract crime. That has certainly been the case with the Internet. Online crime is at an all-time high and shows no signs of slowing down, despite the best efforts of the computer security industry.
Many forms of criminal activity use the Internet as a means of communication, either using email instead of phone calls or publishing offensive material on a web site instead of hard copy. But the Internet has allowed some types of crime to evolve in new ways so as to exploit the new opportunities that it provides.
Spam is the most widespread of these activities. Unsolicited email places a burden on millions of servers every day. Companies spend huge amounts of money on software and staff to help keep the problem under control. They do so to save their employees from having to deal with all of it on their desktops, which would incur even higher costs in the form of lower productivity.
People who are computer savvy tend to focus on the nuisance factor of spam because that is what directly affects us. We tend to overlook the content of those messages because we already know them to be scams. We would never dream of clicking on URLs for web sites that promise us cheap Viagra, great rates on mortgages, or the chance to meet lonely singles in our neighborhoods. But other people do! If they didn't, then the people running the web sites would not waste their money hiring the spammers to distribute their emails.
Most of these are traditional scams that have been updated to entice Internet-savvy victims. Their goal is to get you to hand over your credit card number. Being able to reach millions of potential victims through the power of spam is what makes it so attractive.
Phishing is the name we give to frauds involving fake web sites that look like those of banks or credit card companies. A phishing email is sent out like most other spam, but it attempts to entice victims by appearing to come from a well-known, legitimate business like Citibank or eBay. The message asks you to click on a URL that takes you to a web site. That web page, at first glance, looks just like the site of the genuine financial institution. The users are prompted to enter their online account information along with other personal details like their date of birth, credit card information, and so forth.
Pulling Back the Curtain
Who exactly is involved in Internet crime? The popular media seem to have settled on two very different profiles. The first is the Russian mob that has enlisted physicists, displaced from Cold War era government programs, to help them with their plans. The second is the American teenage boy nerd, seated in the dark isolation of his bedroom, working on the next great computer virus. Neither of these is really representative, although both contain substantial elements of truth. The fact is that the opportunities for this kind of fraud are so broad that someone can find a niche regardless of their technical background.
The advance fee scam, the so-called Nigerian 419 scam, requires nothing more than a good cover story, a list of email addresses, and the gall to carry it out. Creating a computer virus, or operating a professional spam distribution network, requires significant technical expertise. Some scams are so complex that multiple individuals must be involved. For an interesting perspective on a few individuals from the world of spam, I refer you to the book Spam Kings by Brian S. McWilliams (O'Reilly). In it, he describes how two well-known spammers got involved in the trade and how techniques like those described here were used to reveal them.
One thing common to everyone involved in Internet fraud is the desire to remain anonymous and thereby safe from prosecution. The bad guys go to great lengths to hang a curtain of disguise behind which they can operate. The forensic skills that you will learn from this book will help you pull back that curtain.
Just as in traditional criminal forensics, you will use your skills to find the clues left behind at a crime scene. The only difference is that our crime scene takes the form of a web site, server, or email message. You are unlikely to uncover the name and address of the culprit, but you will be able to build up a picture of their operation, which can contain a surprising amount of detail.
Taking Back Our Internet
Over and above the immediate desire to identify the bad guys, I think a lot of us feel a deeper unease about their activities.
The developers and systems administrators among us talk about the Open Source Community, the informal collection of people responsible for creating and using Linux, Perl, and all the other tools that we use every day in our work. The word "community" is not just a convenient buzzword. Many of us feel a real sense of belonging to this global movement that has made the Internet what it is today.
No one can truly claim ownership of the Internet, but the Open Source Community can rightfully claim to be its stewards and guardians. As such, we feel betrayed by those who have crossed over to the Dark Side and are responsible for the nuisances and threats that all users now have to deal with.
Many developers have already stepped up to the challenge of taking back the Internet. Spam-filtering tools, firewalls, secure browsers, such as Firefox and Mozilla, along with a host of security patches, have been developed by open source developers for the good of the community. With the forensic techniques described in this book, I want to help advance another approach in this ongoing battle. By identifying the people responsible for these threats, we can put them under a great deal of pressure and force them to work much harder to achieve their goals.
I want this book to show you how easy it can be to uncover clues about Internet scams. You don't need to be a computer security expert to apply these skills. In fact the key to their success lies in having hundreds and thousands of people like you pushing back and putting pressure on the bad guys. Collectively, we can be a very powerful force.
Protecting Your Privacy
Disclosure and privacy are two sides of the same coin. The same forensic techniques that you use to investigate a phishing web site can be used against you by someone else. The techniques do not discriminate. Privacy is a major concern for some people, less so for others. Regardless of where you fall on that scale, you should always be aware of what others can learn about you. Throughout the book, I will play for both teams. I will show you how to, for example, mine a web site for useful data and then show how, as the operator of a site, you can limit that disclosure.
You can make the argument that, by taking this approach, this book may actually help the scammers evade detection. In some cases, this may happen. However, this same issue has been raised many times in the field of conventional computer security. The counterargument, which I think has prevailed in that field, is that most of the bad guys already know how to improve their operations if they choose to. Either they are just lazy, or they don't think the chance of being identified is high enough to warrant the effort.
By providing a full disclosure of the methods that scammers use to conceal themselves, and showing how you can still uncover identifying information, Internet forensics forces the bad guys further into a corner. There are many more of us than them, and our collective attention forces them to either work harder to practice their trade or, I hope, decide that it's not worth the effort.
That is exactly what we have seen with other aspects of computer security. In the Linux community, new security problems are disclosed for all to see as soon as they are discovered. That prompts developers to fix the issues in a timely manner. In the early days, some of the vulnerabilities were serious and undoubtedly their disclosure led to some systems being attacked. But overall the approach has been a resounding success. Vulnerabilities are still being discovered, but their impact is typically much reduced and often they are fixed before any real-world exploit has been created. Full disclosure of the ways scammers work has made life increasingly difficult for system attackers and has undoubtedly led many to focus their attentions elsewhere.
Before You Begin
I need to offer a few words of caution before you begin poking around some of the more dubious corners of the Internet.
Computer viruses and spyware are everyday threats on the Internet. But in actively seeking out and examining dubious web sites, you may be exposing your systems to higher than normal risks. As I describe in Chapter 3, the worlds of spam distribution and computer viruses have already merged in the form of the Sobig virus. This type of threat should not be a problem as long as you take suitable, simple precautions.
A Unix-based operating system, such as Linux or Mac OS X, is the preferred platform from which to investigate dubious web sites and email messages. The Unix environment is less susceptible to computer viruses, with control mechanisms that make it difficult for rogue executables to be installed simply by downloading them.
If you do use a Windows system to follow the techniques and examples given in this book, then you need to take several important precautions. It goes without saying that you need to have good antivirus software installed and running on the system. Not only that, it needs to be kept up to date with current virus definitions. If you are actively exploring web sites, then make sure you scan your system frequently.
The same goes for spyware, which is perhaps even more of a problem in the context of visiting web sites. There are some excellent free tools available for finding and eradicating this on Windows computers, for example:
Ad-Aware (www.lavasoftusa.com/software/adaware/)
Spybot - Search & Destroy (www.safer-networking.org/en/index.html)
A Network Neighborhood Watch
Taking back the Internet from the con artists will require more than the efforts of computer security professionals. If it were that easy, then the problem would already have been solved. Educating consumers has undoubtedly helped, but people still fall victim to these scams every day.
I view myself as part of the global community of programmers and systems administrators, the power users of computing and the Internet. I suspect most readers of this book would feel the same affiliation. Given the technical skills that we possess, I feel that we have a collective responsibility to guide the development of the Internet and ensure that the values of freedom and openness are preserved as it continues to evolve.
We have the potential to make life very difficult for those behind Internet scams. With thousands of us working to reveal them, their sense of security will be threatened. I believe that this sense that nobody can touch them is a major reason for the growth of Internet crime. A community-based effort to uncover these scams has the potential to have a major impact. We need an effort similar to that of ordinary people who take part in a Neighborhood Watch to keep crime away from where they live simply by keeping an eye out for each other. We need a Network Neighborhood Watch.
This book will show you how to uncover information about web sites, servers, and email messages. It was written for anyone with modest computer skills, as opposed to the professional computer security expert. Anyone can apply these techniques. They use the basic tools and protocols of the Internet in creative ways to reveal clues that mostly go unnoticed. I think most readers will be surprised just how much can be revealed.
I encourage you to learn and experiment with the techniques, scripts, and hacks that are described here. If your Inbox is anything like mine, then you already have plenty of targets. I hope that you build upon these ideas and go on to share your own with the rest of this community. And I hope that you will do your part to make life harder for the bad guys and, in doing so, make the Internet a better place for all of us.
Chapter 2: Names and Numbers
Hostnames, and the numeric addresses they correspond to, are the way to identify computers on the Internet. Understanding how these names and numbers are managed is therefore a fundamental aspect of Internet forensics. This chapter describes the types of information you can obtain from public databases of Internet addresses and discusses three essential tools that can help you identify machines and the people behind them. I'll start with a short review of how computers are identified on the Internet.
Addresses on the Internet
Each computer on the Internet has a unique identifier in the form of its Internet Protocol (IP) address. In IPv4, the version of the protocol used in the examples here, this is a 32-bit integer, which we normally represent as four 8-bit integers separated by periods, such as 208.12.16.5.
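To make the relationship between the dotted form and the underlying 32-bit integer concrete, here is a minimal Python sketch. It is not from the original text, and the function names are purely illustrative. It packs the four 8-bit fields of 208.12.16.5 into a single integer and unpacks them again.

```python
def dotted_quad_to_int(address: str) -> int:
    """Pack the four 8-bit fields of a dotted-quad address into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        # Shift what we have so far left by 8 bits and append the next field.
        value = (value << 8) | octet
    return value


def int_to_dotted_quad(value: int) -> str:
    """Unpack a 32-bit integer into the familiar four-field notation."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"not a 32-bit value: {value!r}")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


if __name__ == "__main__":
    number = dotted_quad_to_int("208.12.16.5")
    print(number, hex(number))
    print(int_to_dotted_quad(number))
```

Python's standard ipaddress module performs the same conversion; the hand-written version is shown only to expose the arithmetic.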
Numeric addresses are fine for systems administrators who need to set up networks and who like that sort of thing. But for most people, they are impossible to remember and so we have real names for computers, the hostnames that we are all familiar with, such as www.oreilly.com.
The translation between hostnames and IP addresses is handled by the Domain Name System (DNS). For example, when you type a hostname into a browser as part of a URL, the browser converts the name into the corresponding IP address and then uses that to communicate with the web server. The browser queries a DNS server on the network, which looks up the name in its database and returns the numeric address to the browser.
In its simplest form, a DNS server consists of two tables of data and the software necessary to interrogate them. The first table is a list of hostnames and the IP addresses to which they correspond. The second is a list of IP addresses and the hostnames to which they map. Storing the addresses of all computers on the Internet on every server is not practical, so DNS distributes the data across many thousands of servers around the world. If a DNS server receives a query for a hostname that it does not carry data for, it forwards the query to other servers until it finds one that can answer the request. Certain servers are authoritative for particular domains, meaning that they are the ultimate reference for mappings between certain sets of names and numbers. What goes on behind the scenes of DNS can become very complex, especially where the networks of large companies are involved.
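The two-table picture can be modeled in a few lines of Python. This is a toy sketch under loud assumptions: the class name ToyNameServer and its methods are hypothetical, and real DNS servers use referrals, caching, and record types that are ignored here. It only illustrates a forward table, a reverse table, and the forwarding of queries a server cannot answer itself.

```python
from typing import Dict, List, Optional


class ToyNameServer:
    """Toy model of a DNS server: two tables plus forwarding of misses."""

    def __init__(self, records: Dict[str, str],
                 upstream: Optional[List["ToyNameServer"]] = None) -> None:
        # Table 1: hostname -> IP address
        self.forward: Dict[str, str] = dict(records)
        # Table 2: IP address -> hostname
        self.reverse: Dict[str, str] = {ip: name for name, ip in records.items()}
        # Other servers to ask when this one carries no data for a query
        self.upstream: List["ToyNameServer"] = list(upstream or [])

    def lookup(self, hostname: str) -> Optional[str]:
        """Return the IP address for a hostname, forwarding the query on a miss."""
        if hostname in self.forward:
            return self.forward[hostname]
        for server in self.upstream:
            answer = server.lookup(hostname)
            if answer is not None:
                return answer
        return None

    def reverse_lookup(self, address: str) -> Optional[str]:
        """Return the hostname for an IP address, forwarding the query on a miss."""
        if address in self.reverse:
            return self.reverse[address]
        for server in self.upstream:
            answer = server.reverse_lookup(address)
            if answer is not None:
                return answer
        return None


# The name/address pair is the one used in this chapter's examples.
authoritative = ToyNameServer({"www.craic.com": "208.12.16.5"})
# A local server with no data of its own that forwards every query.
local = ToyNameServer({}, upstream=[authoritative])

if __name__ == "__main__":
    print(local.lookup("www.craic.com"))
    print(local.reverse_lookup("208.12.16.5"))
```

In this model the server called authoritative plays the role of the ultimate reference for its names, while the local server answers only by asking it.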
I can only scratch the surface of the topic here, but for more information you might consider the books DNS and BIND by Paul Albitz and Cricket Liu and DNS and Bind Cookbook
Internet Address Tools
Three tools play essential roles in helping us query the databases of names and numbers as well as explore the structure of the network around those machines. dig, whois, and traceroute are all included in standard Unix and Mac OS X distributions. Windows users will find variants of all of these, available for free or as shareware. Unfortunately there are so many of these that it is hard to make any specific recommendations. Look them up on your favorite search engine and try a few of them out. Web page interfaces to the tools can also be found on a number of sites.
dig (domain information groper) is a DNS lookup utility that I will use extensively in the course of this book. dig can help you find the IP address for a given hostname and the hostname, if any, for a given IP address.
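A lookup from address to hostname works by querying a specially constructed name: the four fields of the address in reverse order, followed by in-addr.arpa. That detail is not spelled out in the text above; the short Python sketch below, with an illustrative function name, shows how the query name is built for the address used in this chapter. With dig itself, the -x option builds this name for you.

```python
import ipaddress


def reverse_lookup_name(address: str) -> str:
    """Build the in-addr.arpa name that a reverse (address-to-name) query asks for."""
    octets = address.split(".")
    manual = ".".join(reversed(octets)) + ".in-addr.arpa"
    # The standard library offers the same thing; use it as a cross-check.
    assert manual == ipaddress.IPv4Address(address).reverse_pointer
    return manual


if __name__ == "__main__":
    print(reverse_lookup_name("208.12.16.5"))
```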
You may already be familiar with a similar tool called nslookup. A precursor of dig, its use is now discouraged, even though it is still included in most Unix distributions. The same applies to host, which is also widely available. You may find that you prefer the command syntax or output format of one tool over another. I am only going to describe dig in detail here.

Section 2.2.1.1: Hostname lookups

In its simplest form, dig will get the IP address for the supplied hostname. Here is a typical example:
    % dig www.craic.com
    ; <<>> DiG 9.2.3 <<>> www.craic.com
    ;; global options:  printcmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57325
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 1

    ;; QUESTION SECTION:
    ;www.craic.com.                 IN      A

    ;; ANSWER SECTION:
    www.craic.com.          600     IN      A       208.12.16.5

    ;; AUTHORITY SECTION:
    craic.com.              600     IN      NS      dns3.seanet.com.
    craic.com.              600     IN      NS      dns1.seanet.com.
    craic.com.              600     IN      NS      dns2.seanet.com.

    ;; ADDITIONAL SECTION:
    dns3.seanet.com.        82411   IN      A       199.181.164.3

    ;; Query time: 98 msec
    ;; SERVER: 192.168.2.18#53(192.168.2.18)
    ;; WHEN: Fri Jan  7 14:16:07 2005
    ;; MSG SIZE  rcvd: 127
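When you run many lookups, it helps to pull the records out of this output programmatically. The following Python sketch is one possible approach, not something taken from the book: the names Record and parse_section are hypothetical, and it assumes the output layout shown above, in which each section starts with a ";; ... SECTION:" line and ends at a blank line.

```python
from typing import List, NamedTuple

# Part of the dig output shown above, used here as sample input.
SAMPLE = """\
;; QUESTION SECTION:
;www.craic.com.                 IN      A

;; ANSWER SECTION:
www.craic.com.          600     IN      A       208.12.16.5

;; AUTHORITY SECTION:
craic.com.              600     IN      NS      dns3.seanet.com.
craic.com.              600     IN      NS      dns1.seanet.com.
craic.com.              600     IN      NS      dns2.seanet.com.

;; ADDITIONAL SECTION:
dns3.seanet.com.        82411   IN      A       199.181.164.3

;; Query time: 98 msec
"""


class Record(NamedTuple):
    name: str
    ttl: int
    rclass: str
    rtype: str
    data: str


def parse_section(dig_output: str, section: str = "ANSWER") -> List[Record]:
    """Return the records listed under one named section of dig output."""
    records: List[Record] = []
    in_section = False
    for raw_line in dig_output.splitlines():
        line = raw_line.strip()
        if line.startswith(";;") and line.endswith("SECTION:"):
            # A section header: remember whether it is the one we want.
            in_section = line == f";; {section} SECTION:"
            continue
        if not line:
            # A blank line closes whatever section we were in.
            in_section = False
            continue
        if not in_section or line.startswith(";"):
            continue
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[1].isdigit():
            records.append(Record(fields[0], int(fields[1]), *fields[2:]))
    return records


if __name__ == "__main__":
    for record in parse_section(SAMPLE, "ANSWER"):
        print(record.name, record.rtype, record.data)
```

Calling parse_section(SAMPLE, "AUTHORITY") returns the three name server records in the same way.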
DNS Record Manipulation
The DNS infrastructure of the Internet plays a critical role in resolving host and domain names into IP addresses. A great deal of effort has gone into ensuring that DNS works efficiently and is resilient in the face of server failures, incorrect data, or malicious attempts to disrupt the system. But even with these safeguards in place, the system is still subject to attack.
The potential benefit for someone involved in Internet fraud is huge. If you can change the DNS records for a major bank so that they point to your fake site, then you can potentially capture the account numbers and passwords of anyone who logs into the system. This approach sidesteps the need to send out email messages that try to get users to log in, but it does require a high level of technical sophistication. Two approaches have been used: DNS Poisoning and Pharming.
DNS servers around the Internet keep their tables updated by querying other more authoritative servers. The structure is a hierarchy with the network root servers at its origin. In a DNS poisoning attack, DNS servers are manipulated to fetch updated, incorrect DNS records from a server that has been set up by the attacker. This is a sophisticated type of attack to which modern DNS servers are largely immune. But successful attacks do still take place, usually by exploiting bugs in the server software. In March 2005, the SANS Internet Storm Center reported one such attack in which users were redirected to sites that contained spyware, which was then downloaded to users' computers. A detailed report on this attack can be found at http://isc.sans.org/presentations/dnspoisoning.php.
Pharming is somewhat of an umbrella term for several different approaches to manipulating DNS records. Rather than going after DNS servers directly, an attacker may try to con a domain registrar into changing the authoritative DNS record for a domain to point to their fake site. Examples of this form of social engineering have included someone simply calling a registrar on the phone and persuading them that they represent the owner of the target domain.
An Example—Dissecting a Spam Network
Now let's see how these tools can be used in the real world. This section shows how you can figure out the structure of a sophisticated spam operation. A point that I will stress here and throughout the book is how valuable it can be to have multiple examples of an email or a web site. Even though the details may differ, the similarities between them can be very revealing.
For a while last year I was getting a lot of spam emails that all had a similar underlying appearance. The products being offered varied, as did the name of the Sender, but they clearly had a common origin. The From addresses all had the form <somebody>@stderr.<somedomain>.com and they all had the same mechanism for unsubscribing from their mailing list. So I collected a bunch of messages that fit this pattern and made a list of the web sites they were directing me to. At first glance these seemed to be a diverse group but as I added more examples the domain names started to take on a similar form. That was my motivation to investigate further and start to run dig on the hostnames. Table 2-3 shows a small sample of the results from that survey, sorted by IP address.
Table 2-3: Hostnames with similar IP addresses
    Hostname                          IP address
    adv3.pureadvances.com             66.111.233.138
    adv4.pureadvances.com             66.111.233.139
    gold4.goldenbeachexlusives.com
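Sorting a survey like this by address, and grouping the hosts by the network block they fall in, is easy to automate. The Python sketch below is only an illustration: the function name group_by_network is hypothetical, the choice of a /24 block size is an assumption, and the third host is a made-up entry in a documentation address range, added only to show a second group. The first two rows come from Table 2-3.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_network(hosts: Iterable[Tuple[str, str]],
                     prefix_len: int = 24) -> Dict[str, List[str]]:
    """Group hostnames by the network block that contains their IP address."""
    groups = defaultdict(list)
    for hostname, address in hosts:
        ip = ipaddress.ip_address(address)
        # strict=False lets us pass a host address and get its enclosing network.
        network = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
        groups[network].append((ip, hostname))
    # Sort networks numerically, and hosts numerically within each network.
    return {
        str(network): [name for _, name in sorted(members)]
        for network, members in sorted(groups.items(), key=lambda item: item[0])
    }


hosts = [
    ("adv4.pureadvances.com", "66.111.233.139"),   # from Table 2-3
    ("adv3.pureadvances.com", "66.111.233.138"),   # from Table 2-3
    ("host.example.com", "192.0.2.10"),            # hypothetical, for contrast
]

if __name__ == "__main__":
    for network, names in group_by_network(hosts).items():
        print(network, names)
```

Comparing addresses as numbers rather than as strings matters here, because a plain text sort would place 66.111.233.139 before 66.111.233.14.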
Chapter 3: Email
The vast majority of the scams that you might want to investigate are initiated by an email message. So it is only natural that these messages are a major target for forensic analysis. In this chapter, I will show you how to dissect message headers and distinguish between the real and forged information contained therein. I will show how you go about tracking back spam to its source and the approaches that spammers use to make that as difficult as possible. Then I will move on to the contents of email messages and show how you can safely extract attachments that may contain viruses or spyware.
Message Headers
The content of an email message is what first gets our attention but, in terms of forensics, the header block is the most interesting. Every message contains a series of header lines that instruct mail servers where to deliver it, tell mail readers how to process its content, and provide a record of the path taken by the message from its source to its destination. One reference on headers is RFC 2076 (Common Internet Message Headers), which can be found at http://rfc.net/rfc2076.html, but, as you will see, there is considerable variation in their format.
The fundamental flaw with email is that certain headers can be forged. This is what allows spam and all the other scams to flourish, even in the face of sophisticated filters and detection software. In looking at messages that are of interest to you, you need to understand what header information can be forged and what you can rely on. Let's start by looking at the headers for a simple, legitimate message. The following is an email sent from my machine to a Gmail account at Google. I have deleted a few of the Gmail-specific headers and modified the addresses to protect privacy.
    Delivered-To: XYZ@gmail.com
    Return-Path: <ABC@craic.com>
    Received: by 10.54.18.32 with SMTP id 32cs2945wrr;
            Fri, 25 Feb 2005 15:27:07 -0800 (PST)
    Received: by 10.54.7.40 with SMTP id 40mr65062wrg;
            Fri, 25 Feb 2005 15:27:05 -0800 (PST)
    Received: from gateway.craic.com
            (gateway.craic.com [208.12.16.5])
            by mx.gmail.com
            with ESMTP id 9si124319wrl.2005.02.25.15.26.58;
            Fri, 25 Feb 2005 15:27:04 -0800 (PST)
    Received: from [192.168.2.7] (nexus.craic.com [208.12.16.2])
            by gateway.craic.com (8.11.6/8.11.6)
            with ESMTP id j1PNQvl31568
            for <XYZ@gmail.com>;
            Fri, 25 Feb 2005 15:26:58 -0800
    Message-ID: <421FB441.8030406@craic.com>
    Date: Fri, 25 Feb 2005 15:26:57 -0800
    From: ABC <ABC@craic.com>
    User-Agent: Mozilla Thunderbird 0.9 (X11/20041103)
    X-Accept-Language: en-us, en
    MIME-Version: 1.0
    To: XYZ@gmail.com
    Subject: Test
    Content-Type: text/plain; charset=ISO-8859-1; format=flowed
    Content-Transfer-Encoding: 7bit

    This is a test
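The path recorded in a header block like this one can be pulled out programmatically. The following is a minimal sketch, assuming Python's standard email package and an abbreviated copy of the sample message; the function name received_hops is made up for illustration. It relies on the fact that each mail server adds its Received line to the top of the block, so reading the list in reverse gives the hops in the order they happened.

```python
from email import message_from_string

# Abbreviated copy of the sample message shown above.
RAW = """\
Delivered-To: XYZ@gmail.com
Return-Path: <ABC@craic.com>
Received: by 10.54.18.32 with SMTP id 32cs2945wrr;
        Fri, 25 Feb 2005 15:27:07 -0800 (PST)
Received: by 10.54.7.40 with SMTP id 40mr65062wrg;
        Fri, 25 Feb 2005 15:27:05 -0800 (PST)
Received: from gateway.craic.com
        (gateway.craic.com [208.12.16.5])
        by mx.gmail.com
        with ESMTP id 9si124319wrl.2005.02.25.15.26.58;
        Fri, 25 Feb 2005 15:27:04 -0800 (PST)
Received: from [192.168.2.7] (nexus.craic.com [208.12.16.2])
        by gateway.craic.com (8.11.6/8.11.6)
        with ESMTP id j1PNQvl31568
        for <XYZ@gmail.com>;
        Fri, 25 Feb 2005 15:26:58 -0800
Message-ID: <421FB441.8030406@craic.com>
Date: Fri, 25 Feb 2005 15:26:57 -0800
From: ABC <ABC@craic.com>
To: XYZ@gmail.com
Subject: Test

This is a test
"""


def received_hops(raw_message):
    """Return the Received headers oldest first, each collapsed to one line."""
    msg = message_from_string(raw_message)
    hops = msg.get_all("Received", [])
    # Each server prepends its own header, so the parsed list is newest first.
    return [" ".join(hop.split()) for hop in reversed(hops)]


hops = received_hops(RAW)
for number, hop in enumerate(hops, start=1):
    print(number, hop)
```

The first hop printed is the hand-off from the desktop machine to gateway, and the last is the final internal delivery at Gmail.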
Additional content appearing in this section has been removed.
Forged Headers
Now consider an example where the headers have been forged to make the message appear to come from another source. The following headers are taken from a message that purported to come from the FBI, telling me that I had been visiting illegal web sites. In fact, the message contained a virus and was sent from an infected computer.
    Return-Path: <Web@fbi.gov>
    Received: from nvwyu.gov (i528C1073.versanet.de [82.140.16.115])
            by gateway.craic.com (8.11.6/8.11.6)
            with SMTP id j1R0aU702669
            for <XYZ@craic.com>; Sat, 26 Feb 2005 16:36:30 -0800
    From: Web@fbi.gov
    To: XYZ@craic.com
    Date: Sat, 26 Feb 2005 23:17:43 GMT
    Subject: You visit illegal websites
    Message-ID: <dea28bde431c7ce0c@fbi.gov>
    [...]
At face value, this looks like a message from the FBI with the From, Return-Path, and Message-ID headers all referring to the domain fbi.gov. But the single Received header tells a different story. The message was received by gateway and, because I control this machine, I trust it to report the correct IP address of the sending MTA. The hostname within the parentheses is the result of a DNS lookup by my server, so I also trust this. This is clearly not an FBI host. The domain is owned by an ISP located in Germany, and the alphanumeric string used as the hostname (i528C1073) has the look of an address assigned to a subscriber's computer, most likely at home. Preceding the parentheses is a fictitious domain, nvwyu.gov, which has been created by the sender.
This illustrates how some email headers are easy to forge whereas certain others, generated by trusted servers, can be relied upon. Being able to distinguish between the two is an important skill.
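The comparison made above can be sketched in a few lines of Python. This is only an illustration: the regular expression assumes the sendmail-style layout seen in this chapter (claimed name, then the looked-up name and IP address in parentheses), and the helper names parse_received and sender_mismatch are invented here. A mismatch is a hint worth following up, not proof of forgery, since plenty of legitimate mail is relayed through hosts outside the sender's domain.

```python
import re

# The single Received header from the forged message above, on one line.
RECEIVED = (
    "from nvwyu.gov (i528C1073.versanet.de [82.140.16.115]) "
    "by gateway.craic.com (8.11.6/8.11.6) with SMTP id j1R0aU702669 "
    "for <XYZ@craic.com>; Sat, 26 Feb 2005 16:36:30 -0800"
)

# Assumed layout:
#   from <claimed name> (<looked-up name> [<IP address>]) by <receiver>
PATTERN = re.compile(
    r"from\s+(?P<claimed>\S+)\s+"
    r"\((?P<resolved>[^\s\[\]]+)\s+\[(?P<ip>[0-9.]+)\]\)\s+"
    r"by\s+(?P<receiver>\S+)"
)


def parse_received(header):
    """Split a Received header into its claimed and looked-up parts."""
    match = PATTERN.search(header)
    return match.groupdict() if match else None


def sender_mismatch(from_address, received_header):
    """True when the looked-up relay name lies outside the From domain."""
    fields = parse_received(received_header)
    if fields is None:
        return None  # layout not recognised, so draw no conclusion
    domain = from_address.rsplit("@", 1)[-1].strip("<>").lower()
    resolved = fields["resolved"].lower()
    return not (resolved == domain or resolved.endswith("." + domain))


print(parse_received(RECEIVED))
print(sender_mismatch("Web@fbi.gov", RECEIVED))
```

Only the fields written by a server you trust (the looked-up name and the IP address) should carry weight; the claimed name is whatever the sender chose to type.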
Because the message was generated by a virus infection somewhere on the Internet, there was no need for the originator to hide the identity of the machine that sent the message. Additionally, only one step was necessary to deliver the message, making it impossible to disguise the path it took. Things are very different in the case of spam, where there is perhaps a single source for the messages and the sender really wants to remain incognito. Here are the headers for a piece of spam that touts a pornographic web site:
Additional content appearing in this section has been removed.
Forging Your Own Headers
There are good reasons why you might want to forge the headers of your own messages. I have several scripts that run as root and send out notification emails whenever certain events take place. I don't want people replying to root, so I forge the From address to either my address or that of the recipient. This is a useful technique that illustrates just how easy it is to generate spam.
You can try this for yourself using sendmail on a Unix system. Regular mail clients like Outlook and Thunderbird are not set up to do this. Start by writing a simple message to yourself in a file using an editor. Put your address in the To line and set the From line to whatever you like. In this example, I am going to impersonate someone at O'Reilly. Add a Reply-To header and even make up your own Message-Id. For example:
    To: XYZ@craic.com
    From: ABC@oreilly.com
    Reply-To: ABC@oreilly.com
    Message-Id: <12345678@oreilly.com>
    Subject: Test

    Hello World
Tell sendmail to read those headers from the file rather than the command line by giving it the -t flag.
    % /usr/lib/sendmail -t < test_message
The message as received should look similar to this:
    Return-Path: <root@biotech.craic.com>
    Received: from biotech.craic.com (biotech.craic.com [208.12.16.3])
            by gateway.craic.com (8.11.6/8.11.6)
            with ESMTP id j21NSQ721278
            for <XYZ@craic.com>; Tue, 1 Mar 2005 15:28:26 -0800
    Date: Tue, 1 Mar 2005 15:28:21 -0800
    Reply-To: ABC@oreilly.com
    Message-Id: <12345678@oreilly.com>
    To: XYZ@craic.com
    From: ABC@oreilly.com
    Subject: Test

    Hello World
While this looks totally convincing when viewed in a mail client, the headers still show the correct Return-Path and hostname for the sender. You can fix the first of these problems by specifying the From address as a command-line option, thus:
    % /usr/lib/sendmail -t -fABC@oreilly.com < test_message
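The same experiment can be scripted, which is roughly what a notification script would do. The sketch below is an assumption-laden illustration rather than a recipe: it uses Python's standard email package to build the test message and only assembles the sendmail command line (the -t and -f options used above) without running it. The helper names are hypothetical. It makes the key point visible: the From header inside the message and the envelope sender given with -f are set independently of each other.

```python
from email.message import EmailMessage


def build_test_message():
    """Build the test message from this section with hand-picked headers."""
    msg = EmailMessage()
    msg["To"] = "XYZ@craic.com"
    msg["From"] = "ABC@oreilly.com"
    msg["Reply-To"] = "ABC@oreilly.com"
    msg["Message-Id"] = "<12345678@oreilly.com>"
    msg["Subject"] = "Test"
    msg.set_content("Hello World")
    return msg


def sendmail_argv(envelope_sender=None, binary="/usr/lib/sendmail"):
    """Command line for sendmail; -t reads recipients from the headers."""
    argv = [binary, "-t"]
    if envelope_sender:
        # -f sets the envelope sender, which ends up in Return-Path.
        argv.append("-f" + envelope_sender)
    return argv


text = build_test_message().as_string()
print(sendmail_argv("ABC@oreilly.com"))
print(text)
# To send for real on your own system, pipe `text` to that command, e.g.
# subprocess.run(sendmail_argv("ABC@oreilly.com"), input=text, text=True)
```

The email package adds a few MIME headers of its own; they do not change the outcome of the experiment.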
Additional content appearing in this section has been removed.
Tracking the Spammer
Before you take this newfound knowledge and start your own spam empire, bear in mind that spammers are being identified and prosecuted with increasing success. How are the authorities able to track these people down?
What they have that you and I do not is access to the ISPs. Starting with an individual spam message, they can slowly but surely work their way back via the mail server logs at multiple ISPs to identify the original source. It is laborious work, justifying to each ISP that they need to provide access to their logs, search them, document the evidence, and then move one more step back through the chain. That effort goes up by at least an order of magnitude every time the delivery route includes a server in a foreign country. Often that will stop an investigation in its tracks—a fact that has not gone unnoticed by the professional spammers.
sendmail, as well as most other MTAs, can be configured to record information about the messages it handles in log files. The default level of logging in sendmail captures pretty much the same information as the Received headers in the messages themselves. But there is much less opportunity for forgery in these logs, at least as long as the server has not been compromised. More importantly, by examining log files, we might be able to discover groups of related messages being transferred at the same time, indicative of a coordinated spam campaign rather than a single unsolicited message. Distinctions like this are very important in legal proceedings related to spam.
By way of an example, consider the MTA log entries that relate to the forged email that we just created in the previous section. We begin on gateway, the MTA that received the delivered message. A typical location for these log files on a Unix or Mac OS X system is /var/log. We can use the message ID generated on that server to find the matching records.
    % grep j21Mui721208 /var/log/maillog
    Mar  1 14:56:44 gateway sendmail[21208]: j21Mui721208:
         from=<ABC@oreilly.com>, size=286, class=0, nrcpts=1,
         msgid=<12345678@oreilly.com>, proto=ESMTP, daemon=MTA,
         relay=biotech.craic.com [208.12.16.3]
    Mar  1 14:56:44 gateway sendmail[21209]: j21Mui721208:
         to=<XYZ@craic.com>, delay=00:00:00, xdelay=00:00:00,
         mailer=local, pri=30022, dsn=2.0.0, stat=Sent
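When there are many log entries to sift through, it can help to turn them into structured records. Here is a rough sketch in Python, assuming each entry sits on a single line in the log file (the listing above is wrapped for display) and that fields are separated by a comma and a space. The pattern and the function names are illustrative guesses at this one log format, not a general syslog parser.

```python
import re
from collections import defaultdict

# The two entries above, each joined back onto a single line.
LOG = (
    "Mar  1 14:56:44 gateway sendmail[21208]: j21Mui721208: "
    "from=<ABC@oreilly.com>, size=286, class=0, nrcpts=1, "
    "msgid=<12345678@oreilly.com>, proto=ESMTP, daemon=MTA, "
    "relay=biotech.craic.com [208.12.16.3]\n"
    "Mar  1 14:56:44 gateway sendmail[21209]: j21Mui721208: "
    "to=<XYZ@craic.com>, delay=00:00:00, xdelay=00:00:00, "
    "mailer=local, pri=30022, dsn=2.0.0, stat=Sent\n"
)

LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+"
    r"sendmail\[(?P<pid>\d+)\]:\s+(?P<qid>\w+):\s+(?P<rest>.*)$"
)


def parse_line(line):
    """Split one log entry into its prefix and a dict of key=value fields."""
    match = LINE.match(line)
    if not match:
        return None
    record = match.groupdict()
    fields = {}
    for item in record.pop("rest").split(", "):
        key, sep, value = item.partition("=")
        if sep:
            fields[key] = value
    record["fields"] = fields
    return record


def group_by_queue_id(log_text):
    """Merge the fields of all entries that share the same queue ID."""
    groups = defaultdict(dict)
    for line in log_text.splitlines():
        record = parse_line(line)
        if record:
            groups[record["qid"]].update(record["fields"])
    return dict(groups)


for qid, fields in group_by_queue_id(LOG).items():
    print(qid, fields["from"], "->", fields["to"], "via", fields["relay"])
```

Grouping on the relay field instead of the queue ID would be one way to look for batches of messages arriving from the same host.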
Additional content appearing in this section has been removed.
Viruses, Worms, and Spam
In some cases, the spammers have been able to hijack the computers of unsuspecting users on the Internet, either by a targeted attack or through virus infections. The Sobig series of worms is widely believed to be an example of this. This is a family of worms that was disseminated across the Internet beginning in 2003. They showed a clear evolution in their design from the first (Sobig.A) through the sixth (Sobig.F), in terms of their ability to sidestep the defenses that were quickly raised against them. That evolution also appears to reflect improvements in the secondary function for the worm, which was to install email proxy servers on infected computers.
Having access to a network of these proxy servers is of great value to the spammers. Not only do they greatly reduce the chance that their identity will be revealed, but by constantly switching between proxies, they can prevent their emails being rejected by the spam blacklist servers. These keep track of machines that have sent large amounts of spam. If any given machine sends only a small number of messages, then it will never be blacklisted.
The evolution of Sobig through its fifth incarnation is summarized nicely in a report by the LURHQ Threat Intelligence Group, which can be found at http://www.lurhq.com/sobig-e.html. For a more detailed technical analysis, written by a group of analysts who have chosen to remain anonymous, you might find this document of interest: http://spamkings.oreilly.com/WhoWroteSobig.pdf. It offers a fascinating insight into the world of virus tracking and even names the individual that the authors believe created the worm.
The networks of compromised machines have been termed Botnets, with individual computers called zombies or bots. Their implications for computer security go beyond spamming to include distributed denial-of-service attacks on target machines and networks. The Honeynet Project and Research Alliance have published a detailed whitepaper about Botnets (
Additional content appearing in this section has been removed.
Message Attachments
While the direct content of a message is displayed clearly in our mail readers, to be read or deleted as we see fit, an attachment poses a dilemma. We cannot easily determine its contents without examining it, but that process alone can expose us to any computer virus that it might contain. This section will explain how you can safely extract the contents of a suspicious attachment and determine their function. Consider this email as an example:
    From: support@symantec.com
    To: XYZ@craic.com
    Subject: Re: Submit a Virus Sample
    Date: Sat, 15 Jan 2005 23:58:39 +0800

    The sample file you sent contains a new virus version of mydoom.j.
    Please clean your system with the attached signature.

    Sincerly,
     Robert Ferrew

    +++ Attachment: No Virus found
    +++ MessageLabs AntiVirus - www.messagelabs.com
Although that sounds vaguely convincing, I'm not going to trust an email from an antivirus company, Symantec, which appears to screen its messages with software from its competitor, MessageLabs. We can assume that the attached file, datfiles.zip, contains a virus or something equally nasty. How can we isolate the payload and figure out what it represents?
It should go without saying that you should not attempt any extraction or analysis of viruses, worms, or spyware on any Windows system.
On a Unix system, download the entire email message into a new directory and look at the text. Here are the relevant lines from our example. It has three parts: the mail headers, the text of the message, and a large block of encoded text.
    From: support@symantec.com
    To: XYZ@craic.com
    Subject: Re: Submit a Virus Sample
    Date: Sat, 15 Jan 2005 23:58:39 +0800
    MIME-Version: 1.0
    Content-Type: multipart/mixed;
            boundary="----=_NextPart_000_0016----=_NextPart_000_0016"

    This is a multi-part message in MIME format.

    ------=_NextPart_000_0016----=_NextPart_000_0016
    Content-Type: text/plain;
            charset="Windows-1252"
    Content-Transfer-Encoding: 7bit

    The sample file you sent contains a new virus version of mydoom.j.
    [...]

    ------=_NextPart_000_0016----=_NextPart_000_0016
    Content-Type: application/octet-stream;
            name="datfiles.zip"
    Content-Transfer-Encoding: base64
    Content-Disposition: attachment;
            filename="datfiles.zip"

    UEsDBAoAAAAAAEtqLzKjiB3egHMAAIBzAABTAAAAZG9jdW1lbnQudHh0ICAg
    ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg
    [...]
    ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIC5l
    eGVQSwUGAAAAAAEAAQCBAAAA8XMAAAAA

    ------=_NextPart_000_0016----=_NextPart_000_0016--
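Even the fragments of the encoded block that are visible here give something away, and decoding base64 text never executes anything. The sketch below, using only Python's base64 and struct modules, decodes the first line of the block and the last few characters of it, then reads the fixed 30-byte header that starts every entry in a ZIP file. The field layout follows the published ZIP format; the function name is made up, and the elided middle of the block is left untouched.

```python
import base64
import struct

# First base64 line of the attachment shown above.
FIRST_LINE = "UEsDBAoAAAAAAEtqLzKjiB3egHMAAIBzAABTAAAAZG9jdW1lbnQudHh0ICAg"
# Last four characters of the second-to-last line, plus the final line.
TAIL = "IC5l" + "eGVQSwUGAAAAAAEAAQCBAAAA8XMAAAAA"


def read_local_header(data):
    """Unpack the fixed-size ZIP local file header at the start of data."""
    (signature, version, flags, method, mtime, mdate,
     crc32, packed_size, size, name_len, extra_len) = struct.unpack(
        "<4sHHHHHIIIHH", data[:30])
    return {
        "signature": signature,    # b"PK\x03\x04" marks a ZIP entry
        "method": method,          # 0 means stored without compression
        "size": size,              # declared size of the file inside
        "name_len": name_len,      # declared length of the filename
        "name_start": data[30:30 + name_len],  # as much as we have
    }


header = read_local_header(base64.b64decode(FIRST_LINE))
tail = base64.b64decode(TAIL)
print(header)
print(tail[:9])
```

The header declares an 83-character filename that begins with document.txt followed by spaces, while the tail of the archive shows a name ending in .exe just before the end-of-archive marker. That is consistent with an executable whose real extension has been pushed out of sight by padding, although confirming it would need the complete block.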
Additional content appearing in this section has been removed.
Message Content
From a forensics perspective, the content of a message is actually the least interesting part. If the message carries a virus or spyware, then the payload will be contained in the attachment. If it is a phishing attempt, then the web site it links to is where your interest will lie.
The experts in spam analysis and filtering can do a far better job than I at describing the techniques they use to classify messages and decide if they represent spam or not. This is a fascinating area that combines advanced computer science, with its statistical and pattern recognition algorithms, and practical software engineering that builds and deploys tools in an ongoing battle with the spammers.
There are three main approaches to dealing with spam. Here are resources for each of them that you might find useful.
  • Rule-based filtering looks for specific strings and signatures within a message and assigns a score based on the matches it finds. SpamAssassin is a leading open source tool that uses this approach (http://spamassassin.apache.org/).
  • Statistical filtering, using Bayesian analysis, looks at things like word frequencies in sets of messages that have been manually classified as spam or not, typically by the end user. As such it reflects their personal interests and can adapt to changes in the types of email that an individual receives. This is the approach taken in the Thunderbird email client, among others. A good introduction to Bayesian filtering is this paper by Paul Graham: http://www.paulgraham.com/spam.html.
  • If spam can be traced back to a specific network address, then that address can be added to a Block List, or blacklist, of known spammers. A mail server can look up the address of each MTA that wants to transfer a message and automatically reject those that are on the list. This approach will become less effective in the face of proxy servers that were created by the Sobig worms. The Spamhaus Block List is a leading example of this approach, and their web site is an excellent resource: http://www.spamhaus.org
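To make the third approach concrete, here is a small sketch of how a block list lookup is commonly done over DNS: the octets of the address are reversed, the name of the list's zone is appended, and the resulting name is looked up. Everything specific here is an assumption for illustration, including the zone name zen.spamhaus.org, the rule that a 127.0.0.x answer means "listed", and the function names; check the list operator's own documentation before relying on any of it. The resolver is passed in as a parameter so that the logic can be tried without touching the network.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNS name to look up for an IPv4 address in a block list."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone="zen.spamhaus.org", resolve=socket.gethostbyname):
    """Return True if the zone answers with a 127.0.0.x address."""
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # no record: not listed, or the lookup failed
    return answer.startswith("127.0.0.")


# The address of the infected machine from the forged FBI message.
print(dnsbl_query_name("82.140.16.115"))
```

A mail server would run a check like this against the address of each connecting MTA before accepting a message from it.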
Additional content appearing in this section has been removed.
Is It Really Spam?
The amount of spam that I receive every day is absurd. All spam is stupid, but some is more stupid than others, and it amazes me how many emails I get from the widows of Sani Abacha, Yassir Arafat, and various oil company executives, all offering a piece of the action if I help them transfer their millions out of their respective countries. These are the so-called 419 advance payment scams that we are all familiar with. At this point almost everyone on the planet must know about the scam and so you would think this type of email would be on the decline. But I seem to get more of them every day. Perhaps there is more to it than meets the eye.
One theory is that some of these are not spam at all. Embedded within their usual colorful prose are hidden messages that will only be noticed by those who know where to look. The rest of us will treat the emails as spam and ignore them.
In principle, it's a simple and effective way to broadcast secret messages to members of a criminal gang or terrorist group. Anyone monitoring Internet traffic, even if they focused on emails received by a single address, would find it difficult to distinguish one piece of fake spam from the torrent of real spam that many of us receive every day. Even having achieved that, it would be impossible to identify the intended recipient among the thousands of other people who received the same message.
Spy novels from the Cold War era were full of agents passing messages to one another via cryptic classified ads in the back pages of the Times. Fake spam could well be the modern equivalent.
The ways in which a secret message could be embedded in an email are countless. The message ID string could represent a phone number. The first letters of each line could form a sentence. The pixels of a photograph could contain hidden text. These are all examples of steganography, an approach to hiding information in plain sight that has been used since the days of ancient Greece. Whereas encryption makes the content of a message unreadable to everyone but the sender and the recipient, steganography hides the message within a larger block of information. The two approaches are complementary. Steganography has received a lot of attention in recent years as a way to embed information within photographs or audio tracks. For example, it is possible to change the low order bits of pixels in a photograph with no noticeable impact on the image quality. Algorithms exist that embed a message throughout the image and that can extract the message at a later date from a copy of the image, or even a fragment thereof in certain cases. The hidden message can represent a copyright statement and be used to track the illegal copying of images.
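The low order bit idea is simple enough to show in a toy example. The sketch below is not one of the robust algorithms mentioned above, and it does not read real image files: it treats a plain sequence of bytes as stand-in pixel values and stores one bit of the message in the lowest bit of each. The function names and the sample message are invented for illustration.

```python
def embed(pixels, message):
    """Hide message bytes in the lowest bit of successive pixel values."""
    bits = [(byte >> shift) & 1
            for byte in message
            for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message too long for this cover")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the low bit, then set it
    return bytes(out)


def extract(pixels, length):
    """Recover `length` bytes from the lowest bits of the pixel values."""
    out = bytearray()
    for i in range(length):
        byte = 0
        for value in pixels[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (value & 1)
        out.append(byte)
    return bytes(out)


cover = bytes(range(256))  # stand-in for 256 grayscale pixel values
secret = b"Meet at noon"
stego = embed(cover, secret)
print(extract(stego, len(secret)))
print(max(abs(a - b) for a, b in zip(cover, stego)))
```

No pixel value changes by more than one step out of 256, which is why the alteration is so hard to see. Note that the recipient has to know the message length, or agree on some other convention, in order to read it back.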
Additional content appearing in this section has been removed.
Chapter 4: Obfuscation
The Achilles' heel of any Internet con artist is the web site they use to trick their victims. In order for the scam to function, victims have to be able to access a real site at a defined location on the Internet. But revealing that address opens the door for investigators, leading to their sites being shut down and perhaps to their true identities being discovered.
The bad guys are very aware of the problem and go to great lengths to disguise, or obfuscate, their real addresses in the vain hope that investigators will be fooled or become frustrated and give up the pursuit.
On top of that, spam-blocking software is making it increasingly difficult for their emails to get through to our mailboxes. Anything that can disguise an address and avoid it being added to a spam blacklist will extend the life of a scam—so spammers will use every trick in the book.
It's a bit like an arms race, with pressure from our side forcing them to innovate and come up with new tricks. Fortunately for us, implicit in any form of obfuscation is the fact that browsers must be able to reveal the true URL in order to use it. If the browser can do it, so can we. This chapter covers a variety of tricks, some of them quite elegant, that scammers use to throw us off the scent of their trail.
The developers of Internet browsers are continually updating their software to address security exploits, including some of the tricks described here. As a result, with any given browser, some tricks will work and others will not. In due course, you can expect that many will be completely blocked. But these things have a way of reappearing in different contexts, so I will describe the complete menagerie.
Additional content appearing in this section has been removed.
Anatomy of a URL
Here are a few examples of URLs that illustrate the problem:
  • http://www.craic.com
  • http://208.12.16.5
  • http://%77%77%77%2e%63%72%61%69%63%2e%63%6f%6d
  • http://www.oreilly.com@www.craic.com
All of these take you to my web site, but only the first one is recognizable by the casual user. Most of these variants use the more arcane features of the URL specification, so I will start with a brief review of that. The general syntax of a URL is as follows:
<protocol>://<user>:<password>@<host>:<port>/<url-path>
This can be simplified to produce something that looks almost familiar:
<protocol>://<host>/<url-path>
<protocol>
This notation refers to the network protocol being invoked to transfer data back and forth. This is usually the hypertext transfer protocol (http) but other options include https, ftp, file, and mailto.
<host>
The address of the web server, represented as a fully qualified domain name (FQDN), such as www.craic.com, or a numeric IP address, such as 208.12.16.5.
<url-path>
The path to a specific file or directory on that web server.
<port>
This allows you to specify the TCP/IP port to use in the http transaction. The default port is 80, but you sometimes see other ports specified, such as 8080.
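The general syntax above is exactly what a URL parser applies, so a parser can be used to strip away the disguise. The following sketch uses Python's standard urllib.parse module on the four example URLs; the helper names real_host and decoy_user are invented for illustration. It separates whatever sits in the user position before the @ sign from the actual host, and undoes any percent-encoding in the hostname.

```python
from urllib.parse import urlsplit, unquote

EXAMPLES = [
    "http://www.craic.com",
    "http://208.12.16.5",
    "http://%77%77%77%2e%63%72%61%69%63%2e%63%6f%6d",
    "http://www.oreilly.com@www.craic.com",
]


def real_host(url):
    """Return the host a browser would contact, with %xx escapes decoded."""
    parts = urlsplit(url)
    # hostname excludes any user:password@ prefix and any :port suffix.
    return unquote(parts.hostname or "")


def decoy_user(url):
    """Return the text in the <user> position, or None if there is none."""
    return urlsplit(url).username


for url in EXAMPLES:
    print(url, "->", real_host(url), "| user field:", decoy_user(url))
```

For the last example the parser reports www.oreilly.com as the user and www.craic.com as the host, which is the reverse of what a casual reader would assume.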
Additional content appearing in this section has been removed.
IP Addresses in URLs
We expect a URL to include the hostname of a web server but we can just as easily use the numeric IP address in its place. http://208.12.16.5 and http://www.craic.com are completely equivalent. But most people don't remember the IP address of their own computer, let alone one for eBay or Citibank. Most people tend to assume that an IP address is valid, whereas a false hostname is more likely to arouse suspicion. Scammers exploit this and often use IP addresses in their URLs.
There is a second, perhaps more valuable, benefit to this approach. You can set up an account with an ISP, be assigned an IP address, and set up a web server without having registered a domain name. It makes it harder for people to find you, but because you are including the URL in your spam, that is not a problem. In fact, it is a significant advantage.
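Spotting links of this kind in a batch of messages is easy to automate. The sketch below, which assumes Python's standard ipaddress and urllib.parse modules and uses a made-up function name, reports whether the host part of a URL is a literal IP address rather than a name. It recognises only addresses written in the ordinary dotted form (or as IPv6 literals), so treat it as a first-pass filter.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip_literal(url):
    """True if the URL's host is written as an IP address, not a name."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False  # not parseable as an address, so it is a name
    return True


for url in ("http://www.craic.com",
            "http://208.12.16.5",
            "http://202.87.128.138/sys/index.php"):
    print(url, host_is_ip_literal(url))
```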
Here are a few examples:
  • http://202.87.128.138/sys/index.php