Chapter 5. Web Sites

Almost every scam on the Internet today involves a web site, especially those engaged in identity theft. Dissecting the structure of a site is therefore an essential part of Internet forensics. This chapter shows you how to find hidden clues in the HTML code of a single web page and in the architecture of the entire site. First, I cover the basics of looking at the source of web pages using your browser, and then I show how you can use other tools to automate the process of archiving entire web sites. Many of the pages that you will encounter are generated by server-side scripts, and I describe approaches that may reveal some of the inner workings of those scripts, even when you cannot access their source code.

Some clues contribute minor details to our knowledge about the scam. Some enable us to link one scam to another and build a much larger picture. On occasion we get lucky and uncover a mass of detailed information about the operation.

Capturing Web Pages

First, consider individual web pages: the HTML source of a single page can reveal a surprising amount about its creator, and the links it contains help you map out the structure of the entire site. All web browsers let you view the source of a page and save it to a file on your local computer. While these fundamental operations may seem trivial, there are a couple of important issues you need to be aware of.

The first is that many of today’s web pages include other files, without which they ...
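The basic capture described above can also be scripted. What follows is a minimal sketch using only the Python standard library, not the author's method: save_page fetches a page and writes the raw HTML, exactly as the server sent it, to a local file, and extract_references lists every href and src attribute, resolved to an absolute URL, so you can see both the links to other pages and the files the page pulls in. The function names and the command-line usage are hypothetical. The sketch does not download the referenced files themselves, and it sees only the HTML the server returns, not anything added later by JavaScript in the browser.

```python
import sys
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes that point at other pages or at files the page pulls in.
LINK_ATTRS = {"href", "src"}


class ReferenceCollector(HTMLParser):
    """Collect (tag, attribute, absolute URL) for every href/src in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.references = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in LINK_ATTRS and value:
                # Resolve relative links against the page's own URL.
                self.references.append((tag, name, urljoin(self.base_url, value)))


def extract_references(html, base_url):
    """Return every href/src reference in the HTML, in document order."""
    parser = ReferenceCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.references


def save_page(url, path):
    """Fetch the raw HTML, write the unmodified bytes to disk, return it as text."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        charset = response.headers.get_content_charset() or "utf-8"
    with open(path, "wb") as f:
        f.write(raw)
    return raw.decode(charset, errors="replace")


if __name__ == "__main__" and len(sys.argv) == 3:
    page_url, out_path = sys.argv[1], sys.argv[2]
    html = save_page(page_url, out_path)
    for tag, attr, link in extract_references(html, page_url):
        print(f"{tag:8} {attr:5} {link}")
```

Run as, for example, `python capture.py http://example.com/ page.html` (the script name is illustrative). Writing the raw response bytes, rather than re-serializing the parsed page, keeps the saved file identical to what the server delivered, which matters if you later need to show exactly what you examined.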
