Deconstruct Web Server Logfiles

The history of web site measurement is, for the most part, the history of web server logfiles. Understanding the data logfiles provide and their limitations will help you better plan for their use.

Web measurement got its start over 10 years ago with simple log analysis tools. These early tools did little more than scan the logfiles produced by web servers to count hits and visits, report on server errors and page load times, and process other data pertinent to early site administrators.

Anatomy of a Web Server Logfile

Generally speaking, each entry in the logfile will contain the IP address of the requesting client, the requested URL, the number of bytes transferred to the client, the date/time of the request, the URL from which the request was made (also called the referring URL [Hack #1]), and much more. The log will contain not only each explicitly requested page (commonly a file with an extension of HTM, HTML, ASP, or JSP), but also each image (e.g., GIF and JPG), JavaScript file (JS), and other object needed to complete the loading of the page. Not surprisingly, logfiles can get excessively large [Hack #19].
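As a rough illustration of the difference between explicitly requested pages and the supporting objects logged alongside them, the following Python sketch classifies requested URLs by file extension. The extension list comes from the paragraph above; the function name `is_page_request` and the sample URLs are hypothetical and used only for illustration.

```python
import posixpath
from urllib.parse import urlsplit

# Extensions the text names as typical for explicitly requested pages.
PAGE_EXTENSIONS = {".htm", ".html", ".asp", ".jsp"}

def is_page_request(url):
    """Return True if the URL's path ends in a page-like extension."""
    path = urlsplit(url).path            # drop any query string or fragment
    ext = posixpath.splitext(path)[1]    # e.g. ".htm", ".GIF", or ""
    return ext.lower() in PAGE_EXTENSIONS

# Hypothetical requested URLs, as they might appear in a logfile.
requested = [
    "/index.htm",
    "/images/logo.gif",
    "/scripts/menu.js",
    "/products.jsp?id=7",
    "/photo.JPG",
]

pages = [u for u in requested if is_page_request(u)]
print(len(requested), "hits,", len(pages), "page requests")
```

Requests with no extension at all, such as a bare "/", are treated as non-pages by this sketch; a real analysis would likely need a separate rule for them.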

Using the following sample line from the author’s web server logfile, let’s step through the fields captured in the combined log format (see below for more formats).

- elvis [15/May/2000:23:03:36 -0800] "GET /index.htm HTTP/1.0" 200 956 "" "Mozilla/2.0 (compatible; MSIE4.0; ...
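To make the field layout concrete, here is a minimal Python sketch that splits one combined-format line into named fields. It assumes the usual field order of the combined log format (host, identity, user, timestamp, request, status, bytes, referrer, user agent). The regular expression, the `parse_line` helper, and the sample line are all hypothetical; the sample line is not taken from the author's logfile.

```python
import re
from datetime import datetime

# Assumed field order of the combined log format:
# host ident user [time] "request" status bytes "referrer" "user agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Parse one combined-format line into a dict, or return None."""
    match = COMBINED.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamps look like 15/May/2000:23:03:36 -0800 (English month names).
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" in the bytes field means no body was sent.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    # The request field normally holds: method, URL, protocol.
    parts = entry["request"].split()
    if len(parts) == 3:
        entry["method"], entry["url"], entry["protocol"] = parts
    else:
        entry["method"] = entry["url"] = entry["protocol"] = None
    return entry

# Hypothetical sample line, invented for this example.
SAMPLE = (
    '192.0.2.10 - alice [01/Jan/2024:12:00:00 +0000] '
    '"GET /products.html HTTP/1.1" 200 5120 '
    '"http://www.example.com/index.html" "ExampleBrowser/1.0"'
)

entry = parse_line(SAMPLE)
for name in ("host", "user", "time", "method", "url", "status", "bytes", "referrer", "agent"):
    print(name, "=", entry[name])
```

A regular expression is used here rather than splitting on spaces because the timestamp, request, and user agent fields themselves contain spaces. Lines that do not fit the pattern come back as None, so a caller can count or skip them.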
