
Perl for Web Site Management by John Callender


The “Visit” Data Structure

Trying to track individual visitors via the entries in a web server’s access log is something of an exercise in futility. With things like proxy servers and client-side caching getting in the way, the series of accesses that show up in the log from a particular hostname or IP address can give only an approximate picture of what individual visitors are doing. Multiple users sharing the same IP address can have their activity merged into what looks like a single, very active visitor. Conversely, a single visitor can show up in the logs via a different IP address on each request, defying efforts to abstract those requests into a meaningful “visit.” A proxy server at a major ISP can cache the site’s pages, then satisfy hundreds of requests that never get recorded in the server’s logs.

Even so, it’s hard not to wonder what a log file would reveal if we could pluck out the requests corresponding to specific hosts and string them together to see what patterns emerge. Many users still browse from individual host addresses without intervening proxy servers; for these users, at least, the resulting “visit” tracking provides a fascinating look at the paths being followed through the site. It’s also interesting to see how many incoming requests are actually being generated by robot “spider” programs, and to study the behavior of those programs as they interact with the server. Finally, it’s an interesting programming exercise to see how we can assemble and present ...
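As a rough illustration of plucking out the requests for each host and stringing them together, here is a minimal sketch rather than the book's own implementation. It assumes the log uses the Common Log Format; the sample lines, the `%visit` hash, and the `build_visits` routine are all hypothetical names and data invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data in Common Log Format; a real script would
# read these lines from the server's access log instead.
my @log_lines = (
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.77 - - [10/Oct/2000:13:55:40 -0700] "GET /robots.txt HTTP/1.0" 200 68',
    '192.0.2.10 - - [10/Oct/2000:13:56:02 -0700] "GET /about.html HTTP/1.0" 200 1045',
);

# Group requests by host (assumed layout, not the author's): %visit maps
# each hostname or IP address to an array of [timestamp, path] pairs,
# kept in the order the requests were logged.
sub build_visits {
    my @lines = @_;
    my %visit;
    foreach my $line (@lines) {
        # Fields: host, two ignored fields, [timestamp], "METHOD path ..."
        # Lines that do not match this shape are skipped.
        next unless $line =~ /^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)/;
        my ($host, $when, $path) = ($1, $2, $3);
        push @{ $visit{$host} }, [ $when, $path ];
    }
    return \%visit;
}

my $visit = build_visits(@log_lines);

# Print the path each host followed through the site.
foreach my $host (sort keys %$visit) {
    my @paths = map { $_->[1] } @{ $visit->{$host} };
    print "$host: ", join(' -> ', @paths), "\n";
}
```

Keying the hash on the host field inherits every limitation described above: users behind a shared proxy collapse into one entry, and a visitor whose address changes between requests is split across several.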
