Server Log Analysis
Individual log records can be revealing, but even greater insights often come from examining access logs over a period of time and finding patterns in the data. A whole industry is devoted to log analysis for large news and e-commerce sites, trying to assess what visitors are most interested in, where they are coming from, how the server performs under load, and so on. I'm going to take a much simpler approach and use the tools that I have at hand to uncover some very interesting needles hidden in my haystack. Hopefully these examples will inspire you to take a closer look at your own server logs.
Googlebot Visits
Given that Google is such a powerful player in the field of Internet search, you might like to know how often they update their index of your site. To see how often their web robot, or spider, pays you a visit, simply search through the access log looking for a User-Agent called GoogleBot. Do this using the standard Unix command grep:
% grep -i googlebot access_log | grep 'GET / ' | more
The first grep gets all GoogleBot page visits and the second limits the output to the first page of each site visit. Here is a sample of the output from my site:
66.249.71.9 - - [01/Feb/2005:22:33:27 -0800] "GET / HTTP/1.0" 304 - "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
66.249.71.14 - - [02/Feb/2005:21:11:30 -0800] "GET / HTTP/1.0" 304 - "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
66.249.64.54 - - [03/Feb/2005:22:39:17 -0800] ...
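A listing like this still has to be read by eye to answer the question of how often the spider visits. One way to summarize it is to count the matching requests per day. The following is only a minimal sketch, assuming the log uses the same layout as the sample above, where the fourth whitespace-separated field holds the timestamp. The function name googlebot_daily_visits is hypothetical and introduced here purely for illustration.

```shell
# Hypothetical helper (not part of the original text): count Googlebot
# requests for the home page, grouped by day.
# Assumes the timestamp is the 4th whitespace-separated field, as in
# the sample output above.
googlebot_daily_visits() {
    grep -i googlebot "$1" | grep 'GET / ' |
    awk '{
        # Field 4 looks like [01/Feb/2005:22:33:27, so skip the bracket
        # and keep the 11 characters of the date.
        day = substr($4, 2, 11)
        # Remember the order in which days first appear in the log.
        if (!(day in count)) order[++n] = day
        count[day]++
    }
    END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }'
}

# Usage, assuming the log file is named access_log as in the grep example:
#   googlebot_daily_visits access_log
```

Each output line is a date followed by the number of home page requests from Googlebot on that day, in the order the days appear in the log. The two grep filters are the same as before; only the awk step is new.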