Different Log File Formats
It’s fairly easy to modify this script to accept either the common or the extended log format. We do that by adding a configuration variable near the top of the script that looks like this:
my $log_format = 'common'; # 'common' or 'extended'
Then we modify the part of the script where the regular expression parsing occurs to include some logic to check that $log_format variable, along with a second version of the regular expression to be used on logs that are in the extended format:
# Each pattern must stay on a single line: without the /x modifier,
# a line break inside it would be matched literally.
if ($log_format eq 'common') {
    ($host, $ident_user, $auth_user, $date, $time,
     $time_zone, $method, $url, $protocol, $status, $bytes) =
        /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+)$/
        or next;
} elsif ($log_format eq 'extended') {
    # Same fields as above, plus the quoted referer and user agent.
    ($host, $ident_user, $auth_user, $date, $time,
     $time_zone, $method, $url, $protocol, $status, $bytes,
     $referer, $agent) =
        /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+) "([^"]+)" "([^"]+)"$/
        or next;
} else {
    die "unrecognized log format '$log_format'";
}
I think this probably qualifies as the ugliest block of code in this entire book. This is not the sort of code that anybody wants to have to make sense of more than once, but fortunately, once we get it right, we aren’t likely to need to modify it.
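If the duplication bothers you, one possible alternative (a sketch only, not the approach this script takes) is to write the shared part of the pattern once with the /x modifier and look up the full pattern by format name. The names $common, %line_re, and parse_line are made up for this illustration, and the sample lines are invented in the style of typical access-log entries.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fields shared by both formats. Under /x, whitespace in the pattern is
# ignored, so a literal space is written as [ ].
my $common = qr{
    (\S+) [ ] (\S+) [ ] (\S+) [ ]                     # host, ident user, auth user
    \[ ([^:]+) : (\d+:\d+:\d+) [ ] ([^\]]+) \] [ ]    # date, time, time zone
    " (\S+) [ ] (.+?) [ ] (\S+) " [ ]                 # method, URL, protocol
    (\S+) [ ] (\S+)                                   # status, bytes
}x;

# Hypothetical lookup table: one full-line pattern per log format.
# The outer patterns do not use /x, so their spaces are literal.
my %line_re = (
    common   => qr{^$common$},
    extended => qr{^$common "([^"]+)" "([^"]+)"$},
);

# Returns the captured fields, or an empty list if the line does not match.
sub parse_line {
    my ($line, $format) = @_;
    my $re = $line_re{$format}
        or die "unrecognized log format '$format'";
    my @fields = $line =~ $re
        or return;
    return @fields;
}

# Invented sample lines, one per format.
my @samples = (
    [ common   => '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326' ],
    [ extended => '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"' ],
);

for my $sample (@samples) {
    my ($format, $line) = @$sample;
    my @fields = parse_line($line, $format)
        or next;
    print "$format: ", scalar(@fields), " fields, host=$fields[0], url=$fields[7]\n";
}
```

In the real script you would pass $log_format as the second argument. The trade-off is an extra layer of indirection in exchange for stating the eleven shared fields only once.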
Anyway, you’ll notice that the new regular expression for extended-format logs has a couple of new chunks at the end, both of which look like "([^"]+)". By now that should ...
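To try that chunk in isolation, here is a minimal sketch that applies it twice to the tail end of an extended-format line. The $tail string is an invented example, not data from a real log.

```perl
use strict;
use warnings;

# Invented tail of an extended-format line: referer, then user agent.
my $tail = '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"';

# "([^"]+)" reads: a literal double quote, then one or more characters
# that are not double quotes (captured), then the closing double quote.
my ($referer, $agent) = $tail =~ /^"([^"]+)" "([^"]+)"$/
    or die "tail did not match";

print "referer: $referer\n";
print "agent:   $agent\n";
```

Because the + quantifier requires at least one character, a completely empty quoted field ("") would not match this pattern.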