Use command-line tools to get a real-time snapshot of web server activity:
tail
Returns the last part of a file, such as the most recent connection entries in the web server logfile
grep
Searches for a pattern in a file, such as specific filenames or error codes from the web server logfile
ps
Reports on the status of web server processes
Almost any decent web hosting account will record connections to your web site in logfiles that you can view and process. A good hosting provider may even help you automate the task of purging the connection records (known as log rolling) so the files do not consume your account's disk quota, and give you access to web site statistics software, such as Analog or Urchin, that will generate easy-to-read reports about activity on your web site.
If you're serious about your web site, then you should take advantage of the tools available to you and review web site traffic reports often to understand how visitors get to your site, what's popular, and what's working (or not working). How to look at and use web site traffic reports is covered in Recipe 9.9.
The access and error logs that provide the raw material for traffic reports are constantly updated. Traffic reports themselves, on the other hand, are usually generated less frequently—daily, or even weekly, in some cases. A situation may arise when you can't wait for the next traffic report to be created. You need to get an up-to-the-minute picture of the who, what, and how many of your web site's current activity. Here are some command-line tools you can use to take your web site's pulse.
First, you'll need to find your Apache access and error logfiles. They are usually saved in a separate logs directory and have names like access_log, access.log, or apache.access_log. The error log should be in the same directory as the access log, so once you've found the logs, Telnet into your web server and switch to the logfiles directory.
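If you're not sure where your host keeps the logs, a quick search from the shell can turn them up. A minimal sketch, assuming the logs live somewhere under your home directory or /var/log (your host's layout may differ):
find ~ /var/log -name "*access*log*" 2>/dev/null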
Now you can watch connections to your web site as they're handled by Apache with the Unix utility tail. Assuming your access log is named access_log, type this command at your Telnet prompt:
tail -f access_log
Your shell window should be filled with several lines, like this:
128.118.152.116 - - [14/May/2005:12:49:26 -0500] "GET /swgr/index.php HTTP/1.1" 200 29070 "http://daddison.com/index.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
68.142.250.83 - - [14/May/2005:12:49:30 -0500] "GET /case_studies/cs01.html HTTP/1.0" 200 19604 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT; .NET CLR 1.1.4322)"
165.83.120.231 - - [14/May/2005:12:49:33 -0500] "GET /clients/index.html HTTP/1.1" 301 255 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
Each line indicates the IP number, file requested, and status of each unique connection, or hit, to your web site. The -f flag tells tail to show the last 10 lines of the access log, and to echo new lines to the shell window as they are appended to the file. See for yourself: open a browser window and, with your shell window still visible, hit a page on your web site. Your request should be duly noted by tail.
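You can also pipe tail's output through grep to watch live requests for just one page as they arrive. A quick sketch, using a hypothetical filename:
tail -f access_log | grep "GET /news/newsrelease.html"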
Going back to the problem in Recipe 1.8 about automatically updating pages on your site, let's say that your boss wants to know how many hits to the company's latest news release have been recorded today. And she can't wait until tomorrow, when a nice and neat traffic report will be waiting on the site with the answer. With grep, you can narrow your focus on the access log to just the recent requests for a specific file.
At the Telnet prompt to your web server, instruct the grep utility to search the current access log for the filename of the news release by typing this command:
grep "GET /news/newsrelease.html" access_log
With the search string GET /news/newsrelease.html, you're looking for all the requests for newsrelease.html in the /news directory in the current server log. The results might look like this:
24.91.149.141 - - [14/May/2005:13:55:45 -0500] "GET /news/newsrelease.html HTTP/1.1" 200 18912 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
213.219.80.16 - - [14/May/2005:13:56:36 -0500] "GET /news/newsrelease.html HTTP/1.1" 200 18912 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)"
70.176.205.66 - - [14/May/2005:13:58:09 -0500] "GET /news/newsrelease.html HTTP/1.1" 200 18912 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
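Since the boss asked for a number rather than a list, you can add grep's -c flag to count the matching lines instead of printing them:
grep -c "GET /news/newsrelease.html" access_log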
You can also send the results of the search to a file by modifying the command like this:
grep "newsrelease.html" access_log > newsrelease_report.txt
And if you want to get really fancy, you can put that second grep command in your crontab file, have it run every 15 minutes, and let the boss check the hits herself.
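Here's a minimal sketch of such a crontab entry, added by running crontab -e. The absolute paths are hypothetical stand-ins for your own log and web directories, since cron jobs don't start in your logfiles directory:
*/15 * * * * grep "newsrelease.html" /path/to/logs/access_log > /path/to/htdocs/newsrelease_report.txt
The */15 step syntax runs the job every 15 minutes on most modern cron implementations.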
You also can use grep to sift the access log for errors and unsuccessful requests that visitors to your web site are encountering. Each line in the log includes a status code indicating the result of the request. Some common error codes are shown in Table 1-2. For a complete list, see the World Wide Web Consortium (W3C) list referred to in the "See Also" section of this Recipe.
Finally, there may come a time when you want to see what processes are running under your user ID on your web server. Use the Unix process report utility, ps, with this command, replacing userid with your own ID (right after the -U flag):
ps -Uuserid
The results should look something like this, with httpd indicating Apache processes that are currently running on your web server:
  PID TTY      TIME CMD
11565 ?        0:00 httpd
 1715 pts/5    0:00 tail
11569 pts/6    0:00 tcsh
11560 ?        0:00 httpd
11567 ?        0:00 sshd
11512 ?        0:00 sh
11542 ?        0:01 httpd
29475 ?        0:01 sshd
29477 pts/5    0:00 tcsh
 6373 ?        0:00 sshd
11559 ?        0:00 httpd
11578 pts/6    0:00 ps
11557 ?        0:00 httpd
11553 ?        0:00 httpd
11554 ?        0:00 httpd
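If you just want a count of the Apache processes rather than the full listing, one rough approach is to pipe the report through grep -c. Depending on how your system's ps labels commands, the pipeline's own grep may or may not be included in the count:
ps -Uuserid | grep -c httpd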
For a complete list of HTTP status code definitions, see the W3C page at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.