Perl for Web Site Management

Name: Perl for Web Site Management
Author: John Callender
ISBN: 9781565926479

October 2001

Beginner

528 pages

15h 20m

English

O'Reilly Media, Inc.

Perl for Web Site Management
Preface
Intended Audience
Programmers by Accident
What This Book Offers
Organization
Online Examples
Conventions Used in This Book
How to Contact Us
Acknowledgments
1. Getting Your Tools in Order
Open Source Versus Proprietary Software

Evaluating a Hosting Provider
Web Hosting Alternatives
Free Hosting
Shared Hosting (Low Grade)
Shared Hosting (High Grade)
Dedicated Hosting/Co-Location
Getting Started with SSH/Telnet
Meet the Unix Shell
man, more, and less
Directories and the pwd Command
The ls Command: List Directory Contents
The mkdir Command: Make a New Directory
The cd Command: Change Directories
CTRL-C (^C): Cancel a Command in Progress
The exit Command: End Your Shell Session
Network Troubleshooting
ping and traceroute
mtr
A Suitable Text Editor
2. Getting Started with Perl
Finding Perl on Your System
Creating the “Hello, world!” Script
The Dot Slash Thing
Unix File Permissions
Running (and Debugging) the Script
The Joy of Debugging
Perl Documentation
man perl, perldoc perl
Function Documentation with perldoc -f
The Perl FAQ
Perl Variables
Scalar Variables
Array Variables
Hash Variables
A Bit More About Quoting
“Hello, world!” as a CGI Script
Content-Type Headers
Here-Document Quoting
File Locations/Extensions for Running CGI Scripts
Testing from the Command Line
Testing from the Web Server
CGI Script File Permissions
3. Running a Form-to-Email Gateway
Checking for CGI.pm
Creating the HTML Form
The <FORM> Tag’s ACTION Attribute
The mail_form.cgi Script
Warnings via Perl’s -w Switch
The Configuration Section
Invoking CGI.pm
foreach Loops
if Statements
Filehandles and Piped Output
die Statements
Outputting the Message
Testing the Script
4. Power Editing with Perl
Being Careful
Renaming Files
Globbing
A Simple Renaming Script
Sanity Checking
Regular Expressions
Running the Renaming Script
Modifying HREF Attributes
First Version of the fix_links.plx Script
Reading from a File with a while Loop
Modifying Data with a Substitution Operator
Writing the Modified Files Back to Disk
5. Parsing Text Files
The “Dirty Data” Problem
Required Features
Obtaining the Data
Parsing the Data
Using strict and Scoping Variables
Using the Default Variable $_
The push Function
Managing Complexity
Subroutines
The &parse_exhibitor Subroutine
Outputting Sample Data
Making the Script Smarter
Parsing the Category File
Testing the Script Again
6. Generating HTML
The Modified make_exhibit.plx Script
Changes to &parse_exhibitor
Adding Categories to the Company Listings
Creating Directories
Generating the HTML Pages
Generating the Individual Company Listings
Generating the Alphabetical Index
Using an Explicit Sort Block
Generating the Category Pages
Generating the Top-level Page
7. Regular Expressions Demystified
Delimiters
Trailing Modifiers
The Search Pattern
Taking It for a Spin
Thinking Like a Computer
Bumping Along and Backtracking
Alternation
8. Parsing Web Access Logs
Log File Structure
Converting IP Addresses
The Log-Analysis Script
The Mammoth Regular Expression
Different Log File Formats
Storing the Data
The “Visit” Data Structure
The &store_line Subroutine
9. Date Arithmetic
Date/Time Conversions
Using the Time::Local Module
Caching Date Conversions
Scoping via Anonymous Blocks
Using a BEGIN Block
10. Generating a Web Access Report
The &new_visit and &add_to_visit Subroutines
Generating the Report
Generating the Summary Line
Saving Previous Summary Lines
Showing the Details of Each Visit
Reporting the Most Popular Pages
Fancier Sorting
Reporting the Referral and User Agent Information
Tracking Robots
Mailing the Report
Using cron
11. Link Checking
Maintaining Links
Finding Files with File::Find
The Magic of References
Finding HTML Files Only
Looking for Links
Extracting
Converting
Putting It All Together
Creating a Hash of Arrays
Updating &process to Store Bad-Link Data
Printing the Bad-Link Report
Adding HTML Output
Using CPAN
Checking for LWP
Installing LWP from CPAN
Getting the archive file onto the web server
Decompressing the file
Extracting the files from the archive
The actual installation
Root Versus Regular User Installation
Checking Remote Links
A Proper Link Checker
Object-Oriented Syntax
Checking Remote URLs
Processing the Queue
12. Running a CGI Guestbook
The Guestbook Script
Taint Mode
Guestbook Preliminaries
Untainting with Backreferences
File Locking
Guestbook File Permissions
13. Running a CGI Search Tool
Downloading and Compiling SWISH-E
Indexing with SWISH-E
Running SWISH-E from the Command Line
Running SWISH-E via a CGI Script
14. Using HTML Templates
Using Templates
Reading Fillings Back In
Rewriting an Entire Site
15. Generating Links
The Docbase Concept
The CyberFair Site’s Architecture
The Script’s Data Structure
Using Data::Dumper
Creating Anonymous Hashes and Arrays
Automatically Generating Links
Inserting the Links
16. Writing Perl Modules
A Simple Module Template
Installing the Module
The Cyberfair::Page Module
17. Adding Pages via CGI Script
Why Add Pages with a CGI Script?
A Script for Creating HTML Documents
Controlling a Multistage CGI Script
Using Parameterized Links
Building a Form
Posting Pages from the CGI Script
Running External Commands with system and Backticks
Race Conditions
File Locking
Adding Link Checking
18. Monitoring Search Engine Positioning
Installing WWW::Search
A Single-Search Results Tool
Using the Getopt::Std Module
Using || for Short-Circuit Assignment
A Multisearch Results Tool
The map Function
19. Keeping Track of Users
Stateless Transactions
Identifying Individual Users
Basic Authentication
The .htaccess File
The .htgroup and .htpasswd Files
Automating User Registration
Storing Data on the Server
Flat Text Files for Data Storage
Serializing Data
Updating the .htpasswd and .htgroup Files
The Register Script
Fixing a Race Condition
Generating a Random Verification String
Array and Hash Slices
The rand Function
The Verification Script
20. Storing Data in DBM Files
Data Storage Options
The tie Function
A DBM Example Script
Blocking Versus Nonblocking Behavior
Storing Multilevel Data in DBM Files
An MLDBM-Using Registration Script
An MLDBM-Using Verification Script
21. Where to Go Next
Unix System Administration
Programming
A Programmer’s Editor
Revision-Control Systems
More Perl
JavaScript
PHP
Embperl
Python
Other Languages
Apache Server Administration and mod_perl
Relational Databases
Advocacy
Index
Colophon

Content preview from Perl for Web Site Management

Chapter 8. Parsing Web Access Logs

Web server access logs are an excellent source of information about what your site’s visitors are up to. The information on separate visitors is all mixed together, though, and for all but the smallest sites the raw access logs are too large to read directly. What you need is log analysis software to make the information in the log more easily accessible. You can buy commercial log analysis software to do this, but Perl makes it easy to write your own. The next three chapters describe how to build such a home-grown log-analysis tool.

This chapter focuses on the first part of the process: extracting and storing the information we’re interested in. We talk about log file structure, converting IP addresses, and creating regular expressions capable of parsing web access logs. We also talk about creating a suitable data structure for storing the extracted data, so we can answer interesting questions about what our site’s visitors have been doing. Along the way we discuss the difficulty of identifying those visitors in the web server’s log entries and devise an approach for extracting at least an approximate version of that information.

The example continues in Chapter 9, which focuses on how to do computations involving dates and times, and finishes in Chapter 10, which covers the specifics of how we manipulate the “visit” information from our logs, as well as the actual output of the finished report.
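
To make the extract-and-store idea concrete, here is a minimal sketch rather than the book's own script, which this preview does not show. It assumes entries in the widely used Common Log Format; the sample lines, the `parse_line` subroutine, and the `%requests_by_host` hash are all hypothetical names invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample entries in Common Log Format (not taken from the book).
my @lines = (
    '192.0.2.10 - - [10/Oct/2001:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.10 - - [10/Oct/2001:13:56:02 -0700] "GET /about.html HTTP/1.0" 200 1512',
    '198.51.100.7 - - [10/Oct/2001:14:01:10 -0700] "GET /index.html HTTP/1.0" 304 -',
);

# Parse one log line; return a hash reference of its fields,
# or undef if the line does not match the assumed format.
sub parse_line {
    my ($line) = @_;
    return undef
        unless $line =~ /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)$/;
    return {
        host   => $1,
        time   => $2,
        method => $3,
        path   => $4,
        status => $5,
        bytes  => $6,
    };
}

# Group the parsed requests by requesting host (a hash of arrays).
my %requests_by_host;
foreach my $line (@lines) {
    my $request = parse_line($line);
    if ($request) {
        push @{ $requests_by_host{ $request->{host} } }, $request;
    }
    else {
        warn "Skipping unparseable line: $line\n";
    }
}

# Print a one-line summary for each host.
foreach my $host (sort keys %requests_by_host) {
    my @paths = map { $_->{path} } @{ $requests_by_host{$host} };
    print "$host: ", scalar(@paths), " request(s): @paths\n";
}
```

Grouping by host is only a rough stand-in for identifying a visitor, which is the difficulty the chapter says it addresses; several people can share one address, and one person can appear under several.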

Log File Structure

Most web servers store their ...

Publisher Resources

ISBN: 1565926471
