October 2001
Example 11-3 shows link_check2.plx, an enhanced version of the link-checking
script that gives us the option of checking offsite links. The parts of this
script that differ from the previous version have been emphasized.
Example 11-3. Link-checking script with offsite checking
#!/usr/bin/perl -w

# link_check2.plx

# This is a modified HTML link checker.
# It descends recursively from $start_dir, processing
# all .htm or .html files to extract HREF and SRC
# attributes, then checks all that point to a local
# file to confirm that the file actually exists, and optionally
# uses LWP::Simple to do a HEAD check on remote ones for the
# same purpose. It then reports on the bad links.

use strict;
use File::Find;
use LWP::Simple;

# note: the first four configuration variables should *not*
# have a trailing slash (/)

my $start_dir    = '/w1/s/socalsail/expo'; # where to begin looking
my $hostname     = 'www.socalsail.com';    # this site's hostname
my $web_root     = '/w1/s/socalsail';      # path to www doc root
my $web_path     = '/expo';                # web path to $start_dir
my $webify       = 1;                      # produce web-ready output?
my $check_remote = 1;                      # check offsite links?

my %bad_links; # a "hash of lists" with keys consisting of filenames,
               # values consisting of lists of bad links in those files

my %good;      # A hash mapping absolute filenames (or remote URLs) to
               # 0 or 1 (for good or bad). Used to cache the results of
               # previous checks.

find(\&process, $start_dir); # this loads up the above hashes

if ($webify) ...
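The rest of the listing is cut off here, so the following is only a minimal sketch of how the remote check described in the script's header comment might be combined with the %good cache. The helper name remote_ok and its optional $checker argument are hypothetical and do not come from link_check2.plx. The sketch assumes a cached value of 1 means the URL answered a HEAD request and 0 means it did not.

```perl
#!/usr/bin/perl -w
use strict;

my %good;   # cache: URL => 1 (reachable) or 0 (unreachable)

# remote_ok() is a hypothetical helper, not code from link_check2.plx.
# The optional $checker coderef is an assumption added so the caching
# can be exercised without network access.
sub remote_ok {
    my ($url, $checker) = @_;

    # Default check: LWP::Simple's head() returns a true value in
    # scalar context when the HEAD request succeeds.
    $checker ||= sub {
        require LWP::Simple;
        return LWP::Simple::head($_[0]) ? 1 : 0;
    };

    # Only contact the server the first time a URL is seen.
    $good{$url} = $checker->($url) unless exists $good{$url};
    return $good{$url};
}
```

With this arrangement, a URL that appears on many pages is requested at most once per run; every later call to remote_ok() for the same URL is answered from %good.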