Perl for Web Site Management by John Callender

Putting It All Together

Let’s take stock of what we’ve done so far. We’ve written a script that descends recursively through a filesystem, reads in the contents of any HTML files it encounters, and extracts the attribute values from every <A HREF="..."> and <IMG SRC="..."> tag in those files. We’ve also created a subroutine that takes a directory name and a list of links extracted from a file in that directory, identifies which links point to local files, and converts them to full (that is, absolute) filesystem pathnames.

The fast-but-stupid version of our link-checker is almost finished. The main thing left is defining the data structure that will hold the information on the bad links it discovers.

For that, we go back to the top of the script, just below the configuration section, and add the following:

my %bad_links;    # A "hash of arrays" with keys consisting of URLs
                  # under $start_base, and values consisting of lists 
                  # of bad links on those pages.

my %good;         # A hash mapping filesystem paths to
                  # 1 or 0 (for good or bad). Used to cache the results
                  # of previous checks so they needn't be repeated for
                  # subsequent pages.

Here we’ve declared two new hashes for our script to use: %bad_links and %good. %good is fairly straightforward; we’re going to use it to store the result of testing each link our script processes. The keys of the %good hash are the local filesystem paths for the files we are checking (e.g., /w1/s/socalsail/index.html). A link that turns out to be bad ...
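To make the two structures concrete, here is a minimal sketch of how they might be filled in. It is not the book's code: the helper names link_is_good and record_bad_links, the example page URL, and the missing path are all made up for illustration. It also assumes that a value of 1 in %good means the file exists and 0 means it does not, which the hash's name suggests.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my %bad_links;   # page URL => reference to an array of bad links on that page
my %good;        # filesystem path => 1 (file exists) or 0 (it does not);
                 # the 1/0 meaning is an assumption of this sketch

# Hypothetical helper: test a path on disk once, then answer from the cache.
sub link_is_good {
    my ($path) = @_;
    $good{$path} = (-e $path ? 1 : 0) unless exists $good{$path};
    return $good{$path};
}

# Hypothetical helper: record the bad links found on a single page.
sub record_bad_links {
    my ($page_url, @paths) = @_;
    for my $path (@paths) {
        next if link_is_good($path);
        # Perl creates the anonymous array the first time it is needed.
        push @{ $bad_links{$page_url} }, $path;
    }
    return;
}

# A temporary file stands in for a link target that really exists.
my ($fh, $existing) = tempfile(UNLINK => 1);

# Made-up page URL and missing path, for illustration only.
my $page    = 'http://www.example.com/index.html';
my $missing = '/no/such/dir/missing.html';

record_bad_links($page, $existing, $missing);

# Report: each page that has bad links, followed by those links.
for my $url (sort keys %bad_links) {
    print "$url\n";
    print "    $_\n" for @{ $bad_links{$url} };
}
```

Because link_is_good consults %good before touching the disk, a path linked from many pages is tested only once, which is the caching role the comments in the declaration describe.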
