Using CPAN
As I previously pointed out, this first
link-checking script is fairly limited. It only checks links that
point to the local filesystem, and it will be confused by HTML pages
containing things like <BASE HREF="...">
tags, which modify how the relative links on a page are resolved by a
browser. Still, it runs quickly, and on a big site that doesn’t
violate its assumptions, it makes short work of checking for at least
the more obvious broken links.
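Because a <BASE HREF="..."> tag changes what a page’s relative links point to, one modest improvement would be for the script to at least notice such pages and flag them, rather than silently misjudging their links. The following is a minimal sketch under that assumption, not part of the original script: the helper name base_href is hypothetical, and it uses a regular expression, which is only a rough approximation of real HTML parsing.

```perl
use strict;
use warnings;

# Hypothetical helper: return the URL from the first <BASE HREF="...">
# tag found in $html, or undef if there is none. A regex is only a rough
# stand-in for real HTML parsing (it ignores comments, for example).
sub base_href {
    my ($html) = @_;
    return $1 if $html =~ /<base\s[^>]*?\bhref\s*=\s*["']?([^"'\s>]+)/i;
    return;
}

# Read each file named on the command line and report pages whose
# relative links a browser would resolve against a BASE URL instead of
# the page's own location.
for my $file (@ARGV) {
    open my $fh, '<', $file or do { warn "Can't read $file: $!\n"; next };
    my $html = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    if (defined(my $base = base_href($html))) {
        print "$file: relative links resolve against $base\n";
    }
}
```

Run as, say, perl find_base.pl *.html, it lists the pages whose links the simple filesystem checker should not be trusted to judge.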
A nice enhancement would be to make it check offsite links as well,
using HTTP to request pages just like a web browser. We could write
our own web browsing code to do this using Perl, but fortunately that
work has already been done, and done better than you or I are likely
to be able to do it. The person responsible for that is a very
helpful member of the extended Perl community named Gisle Aas, author
of the LWP module (short for libwww-perl).
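To give a feel for what checking an offsite link with LWP involves, here is a minimal sketch, not code from the original text. It assumes LWP is installed and uses LWP::UserAgent to send an HTTP HEAD request (which asks for the response headers only), then reports either ok or the server’s status line. The helper name check_remote_link and the 15-second timeout are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one URL with a HEAD request. $ua is
# expected to be an LWP::UserAgent (or any object with a compatible
# head() method returning an HTTP::Response-like object).
sub check_remote_link {
    my ($ua, $url) = @_;
    my $response = $ua->head($url);    # fetch headers only, no body
    return $response->is_success ? 'ok' : $response->status_line;
}

# Only touch the network when URLs are given on the command line, so the
# helper above can be loaded on its own without LWP installed.
if (@ARGV) {
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new(timeout => 15);    # seconds
    for my $url (@ARGV) {
        printf "%s: %s\n", $url, check_remote_link($ua, $url);
    }
}
```

Some servers handle HEAD requests poorly, so a fuller checker might retry with a GET request before declaring a link broken.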
Using LWP will save us vast amounts of time and
headache. Since it is not currently included in the standard Perl
distribution, though, we will need to download it from CPAN (the
Comprehensive Perl Archive Network, at http://www.cpan.org/), and install it
(assuming it isn’t already installed as part of the copy of
Perl we are using). Learning to do that will take some initial
effort, but believe me, we’ll be better off in the long run for
having invested that time up front.
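Whether a module is already part of the Perl you are using can be tested by simply trying to load it. The script below is one possible approach, offered as a sketch rather than a prescribed procedure: it tries to require each module named on the command line (defaulting to LWP) and prints the version it finds. The helper name module_status is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether $module can be loaded, and its
# version if it defines one. A module that is installed but fails to
# compile is also reported as "not installed" here.
sub module_status {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # LWP::UserAgent -> LWP/UserAgent.pm
    return 'not installed' unless eval { require $file; 1 };
    my $version = $module->VERSION;            # undef if no $VERSION is set
    return defined $version ? "version $version" : 'installed';
}

my @modules = @ARGV ? @ARGV : ('LWP');
printf "%-20s %s\n", $_, module_status($_) for @modules;
```

If LWP turns out to be missing, the cpan command-line client that ships with Perl (for example, cpan LWP) is one way to fetch and install it from CPAN.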
Checking for LWP
Before we jump in and start the download-and-install process, make the following quick check to ...