Looking for Links
Now, let’s use find_files.plx as the starting point for a new script,
link_check.plx, that will do some simplistic checking for broken links
in the HTML files it processes. The first step is to change the
&process subroutine so that, instead of just printing the names of the
HTML files it processes, it opens each one and reads its contents:
sub process {
    # this is invoked by File::Find's find function for each
    # file it recursively finds.
    return unless /\.html$/;
    my $file = $File::Find::name;
    unless (open IN, $file) {
        warn "can't open $file for reading: $!, continuing...\n";
        return;
    }
    my $data = join '', <IN>; # all the data at once
    close IN;
    return unless $data;
    print "found $file, read the following data:\n\n$data\n";
}

Looking at the new parts line by line, we can see that the package
variable $File::Find::name is assigned to a my variable called $file.
This is just for convenience. Since we’ll be using that variable
several times, typing $file is going to be easier than typing
$File::Find::name over and over again.
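To see what the callback has to work with, here is a minimal standalone sketch that records both $_ (the bare filename, which the regular expression tests) and $File::Find::name (the path including the directory) for each HTML file. The subroutine name show_names, the @seen array, and the choice to take starting directories from the command line are assumptions made for illustration; they are not part of find_files.plx.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical storage for what the callback sees.
my @seen;

# Hypothetical callback, standing in for &process.
sub show_names {
    return unless /\.html$/;        # $_ holds the bare filename
    my $file = $File::Find::name;   # path including the directory
    push @seen, [ $_, $file ];
    print "filename: $_\nfull path: $file\n\n";
}

# Assumption: starting directories are given as command-line arguments.
find(\&show_names, @ARGV) if @ARGV;
```

Run with a directory name as its argument, the sketch should print one filename and full path pair for every .html file beneath that directory.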
Next we open the file for reading, associating the IN filehandle with
it in the open statement. Notice how we’re using warn rather than die
to handle a failed open. The idea here is that we probably want the
script to continue processing files even if something strange happens
in the middle and one of them can’t be opened for reading.
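The effect of choosing warn can be seen in a small standalone sketch. The helper name read_file and the file name no_such_file.html are hypothetical, and the sketch uses a lexical filehandle with three-argument open, a more recent idiom than the bareword IN filehandle in the subroutine above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the file's contents, or undef
# (after printing a warning) if the file cannot be opened.
sub read_file {
    my ($file) = @_;
    my $fh;
    unless (open $fh, '<', $file) {
        warn "can't open $file for reading: $!, continuing...\n";
        return;
    }
    my $data = join '', <$fh>;   # all the data at once
    close $fh;
    return $data;
}

# The first name is assumed not to exist; $0 is the running script.
for my $file ('no_such_file.html', $0) {
    my $data = read_file($file);
    next unless defined $data;   # skip files that could not be read
    printf "%s: read %d characters\n", $file, length $data;
}
```

Because the failed open only produces a warning on standard error, the loop moves on to the next file. Had the helper called die instead, the script would have stopped at the first unreadable file.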
Next ...