Finding Files with File::Find

The first step in building our first link checker is to figure out a way for our script to get a list of all the HTML files on our site. Back in Chapter 4, we fed our script a list of filenames on the command line using the shell’s ability to expand wildcard characters. Now, though, we’re going to take a different approach, by using the standard File::Find module. We use it by putting use File::Find into our script, then invoking the module’s find function. This will make it easy to construct a script that processes all the files under a given starting directory, including those in deeper subdirectories.

We’ll start with a simple demonstration script, find_files.plx, shown in Example 11-1. (Like all the examples in this book, it can be downloaded from the book’s web site, at http://www.elanus.net/book/.)

Example 11-1. find_files.plx

#!/usr/bin/perl -w

# find_files.plx

# this script demonstrates the use of the File::Find module.

use strict;
use File::Find;

my $start_dir = shift
    or die "Usage: $0 <start_dir>\n";

unless (-d $start_dir) {
    die "Start directory '$start_dir' is not a directory.\n";
}

find(\&process, $start_dir);

sub process {

    # this is invoked by File::Find's find function for each
    # file and directory it finds under the start directory.

    print "Found $File::Find::name\n";
}

Most of this script should look pretty straightforward at this point. It starts off by shifting off the first item in @ARGV (that is, the first argument supplied to the script when it was invoked ...
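Because the link checker only needs the site’s HTML files, the callback can also filter what it keeps instead of printing every name. What follows is a minimal sketch of that idea, not one of the book’s examples: the script name find_html.plx, the subroutine names collect_html and html_files_under, and the rule that an HTML file is any plain file whose name ends in .html or .htm are all assumptions made for illustration.

```perl
#!/usr/bin/perl -w

# find_html.plx (a hypothetical name, not one of the book's examples)

# this sketch uses File::Find to collect only the HTML files
# under a starting directory.

use strict;
use File::Find;

my @html_files;    # filled in by collect_html as find walks the tree

sub collect_html {

    # by default, find changes into each directory as it goes, so
    # $_ holds the bare filename and $File::Find::name holds the
    # path including the start directory.

    return unless -f $_;          # skip directories and other non-files
    return unless /\.html?$/i;    # assumption: .html or .htm means HTML

    push @html_files, $File::Find::name;
}

sub html_files_under {

    # returns a sorted list of the HTML files under the
    # directory passed as the first argument.

    my $start_dir = shift;

    @html_files = ();
    find(\&collect_html, $start_dir);

    my @sorted = sort @html_files;
    return @sorted;
}

# only run the search when a start directory was supplied.
if (@ARGV) {
    my $start_dir = shift;

    unless (-d $start_dir) {
        die "Start directory '$start_dir' is not a directory.\n";
    }

    foreach my $file (html_files_under($start_dir)) {
        print "Found $file\n";
    }
}
```

Invoked as perl find_html.plx followed by a directory name, this should print one "Found" line for each matching file. Collecting the names into an array rather than printing them inside the callback is a design choice made here so that a later step could loop over the list; the filename test would need adjusting for a site that uses other extensions.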
