Mirroring Web Pages
Problem
You want to keep a local copy of a web page up-to-date.
Solution
Use LWP::Simple’s mirror function:
use LWP::Simple;
mirror($URL, $local_filename);
Discussion
Although closely related to the get function discussed in Section 20.1, the mirror function doesn’t download the file unconditionally. When the local file already exists, it adds an If-Modified-Since header to the GET request it creates, so the server will not transfer the file unless it has changed since the local copy was saved.
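Because mirror returns the HTTP status code of the response, a program can tell a fresh download from an unchanged page. The following is a minimal sketch, not part of the original recipe: describe_mirror_status is a hypothetical helper name, and the URL and filename are assumed to come from the command line.

```perl
use strict;
use warnings;
use LWP::Simple;   # exports mirror, is_success, and the RC_* status constants

# Hypothetical helper: turn mirror's return value (an HTTP status code)
# into a short description.
sub describe_mirror_status {
    my ($rc) = @_;
    return "unchanged" if $rc == RC_NOT_MODIFIED;   # 304: local copy is current
    return "updated"   if is_success($rc);          # 2xx: a new copy was saved
    return "failed ($rc)";
}

# Run only when both arguments are given, so the helper can be loaded alone.
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;
    print "$file: ", describe_mirror_status(mirror($url, $file)), "\n";
}
```

Running the script twice in a row against a page that has not changed should report an update the first time and an unchanged page the second, provided the server honors If-Modified-Since.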
The mirror function mirrors only a single page, not a full tree. To mirror a set of pages, use this recipe in conjunction with Section 20.3. A good solution to mirroring an entire remote tree can be found in the w3mir program, also found on CPAN.
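For a small, known set of pages, calling mirror in a loop may be enough. The sketch below assumes the URLs arrive on the command line (they could equally come from a link extractor) and uses a hypothetical helper, local_name_for, to choose a filename from the last segment of each URL's path. It is an illustration under those assumptions, not a replacement for a real tree mirroring tool.

```perl
use strict;
use warnings;
use LWP::Simple;   # exports mirror and is_error
use URI;

# Hypothetical helper: pick a local filename from the last path segment,
# falling back to index.html for URLs whose path is empty or ends in a slash.
sub local_name_for {
    my ($url) = @_;
    my @segments = URI->new($url)->path_segments;
    my $name = @segments ? $segments[-1] : "";
    return length($name) ? $name : "index.html";
}

# Mirror each URL given on the command line into the current directory.
for my $url (@ARGV) {
    my $file = local_name_for($url);
    my $rc   = mirror($url, $file);
    warn "$url: status $rc\n" if is_error($rc);   # 4xx and 5xx only
}
```

This naming scheme is deliberately naive: two URLs that end in the same filename will overwrite each other, and query strings are ignored, so a real program would need a mapping that preserves the directory structure.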
Be careful! It’s possible (and easy) to write programs that run amok and begin downloading all web pages on the net. This is not only poor etiquette, it’s also an infinite task, since some pages are dynamically generated. It could also get you into trouble with someone who doesn’t want their pages downloaded en masse.
See Also
The documentation for the CPAN module LWP::Simple; the HTTP specification at http://www.w3.org/pub/WWW/Protocols/HTTP/