Sometimes it’s nice to visit web sites without being in front of your computer. Maybe you’d prefer to have the text of web pages mailed to you, or be notified when a web page changes. Or maybe you’d like to download a lot of information from a huge number of web pages (as in the article webpluck), and you don’t want to open them all one by one. Or maybe you’d like to write a robot that scours the web for information. Enter the LWP bundle (sometimes called libwww-perl), which contains two modules that can download web pages for you: LWP::Simple and LWP::UserAgent. LWP is available on CPAN and is introduced in Scripting the Web with LWP.
Dan Gruhl submitted five tiny but exquisite programs to TPJ, all using LWP to automatically download information from a web service. Instead of sprinkling these around various issues as one-liners, I’ve collected all five here with a bit of explanation for each.
The first thing to notice is that all five programs look alike.
Each uses an LWP module (LWP::Simple in the first three, LWP::UserAgent
in the last two) to store the HTML from a web page in Perl’s default
variable, $_. Then they use a series of
s/// substitutions to discard the extraneous HTML.
The remaining text—the part we’re interested in—is displayed on the
screen, although it could nearly as easily have been sent as email with
the various Mail modules on CPAN.
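
The shared pattern can be sketched roughly as follows. This is a minimal illustration, not one of Dan's five programs: the strip_html helper and the particular substitutions are assumptions chosen for the example, and real pages usually need substitutions tailored to their markup. The only LWP call used is LWP::Simple's get, which returns the page content, or undef on failure.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original programs): crude HTML
# cleanup done with a series of s/// substitutions applied to $_.
sub strip_html {
    local $_ = shift;
    s/<script\b.*?<\/script>//gis;   # drop script blocks, contents included
    s/<[^>]*>//g;                    # remove the remaining tags
    s/&nbsp;/ /g;                    # decode a couple of common entities
    s/&amp;/&/g;                     # (decode &amp; last to avoid double decoding)
    s/[ \t]+/ /g;                    # collapse runs of spaces and tabs
    s/^\s+|\s+$//g;                  # trim leading and trailing whitespace
    return $_;
}

# Fetch only when a URL is given on the command line, so the file can
# also be loaded without network access or LWP installed.
if (@ARGV) {
    require LWP::Simple;
    my $html = LWP::Simple::get($ARGV[0]);
    die "Couldn't fetch $ARGV[0]\n" unless defined $html;
    print strip_html($html), "\n";
}
```

Saved under a name of your choosing and run with a URL as its only argument, the script prints the page's text with the markup removed. A regex-based cleanup like this is fragile compared with a real HTML parser, but it matches the spirit of the short programs discussed here.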