Chapter 2. Assembling a Toolbox

Hack 9. Hacks #8-32

The idea behind scraping sites often arises out of pure, immediate, and frantic desire: it’s late at night, you’ve forgotten your son’s soccer game for the twelfth time in a row, and you’re vowing never to let it happen again. Sure, you could place a bookmark to the school calendar in your browser toolbar, but you want something even more insidious, something you couldn’t possibly forget or grow accustomed to seeing.

A bit later, you’ve got a Perl script that automatically emails you every hour of every day that a game is scheduled. You’ve just made your life less forgetful, your computer more useful, and your son more loving. This is where spidering and scraping shine: when you’ve got an itch that can best be scratched by getting your computer involved. And if there’s one programming language that can quickly scratch an itch better than any other, it’s Perl.

Perl is renowned for “making easy things easy and hard things possible,” earning the reputation of “Swiss Army chainsaw,” “Internet duct tape,” or the ultimate “glue language.” Since it’s a scripting language (as opposed to a compiled one, like C), rapid development is its modus operandi: throw together bits and pieces from code here and there, try it out, tweak, hem, haw, and deploy. Along with its immense repository of existing code (see CPAN, the Comprehensive Perl Archive Network, at http://www.cpan.org) and the uncanny ability to “do what you mean,” it’s a perfect language ...
