
Web, Graphics & Perl/Tk Programming by Jon Orwant


Chapter 15. A Web Spider in One Line

Tkil

One day, someone on the IRC #perl channel was asking some confused questions. We finally figured out that he was trying to write a web robot, or “spider,” in Perl. That is a grand idea, except that:

  1. Perfectly good spiders have already been written and are freely available at http://info.webcrawler.com/mak/projects/robots/robots.html.

  2. A Perl-based web spider is probably not an ideal project for novice Perl programmers. They should work their way up to it.

Having said that, I immediately pictured a one-line Perl robot. It wouldn’t do much, but it would be amusing. After a few abortive attempts, I ended up with this monster, which requires Perl 5.005. I’ve split it onto separate lines for easier reading.

perl -MLWP::UserAgent -MHTML::LinkExtor -MURI::URL -lwe '
    $ua = LWP::UserAgent->new;
    while (my $link = shift @ARGV) {
        print STDERR "working on $link";
        HTML::LinkExtor->new(
          sub {
            my ($t, %a) = @_;
            my @links = map { url($_, $link)->abs }
                       grep { defined } @a{qw/href src/};  # <a href>, <img src>
            print STDERR "+ $_" foreach @links;
            push @ARGV, @links;
          } ) -> parse(
           do {
               my $r = $ua->simple_request
                 (HTTP::Request->new("GET", $link));
               $r->content_type eq "text/html" ? $r->content : "";
           }
          )
    }' http://slinky.scrye.com/~tkil/

I actually edited this on a single line; I use shell-mode inside of Emacs, so it wasn’t that much of a terror. Here’s the one-line version.

perl -MLWP::UserAgent -MHTML::LinkExtor -MURI::URL -lwe '$ua = LWP::UserAgent->new; while (my $link = shift ...
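
The loop in the spider treats @ARGV as a work queue: it shifts one URL off the front and pushes every link it finds onto the back. It keeps no record of pages it has already visited, so pages that link to each other would be fetched again and again. The following is a minimal sketch of the same shift/push idiom with a %seen hash added, run against a small in-memory link table instead of the network. The page names, the %links_on table, and the crawl function are all hypothetical and serve only to illustrate the idea; they are not part of the original one-liner.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory "web": each page maps to the links found on it.
# These page names are illustrative only.
my %links_on = (
    'a.html' => [ 'b.html', 'c.html' ],
    'b.html' => [ 'a.html', 'c.html' ],   # links back to a.html: a cycle
    'c.html' => [],
);

# crawl(@start) walks the table with the same shift/push queue idiom as the
# one-liner, but skips pages already seen so that cycles terminate.
sub crawl {
    my @queue = @_;
    my (%seen, @order);
    while (defined(my $link = shift @queue)) {
        next if $seen{$link}++;                     # already visited: skip
        push @order, $link;                         # record the visit
        push @queue, @{ $links_on{$link} || [] };   # enqueue outgoing links
    }
    return @order;
}

print "$_\n" for crawl('a.html');
```

In a real spider, the lookup in %links_on would be replaced by the fetch-and-extract step, and the %seen check would keep the queue from growing without bound on sites whose pages link back to one another.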
