Chapter 10. Scripting the Web with LWP

Lincoln D. Stein

In previous articles I’ve focused on the Web from the server’s point of view. We’ve talked about how the CGI protocol works, how to write server scripts, and how to maintain long-running transactions across the Web. But what about the client side of the story? Does Perl offer any support for those of us who wish to write our own web-creeping robots, remote syntax verifiers, database accessors, or even full-fledged graphical browsers? Naturally it does, and the name of this support is LWP.

LWP (Library for WWW access in Perl) is a collection of modules written by Martijn Koster and Gisle Aas, available on CPAN. To understand what LWP can do, consider the tasks your average web browser is called upon to perform:

  • Read and parse a URL

  • Connect to a remote server using the protocol appropriate for the URL (e.g., HTTP, FTP)

  • Negotiate with the server for the requested document, providing authentication when necessary

  • Interpret the retrieved document’s headers

  • Parse and display the document’s HTML content

The LWP library provides support for all of the tasks listed above, and several others, including handling proxy servers. In its simplest form, you can use LWP to fetch remote URLs from within a Perl script. With more effort, you can write an entirely Perl-based web browser. In fact, the Perl/Tk library comes complete with a crude but functional graphical browser based on LWP.
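To make the task list concrete, here is a minimal sketch of the first four steps using modules that ship with the LWP distribution and its companions: URI, HTTP::Request, HTTP::Response, and LWP::UserAgent. The URL, the agent string, the helper name summarize_response, and the LWP_SKETCH_FETCH environment variable are assumptions made up for illustration. The sketch only touches the network when that variable is set.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;
use HTTP::Request;
use HTTP::Response;
use LWP::UserAgent;

# Read and parse a URL. The address is a placeholder, not a real document.
my $uri = URI->new('http://www.example.com:8080/docs/index.html?lang=en');
printf "scheme=%s host=%s port=%d path=%s\n",
    $uri->scheme, $uri->host, $uri->port, $uri->path;

# Describe the request we want to make; nothing is sent yet.
my $request = HTTP::Request->new(GET => $uri);
$request->header(Accept => 'text/html');

# Interpret a response's status line and headers.
# (summarize_response is a hypothetical helper, not part of LWP.)
sub summarize_response {
    my ($response) = @_;
    my $type = $response->content_type;    # scalar context: media type only
    $type = 'unknown' unless length $type;
    return sprintf '%d %s (%s)', $response->code, $response->message, $type;
}

# The user agent handles the connection and honors proxy settings
# such as http_proxy when env_proxy is called.
my $ua = LWP::UserAgent->new(agent => 'lwp-sketch/0.1', timeout => 10);
$ua->env_proxy;

# Only go out on the network when explicitly asked to (assumed variable name).
if ($ENV{LWP_SKETCH_FETCH}) {
    my $response = $ua->request($request);
    print summarize_response($response), "\n";
    print $response->decoded_content if $response->is_success;
}
```

For the simplest case, where all you want is the body of a document, the LWP::Simple module exports a get function that takes a URL and returns the content, or undef on failure. The longer form above is worth the extra lines when you need to set headers, inspect the status code, or route requests through a proxy.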

The LWP modules are divided into the following categories: ...
