I started playing around with the Web a long time ago, or at least it feels that way. The first versions of Mosaic had just shown up, Gopher and WAIS were still hot technology, and I discovered an HTTP server program called Plexus. What was different about it was that it was implemented in Perl, which made it easy to extend. CGI had not been invented yet, so all we had were servlets (although we didn’t call them that then). Over time, I moved from hacking on the server side to the client side but stayed with Perl as my programming language of choice. As a result, I got involved in LWP, the Perl web client library.
A lot has happened to the Web since then. These days there is almost no end to the information at our fingertips: news, stock quotes, weather, government info, shopping, discussion groups, product info, reviews, games, and other entertainment. And the good news is that LWP can help you automate access to all of it.
This book tells you how you can write your own useful web client applications with LWP and its related HTML modules. Sean’s done a great job of showing how this powerful library can be used to make tools that automate various tasks on the Web. If you are like me, you probably have many examples of web forms that you find yourself filling out over and over again. Why not write a simple LWP-based tool that does it all for you? Or a tool that does research for you by collecting data from many web pages without you having to spend a single mouse click? After reading this book, you should be well prepared for tasks such as these.
This book’s focus is to teach you how to write scripts against services that are set up to serve traditional web browsers, that is, services exposed through HTML. Even in a world where people have finally discovered that the Web can provide real program-to-program interfaces (the current “web services” craze), HTML scraping is likely to remain a valuable way to extract information from the Web. I strongly believe that Perl and LWP are among the best tools for that job, and reading Perl and LWP is a good way to get started.
It has been fun writing and maintaining the LWP codebase, and Sean’s written a fine book about using it. Enjoy!
Primary author and maintainer of LWP