The World Wide Web has been credited with bringing the Internet to the masses. The Internet was previously the stomping ground of academics and a small, elite group of computer professionals, mostly UNIX programmers and other oddball types, running obscure commands like ftp and finger, archie and telnet, and so on.
With the arrival of graphical browsers for the Web, the Internet suddenly exploded. Anyone could find things on the Web. You didn't need to be “in the know” anymore; you just needed to be properly networked. Equipped with Netscape Navigator or Internet Explorer or any other browser, everyone can now explore the Internet freely.
But graphical browsers can be limiting. The very interactivity that makes them the ideal interface for the Internet also makes them cumbersome when you want to automate a task. It's analogous to editing a document by hand when you'd like to write a script to do the work for you. Graphical browsers require you to navigate the Web manually. To cut down on tedious pointing and clicking, this book shows you how to liberate yourself from the confines of your browser.
Web Client Programming with Perl is a behind-the-scenes look at how your web browser interacts with web servers. Readers of this book will learn how the Web works and how to write software that is more flexible, dynamic, and powerful than the typical web browser. The goal here is not to rewrite the browser, but to give you the ability to retrieve, manipulate, and redistribute web-based information in an automated fashion.
I like to think that this book is for everyone. But since that's a bit of an exaggeration, let's try to identify who might really enjoy this book.
This book is for software developers who want to expand into a new market niche. It provides proof-of-concept examples and a compilation of web-related technical data.
This book is for web administrators who maintain large amounts of data. Administrators can replace manual maintenance tasks with web robots that detect and correct problems with web sites. Robots can perform these tasks more quickly and accurately than people working by hand.
But to be honest, the audience that's closest to my heart is that of computer enthusiasts, tinkerers, and motivated students, who can use this book to satisfy their curiosity about how the Web works and how to make it work for them. My editor often talks about when she first learned UNIX scripting and how it opened a world of automation for her. When you learn how to write scripts, you realize that there's very little that you can't do within that universe. With this book, you can extend that confidence to the Web. If this book is successful, then for almost any web-related task you'll find yourself thinking, “Hey, I could write a script to do that!”
Unfortunately, we can't teach you everything. There are a few things that we assume that you are already familiar with:
Some of you already know why you picked up this book. But others may just have a nagging feeling that it's something useful to know, though you may not be entirely sure why. At the risk of seeming self-serving, let me suggest some ways in which this book may be helpful:
This book consists of seven chapters and three appendices, as follows:
Chapter 1, Introduction
Discusses basic terminology and potential uses for customized web clients.
Chapter 2, Demystifying the Browser
Translates common browser tasks into HTTP transactions. By the end of the chapter, the reader will understand how web clients and servers interact, and will be able to perform these interactions manually.
Chapter 3, Learning HTTP
Teaches the nuances of the HTTP protocol.
Chapter 4, The Socket Library
Introduces the socket library and shows some examples of how to write simple web clients with sockets.
Chapter 5, The LWP Library
Chapter 6, Example LWP Programs
A cookbook-type demonstration of several example applications.
Chapter 7, Graphical Examples with Perl/Tk
A demonstration of how you can use the Tk extension to Perl to add a graphical interface to your programs.
Appendix A, HTTP Headers
Contains a comprehensive listing of the headers specified by HTTP.
Appendix B, Reference Tables
Lists URLs that you can use to learn more about HTTP and LWP.
Appendix C, The Robot Exclusion Standard
Describes the Robot Exclusion Standard, which every good web programmer should know intimately.
In this book, we include many code examples. While the code is all contained within the text, many people will prefer to download examples rather than type them in by hand. You can find the complete set of source code used in this book on ftp.oreilly.com at /published/oreilly/nutshell/web-client.
To use FTP, you need a machine with direct access to the Internet. A sample session follows; what you type is the host name after ftp, the login name and password, and each command after the ftp> prompt.
% ftp ftp.oreilly.com
Connected to ftp.oreilly.com.
220 FTP server (Version 6.21 Tue Mar 10 22:09:55 EST 1992) ready.
Name (ftp.oreilly.com:yourname): anonymous
331 Guest login ok, send domain style e-mail address as password.
Password: yourname@yourhost (use your user name and host here)
230 Guest login ok, access restrictions apply.
ftp> cd /published/oreilly/nutshell/web-client
250 CWD command successful.
ftp> binary (Very important! You must specify binary transfer for compressed files.)
200 Type set to I.
ftp> get examples.tar.gz
200 PORT command successful.
150 Opening BINARY mode data connection for examples.tar.gz.
226 Transfer complete.
ftp> quit
221 Goodbye.
The file is a gzipped tar archive; extract the files from the archive by typing:
% gunzip examples.tar.gz
% tar xvf examples.tar
System V systems require the following tar command instead:
% tar xof examples.tar
We use the following formatting conventions in this book:
As a reader of this book, you can help us to improve the next edition. If you find errors, inaccuracies, or typos anywhere in the book, please let us know about them. Also, if you find any misleading statements or confusing explanations, let us know. Send your bug reports and comments to:
O'Reilly & Associates, Inc.
101 Morris St.
Sebastopol, CA 95472
1-800-998-9938 (in the US or Canada)
1-707-829-0515 (international/local)
1-707-829-0104 (FAX)
firstname.lastname@example.org
Please let us know what we can do to make the book more helpful to you. We take your comments seriously, and will do whatever we can to make this book as useful as it can be.
The idea for this book began in early 1995, when I was a student at Purdue University and attended a class entitled Proficient Use of WWW, taught by George Vanecek, Jr. and Buster Dunsmore. It was a wonderful class that went all over the map, from HTML to HTTP to CGI to Perl programming. Other ideas for the book came from my work as a web developer at Purdue's Online Writing Lab.
I'd like to extend a warm “thank you” to everyone who helped review the book, especially on short notice: Tom Christiansen, Larry Wall, Sean McDermott, Kirsten Klinghammer, Ed Hill, Andy Grignon, Jeff Sedayao, Michael Pelz-Sherman, and Norman Walsh. Special thanks to Kirsten and Sean for the 24-hour turnaround time, and to Tom, Larry, and Ed for being critical when someone needed to be critical.
Thanks also to Nancy Walsh for writing the Perl/Tk chapter. And thanks to all the people at O'Reilly & Associates: production editor Jane Ellin, cover designer Edie Freedman, Chris Reilley (who cleaned up the figures), Mike Sierra for Tools support, Mary Anne Weeks Mayo and Sheryl Avruch for quality control, and my editor Linda Mui.
Thanks to my parents, Chun and Liang, my sister Ginger, and my girlfriend Cynthia for their support.