Not every document can be fetched with a simple GET or POST request.
Many pages require authentication before you can access them, some use
cookies to keep track of the different users, and still others want
special values in the User-Agent header. This chapter shows
you how to set arbitrary headers, manage cookies, and even authenticate
using LWP. You’ll be able to make your LWP programs appear to be Netscape
or Internet Explorer, log in to a protected site, and work with sites that
use cookies.
For example, suppose you’re automating a web-based purchasing system. The server requires you to log in, then issues you a cookie to prove you’ve been authenticated. You must then send this cookie back to the server with every request you make.
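The round trip described above can be sketched without touching the network. This is only an illustration under stated assumptions: the host shop.example.com, the /login and /orders paths, and the session=abc123 cookie are all hypothetical, and the server’s reply is built by hand rather than fetched. It uses HTTP::Cookies directly to show the two halves of the exchange: remembering the cookie from a response, then attaching it to a later request.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;
use HTTP::Response;

# An in-memory cookie jar; nothing is written to disk.
my $jar = HTTP::Cookies->new;

# Hypothetical login request (never actually sent in this sketch).
my $login = HTTP::Request->new(POST => 'http://shop.example.com/login');

# Hand-built stand-in for the server's reply, which issues a cookie.
my $reply = HTTP::Response->new(
    200, 'OK',
    [ 'Set-Cookie' => 'session=abc123; path=/' ],
);
$reply->request($login);    # the jar needs to know which request this answers

# Step 1: remember any cookies the server issued.
$jar->extract_cookies($reply);

# Step 2: send the cookie back with a later request to the same site.
my $next = HTTP::Request->new(GET => 'http://shop.example.com/orders');
$jar->add_cookie_header($next);

my $cookie_header = $next->header('Cookie');
print "Cookie header: ",
      (defined $cookie_header ? $cookie_header : '(none)'), "\n";
```

In a real program you would normally not call these two methods yourself. Handing the jar to the browser object with $browser->cookie_jar($jar) makes LWP::UserAgent do both steps on every request it performs.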
Or, more mundanely, suppose you’re extracting information from one of the
many web sites that check the User-Agent header in your requests. If your
User-Agent doesn’t identify your client as a recent version of Netscape or
Internet Explorer, the server sends you back an “Upgrade your browser”
page. You need to set the User-Agent header to make it appear that you are
using Netscape or Internet Explorer.
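As a minimal sketch of that idea, the following sets the User-Agent on an LWP::UserAgent object and then inspects the header that would go out. The identification string and the URL are illustrative assumptions, not values any particular site requires. The call to prepare_request fills in the browser object’s default headers without contacting the server, so the sketch runs offline.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $browser = LWP::UserAgent->new;

# Illustrative Netscape-style identification string (an assumption).
$browser->agent('Mozilla/4.76 [en] (Win98; U)');

# Hypothetical URL; the request is prepared but not sent here.
my $request = HTTP::Request->new(GET => 'http://www.example.com/');

# Copy the browser object's default headers, including User-Agent,
# onto the request. No network traffic happens at this point.
$browser->prepare_request($request);

my $sent_agent = $request->header('User-Agent');
print "User-Agent: $sent_agent\n";

# To perform the fetch for real, you would then call:
# my $response = $browser->request($request);
```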
HTTP was originally designed as a stateless protocol, meaning that each request is totally independent of other requests. But web site designers felt the need for something to help them identify the user of a particular session. The mechanism that does this is called a cookie. This ...