Chapter 12. Downloading Web Pages Through a Proxy Server

Rob Svirskas

The previous article presented five simple but elegant programs that download information from various web services: stock quotes, weather predictions, currency information, U.S. postal address correction, and CNN headline news. If you’re like me, your company uses a firewall to repel wily hackers, which means you have to use a proxy server to access most URLs. A proxy server (sometimes called a “gateway”) is simply an intermediary computer that forwards your request to a server and returns the server’s response to you. The bad news: if you call the LWP::Simple get function without first telling it about your proxy server, it returns undef instead of the page.
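Because the failure is silent, it pays to check what get returns before using it. The helper below is a minimal sketch; fetch_page is a hypothetical name, not part of LWP::Simple, and it relies only on get returning undef when a request fails.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

# fetch_page is a hypothetical helper, not part of LWP::Simple.
# get() returns undef on any failure (including a connection blocked
# by a firewall), so test with defined() before using the result.
sub fetch_page {
    my ($url) = @_;
    my $content = get($url);
    unless (defined $content) {
        warn "Couldn't fetch $url; are you behind a proxy?\n";
        return;    # undef in scalar context
    }
    return $content;
}
```

Call it as my $page = fetch_page($url); and skip further processing when $page is undefined. The warning is only a hint to check your proxy settings; get also returns undef for other failures, such as a mistyped URL.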

The good news: there’s a simple way around this. The LWP::Simple module checks an environment variable called http_proxy. If $ENV{http_proxy} contains the URL of a proxy server, your calls to get are routed through it. You can set the variable in two ways: either by assigning a value to $ENV{http_proxy} in your program before LWP::Simple is loaded, or by using whatever mechanism your shell or operating system provides. For instance, you can define your proxy server under the Unix bash shell as follows:

% export http_proxy=http://proxy.mycompany.com:1080

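This makes LWP::Simple route requests through port 1080 of the proxy server proxy.mycompany.com. Depending on your shell, you may need a different command: csh and tcsh use setenv http_proxy http://proxy.mycompany.com:1080, and the Windows command prompt uses set http_proxy=http://proxy.mycompany.com:1080. There are also related environment variables for non-HTTP services: ftp_proxy, gopher_proxy ...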
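The same settings can be made from inside a Perl program. This is a minimal sketch, assuming a reasonably current LWP::Simple, which reads the *_proxy variables once, when the module is loaded. That is why the assignments sit in a BEGIN block ahead of use LWP::Simple. The script then asks the module’s internal user agent (exported as $ua only when you request it) which proxy it will use for each scheme. The host and port are the example values from the text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Runs at compile time, before "use LWP::Simple" below is processed,
# so the module sees these values when it sets up its user agent.
# proxy.mycompany.com:1080 is the example proxy from the text.
BEGIN {
    $ENV{http_proxy} = 'http://proxy.mycompany.com:1080';
    $ENV{ftp_proxy}  = 'http://proxy.mycompany.com:1080';
}

# get() calls made anywhere in this program now go through the proxy.
use LWP::Simple qw(get $ua);

# $ua is the LWP::UserAgent object LWP::Simple uses internally.
# Print the proxy it will use for each scheme, or "(direct)" if none.
for my $scheme (qw(http ftp)) {
    my $proxy = $ua->proxy($scheme);
    printf "%-4s -> %s\n", $scheme, defined $proxy ? $proxy : '(direct)';
}
```

Substitute your own proxy’s host and port. If the http line prints your proxy rather than “(direct)”, ordinary get calls later in the program will be routed through it.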
