Chapter 12. Downloading Web Pages Through a Proxy Server
The previous article presented five simple but elegant programs
that download information from various web services: stock quotes,
weather predictions, currency information, U.S. postal address
correction, and CNN headline news. If you’re like me, your company uses
a firewall to repel wily hackers, which means that we have to use a
proxy server to access most URLs. A proxy server (sometimes called a “gateway”) is simply an
intermediary computer that sends your request to a server and returns
its response to you. The bad news: if you try to use the LWP::Simple
get function without first letting it know about your proxy server, it
returns nothing at all.
The good news: there’s a simple way around this. The LWP::Simple module
checks an environment variable called http_proxy. If $ENV{http_proxy}
contains the URL of a proxy server, your calls to get use it as a proxy
server. You can set environment variables in two ways: either by
assigning a value to $ENV{http_proxy}, or by using whatever mechanism
your shell or operating system provides. For instance, you can define
your proxy server under the Unix bash shell as follows:
% export http_proxy=http://proxy.mycompany.com:1080
This makes LWP::Simple route requests through port 1080 of the proxy
server proxy.mycompany.com. You may need to use the set or setenv
command, depending on your shell. There are also related environment
variables for non-HTTP services: ftp_proxy, gopher_proxy ...
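If you would rather set the variable from inside the program than from the shell, something like the following sketch should work. It rests on one assumption worth checking against your installed version: LWP::Simple appears to read the proxy variables once, at the moment the module is loaded, so the assignment to $ENV{http_proxy} has to sit in a BEGIN block ahead of the use line. The proxy URL is the same placeholder used above, and the page to fetch is taken from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: LWP::Simple consults the *_proxy environment variables when
# the module is loaded, so this assignment must run before the "use" below.
# The proxy URL is a placeholder; substitute your own.
BEGIN {
    $ENV{http_proxy} = 'http://proxy.mycompany.com:1080';
}

# $ua is the user agent object that LWP::Simple shares among its functions;
# it is exported only on request.
use LWP::Simple qw(get $ua);

# Report which proxy, if any, the user agent picked up for http URLs.
my $proxy = $ua->proxy('http');
print "HTTP proxy: ", (defined $proxy ? $proxy : '(none)'), "\n";

# Fetch a page only if a URL was supplied on the command line.
if (my $url = shift @ARGV) {
    my $page = get($url);
    defined $page or die "Could not fetch $url\n";
    print length($page), " bytes received\n";
}
```

Run without arguments, the program only reports the proxy setting and never touches the network, which makes it a quick check that the variable was picked up. Give it a URL and it attempts the download through the proxy.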