Apache's mod_proxy module implements a proxy and cache for Apache. It implements proxying capabilities for the following protocols: FTP, CONNECT (for SSL), HTTP/0.9, HTTP/1.0, and HTTP/1.1. The module can be configured to connect to other proxy modules for these and other protocols.
mod_proxy is part of Apache, so there is no need to install a separate server: you just have to enable this module during the Apache build process or, if you have built Apache with DSO support, you can compile and add this module after you have completed the build of Apache.
A setup with a mod_proxy-enabled server and a mod_perl-enabled server is depicted in Figure 12-6.
We do not think the difference in speed between Apache's mod_proxy and Squid is relevant for most sites, since the real value of what they do is buffering for slow client connections. However, Squid runs as a single process and probably consumes fewer system resources.
The trade-off is that mod_rewrite is easy to use if you want to spread parts of the site across different backend servers, while mod_proxy knows how to fix up redirects containing the backend server's idea of the location. With Squid you can run a redirector process to proxy to more than one backend, but there is a problem in fixing redirects in a way that keeps the client's view of both server names and port numbers in all cases.
The difficult case is where you have DNS aliases that map to the same IP address, you want them redirected to port 80 (although the server is on a different port), and you want to keep the specific name the browser has already sent so that it does not change in the browser's location window.
The advantages of mod_proxy are:
No additional server is needed. We keep the plain one plus one mod_perl-enabled Apache server. All you need is to enable mod_proxy in the httpd_docs server and add a few lines to the httpd.conf file.
ProxyPass /perl/ http://localhost:81/perl/
ProxyPassReverse /perl/ http://localhost:81/perl/
The ProxyPass directive triggers the proxying process. A request for http://example.com/perl/ is proxied by issuing a request for http://localhost:81/perl/ to the mod_perl server. mod_proxy then sends the response to the client. The URL rewriting is transparent to the client, except in one case: if the mod_perl server issues a redirect, the URL to redirect to will be specified in a Location header in the response. This is where ProxyPassReverse kicks in: it scans Location headers from the responses it gets from proxied requests and rewrites the URL before forwarding the response to the client.
It buffers mod_perl output like Squid does.
It does caching, although you have to produce correct Expires HTTP headers for it to work. If some of your dynamic content does not change frequently, you can dramatically increase performance by caching it with mod_proxy.
ProxyPass happens before the authentication phase, so you do not have to worry about authenticating twice.
Apache is able to accelerate secure HTTP requests completely, while also doing accelerated HTTP. With Squid you have to use an external redirection program for that.
In the following explanation, we will use www.example.com as the main server users access when they want to get some kind of service and backend.example.com as the machine that does the heavy work. The main and backend servers are different; they may or may not coexist on the same machine.
We'll use the mod_proxy module built into the main server to handle requests to www.example.com. For the sake of this discussion it doesn't matter what functionality is built into the backend.example.com server—obviously it'll be mod_perl for most of us, but this technique can be successfully applied to other web programming languages (PHP, Java, etc.).
The ProxyPass configuration directive allows you to map remote hosts into the URL space of the local server; the local server does not act as a proxy in the conventional sense, but appears to be a mirror of the remote server.
Let's explore what this rule does:
ProxyPass /perl/ http://backend.example.com/perl/
When a user initiates a request to http://www.example.com/perl/foo.pl, the request is picked up by mod_proxy. It issues a request for http://backend.example.com/perl/foo.pl and forwards the response to the client. This reverse proxy process is mostly transparent to the client, as long as the response data does not contain absolute URLs.
One such situation occurs when the backend server issues a redirect. The URL to redirect to is provided in a Location header in the response. The backend server will use its own host name and port number to build the URL to redirect to. For example, mod_dir will redirect a request for http://www.example.com/somedir (no trailing slash) to http://backend.example.com/somedir/ by issuing a redirect with the following header:
Location: http://backend.example.com/somedir/
Since ProxyPass forwards the response unchanged to the client, the user will see http://backend.example.com/somedir/ in her browser's location window, instead of http://www.example.com/somedir/.
You have probably noticed many examples of this from real-life web sites you've visited. Free email service providers and other similar heavy online services display the login or the main page from their main server, and then when you log in you see something like x11.example.com, then w59.example.com, etc. These are the backend servers that do the actual work.
Obviously this is not an ideal solution, but since users don't usually care about what they see in the location window, you can sometimes get away with this approach. In the following section we show a better solution that solves this issue and provides even more useful functionality.
The ProxyPassReverse directive lets Apache adjust the URL in the Location header on HTTP redirect responses. This is essential when Apache is used as a reverse proxy, to prevent clients from bypassing the reverse proxy because of HTTP redirects issued by the backend servers. It is generally used in conjunction with the ProxyPass directive to build a complete frontend proxy server:
ProxyPass /perl/ http://backend.example.com/perl/
ProxyPassReverse /perl/ http://backend.example.com/perl/
When a user initiates a request to http://www.example.com/perl/foo, the request is proxied to http://backend.example.com/perl/foo. Let's say the backend server responds by issuing a redirect to http://backend.example.com/perl/foo/ (adding a trailing slash). The response will include a Location header:
Location: http://backend.example.com/perl/foo/
ProxyPassReverse on the frontend server will rewrite this header to:
Location: http://www.example.com/perl/foo/
This happens completely transparently. The end user is never aware of the URL rewrites happening behind the scenes.
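The two directives are mirror images of each other. The following Python sketch imitates their effect on URLs for the /perl/ example. It only illustrates the string rewriting involved, not how mod_proxy is implemented, and the function and constant names are our own.

```python
from typing import Optional

FRONTEND = "http://www.example.com"           # the frontend's own base URL
PREFIX = "/perl/"                             # local path used in both directives
BACKEND = "http://backend.example.com/perl/"  # remote URL used in both directives


def proxy_pass(path: str) -> Optional[str]:
    """Forward direction (like ProxyPass): map a frontend path to a backend URL."""
    if not path.startswith(PREFIX):
        return None  # not covered by the rule; served by the frontend itself
    return BACKEND + path[len(PREFIX):]


def proxy_pass_reverse(location: str) -> str:
    """Reverse direction (like ProxyPassReverse): fix up a backend Location value."""
    if location.startswith(BACKEND):
        return FRONTEND + PREFIX + location[len(BACKEND):]
    return location  # not a backend URL; leave it alone


print(proxy_pass("/perl/foo"))
# http://backend.example.com/perl/foo
print(proxy_pass_reverse("http://backend.example.com/perl/foo/"))
# http://www.example.com/perl/foo/
```

In this sketch the reverse mapping touches only URLs that start with the backend URL; any other Location value passes through unchanged.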
Note that this ProxyPassReverse directive can also be used in conjunction with the proxy pass-through feature of mod_rewrite, described later in this chapter.
When you use mod_proxy, you need to make sure that your server will not become a proxy for free riders. Whether clients are allowed to issue proxy requests is controlled by the ProxyRequests directive. Its default setting is Off, which means proxy requests are handled only if generated internally (by ProxyPass directives or by mod_rewrite proxy rules). Never turn ProxyRequests on for your reverse proxy servers.
Let's say that you have a frontend server running mod_ssl, mod_rewrite, and mod_proxy. You want to make sure that your user is using a secure connection for some specific actions, such as login information submission. You don't want to let the user log in unless the request was submitted through a secure port.
Since you have to proxy-pass the request between the frontend and backend servers, the backend cannot know where the connection originated, and in particular whether it arrived over a secure connection. The HTTP headers cannot reliably provide this information.
A possible solution for this problem is to have the mod_perl server listen on two different ports (e.g., 8000 and 8001), with the mod_rewrite proxy rule in the regular server passing requests to port 8000 and the mod_rewrite proxy rule in the SSL virtual host passing requests to port 8001. In the mod_perl server, use $r->connection->port or the environment variable PORT to tell whether the connection is secure.
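Once each frontend virtual host proxies to its own port, the backend's decision is simple. Here is that logic sketched in Python for illustration only (a mod_perl handler would do the same in Perl, using the port as described above). The port constants follow the 8000/8001 example, and came_through_ssl and login_status are hypothetical names.

```python
# Hypothetical port assignment, following the example in the text.
PLAIN_PORT = 8000   # the regular frontend proxies requests here
SECURE_PORT = 8001  # the SSL virtual host proxies requests here


def came_through_ssl(local_port: int) -> bool:
    """True if the request reached the backend via the SSL frontend's port."""
    if local_port not in (PLAIN_PORT, SECURE_PORT):
        raise ValueError(f"unexpected backend port: {local_port}")
    return local_port == SECURE_PORT


def login_status(local_port: int) -> int:
    """Hypothetical login handler: refuse (403) unless the request came over SSL."""
    return 200 if came_through_ssl(local_port) else 403


print(login_status(8001))  # 200
print(login_status(8000))  # 403
```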
In addition to correcting the URI on its way back from the backend server, mod_proxy, like Squid, also provides buffering services that benefit mod_perl and similar heavy modules. The buffering feature allows mod_perl to pass the generated data to mod_proxy and move on to serve new requests, instead of waiting for a possibly slow client to receive all the data.
Figure 12-7 depicts this feature.
mod_perl streams the generated response into the kernel send buffer, which in turn goes into the kernel receive buffer of mod_proxy via the TCP/IP connection. mod_proxy then streams the file into the kernel send buffer, and the data goes to the client over the TCP/IP connection. There are four buffers between mod_perl and the client: two kernel send buffers, one receive buffer, and finally the mod_proxy user space buffer. Each of those buffers will take the data from the previous stage, as long as the buffer is not full. Now it's clear that in order to immediately release the mod_perl process, the generated response should fit into these four buffers.
If the data doesn't fit immediately into all buffers, mod_perl will wait until the first kernel buffer is emptied partially or completely (depending on the OS implementation) and then place more data into it. mod_perl will repeat this process until the last byte has been placed into the buffer.
The kernel's receive buffers (recvbuf) and send buffers (sendbuf) are used for different things: the receive buffers are for TCP data that hasn't been read by the application yet, and the send buffers are for application data that hasn't been sent over the network yet. The kernel buffers actually seem smaller than their declared size, because not everything goes to actual TCP/IP data. For example, if the size of the buffer is 64 KB, only about 55 KB or so can actually be used for data. Of course, the overhead varies from OS to OS.
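A back-of-the-envelope calculation shows how large a response can be before mod_perl has to wait. This Python sketch assumes three 64 KB kernel buffers that each lose the same proportion to overhead as in the 55 KB example above, plus an 8 KB mod_proxy user space buffer that is fully usable. All of these figures are assumptions and vary between systems.

```python
KB = 1024

# Assumed fraction of a kernel socket buffer that holds payload data, taken
# from the "about 55 KB of a 64 KB buffer" example; real overhead is OS-specific.
USABLE_FRACTION = 55 / 64


def buffering_capacity(mod_perl_send: int, proxy_recv: int,
                       proxy_user: int, proxy_send: int) -> int:
    """Approximate number of bytes that can sit between mod_perl and the client.

    The three kernel buffers lose some space to overhead; the mod_proxy
    user space buffer is assumed to be fully usable.
    """
    kernel = (mod_perl_send + proxy_recv + proxy_send) * USABLE_FRACTION
    return int(kernel) + proxy_user


def releases_immediately(response_size: int, capacity: int) -> bool:
    """True if mod_perl can hand off the whole response without waiting."""
    return response_size <= capacity


cap = buffering_capacity(64 * KB, 64 * KB, 8 * KB, 64 * KB)
print(cap)                                  # 177152
print(releases_immediately(150 * KB, cap))  # True
print(releases_immediately(200 * KB, cap))  # False
```

Under these assumptions, a response of roughly 170 KB or less frees the mod_perl process at once; anything larger keeps it busy until the slow client has drained part of the chain.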
It might not be a very good idea to increase the kernel's receive buffer too much, because you could just as easily increase mod_proxy's user space buffer size and get the same effect in terms of buffering capacity. Kernel memory is pinned (not swappable), so it's harder on the system to use a lot of it.
The user space buffer size for mod_proxy seems to be fixed at 8 KB, but changing it is just a matter of replacing HUGE_STRING_LEN with something else in src/modules/proxy/proxy_http.c under the Apache source distribution.
mod_proxy's receive buffer is configurable by the ProxyReceiveBufferSize directive. For example:
ProxyReceiveBufferSize 16384
will create a buffer 16 KB in size.
ProxyReceiveBufferSize must be greater than or equal to 512 bytes. If it's not set or is set to 0, the system default will be used. The number it's set to should be an integral multiple of 512. ProxyReceiveBufferSize cannot be bigger than the maximum kernel receive buffer size; if you set ProxyReceiveBufferSize larger than this size, the default value will be used (a warning is printed in this case).
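These rules can be summarized as a small function. The following Python sketch only restates them; effective_receive_buffer, is_recommended, kernel_max, and system_default are names we made up, and the error message paraphrases the rule rather than quoting Apache.

```python
from typing import Optional

MIN_SIZE = 512


def effective_receive_buffer(configured: Optional[int],
                             kernel_max: int,
                             system_default: int) -> int:
    """Apply the ProxyReceiveBufferSize rules described above.

    configured: value of the directive, or None if it is not set.
    kernel_max: largest receive buffer the kernel allows.
    system_default: size used when the directive does not apply.
    """
    if configured is None or configured == 0:
        return system_default  # unset or 0: use the system default
    if configured < MIN_SIZE:
        raise ValueError("ProxyReceiveBufferSize must be >= 512 bytes, or 0")
    if configured > kernel_max:
        return system_default  # too large: fall back to the default (warning)
    return configured


def is_recommended(size: int) -> bool:
    """The value should be an integral multiple of 512."""
    return size % MIN_SIZE == 0


print(effective_receive_buffer(16384, 65535, 65535))   # 16384
print(effective_receive_buffer(131072, 65535, 32768))  # 32768
```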
You can modify the source code to adjust the size of the
server's internal read-write buffers by changing the
Unfortunately, you cannot set the kernel buffers' sizes as large as you might want because there is a limit to the available physical memory and OSes have their own upper limits on the possible buffer size. To increase the physical memory limits, you have to add more RAM. You can change the OS limits as well, but these procedures are very specific to OSes. Here are some of the OSes and the procedures to increase their socket buffer sizes:
For Linux 2.2 kernels, the maximum limit for receive buffer size is set in /proc/sys/net/core/rmem_max and the default value is in /proc/sys/net/core/rmem_default. If you want to increase the rcvbuf size above 65,535 bytes, the default maximum value, you have to first raise the absolute limit in /proc/sys/net/core/rmem_max. At runtime, execute this command to raise it to 128 KB:
panic# echo 131072 > /proc/sys/net/core/rmem_max
You probably want to put this command into /etc/rc.d/rc.local (or elsewhere, depending on the operating system and the distribution) or a similar script that is executed at server startup, so the change will take effect at system reboot.
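Before raising the limit, you may want to check the current value. This minimal Python sketch (Linux only, and our own illustration rather than a standard tool) reads /proc/sys/net/core/rmem_max and reports whether a 128 KB receive buffer would exceed it. Writing the new value still requires root, as with the echo command above.

```python
from pathlib import Path

RMEM_MAX = Path("/proc/sys/net/core/rmem_max")  # exists on Linux only


def parse_limit(text: str) -> int:
    """Parse the contents of a /proc/sys/net/core/* file into an integer."""
    return int(text.strip())


def needs_raise(desired: int, current_max: int) -> bool:
    """True if the desired buffer size is above the current kernel maximum."""
    return desired > current_max


if __name__ == "__main__" and RMEM_MAX.exists():
    current = parse_limit(RMEM_MAX.read_text())
    if needs_raise(131072, current):
        print(f"rmem_max is {current}; raise it (as root) before using 128 KB")
    else:
        print(f"rmem_max is {current}; 128 KB already fits")
```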
For the 2.2.5 kernel, the maximum and default values are either 32 KB
or 64 KB. You can also change the default and maximum values during
kernel compilation; for that, you should alter the
definitions, respectively. (Since kernel source files tend to change,
use the grep(1) utility to find the files.)
The same applies for the write buffers. You need to adjust
/proc/sys/net/core/wmem_max and possibly the
default value in
/proc/sys/net/core/wmem_default. If you want to
adjust the kernel configuration, you have to adjust the
This buffering technique applies only to downstream data (data coming from the origin server to the proxy), not to upstream data. When the server gets an incoming stream because a request has been issued, the first bits of data hit the mod_perl server immediately. Afterward, if the request includes a lot of data (e.g., a big POST request, usually a file upload) and the client has a slow connection, the mod_perl process will stay tied up, waiting for all the data to come in (unless it decides to abort the request for some reason). Falling back on mod_cgi seems to be the best solution for specific scripts whose major function is receiving large amounts of upstream data. Another alternative is to use yet another mod_perl server, dedicated to file uploads only, and have it serve those specific URIs through the appropriate proxy configuration.
Because of some technical complications in TCP/IP, at the end of each client connection, it is not enough for Apache to close the socket and forget about it; instead, it needs to spend about one second lingering (waiting) on the client.
lingerd is a daemon (service) designed to take over the job of properly closing network connections from an HTTP server such as Apache, immediately freeing the server to handle new connections.
lingerd can do an effective job only if HTTP KeepAlives are turned off. Since KeepAlives are useful for images, the recommended setup is to serve dynamic content with mod_perl-enabled Apache and lingerd, and static content with plain Apache.
In a lingerd setup, we don't have the proxy (we don't want to use lingerd on our httpd_docs server, which is also our proxy), so the buffering chain we presented earlier for the proxy setup is much shorter here (see Figure 12-8).
Hence, in this setup it becomes more important to have a big enough kernel send buffer.
With lingerd, a big enough kernel send buffer, and KeepAlives off, the job of spoonfeeding the data to a slow client is done by the OS kernel in the background. As a result, lingerd makes it possible to serve the same load using considerably fewer Apache processes. This translates into a reduced load on the server. It can be used as an alternative to the proxy setups we have seen so far.
For more information about
Apache does caching as well. It's relevant to mod_perl only if you produce proper headers, so your scripts' output can be cached. See the Apache documentation for more details on the configuration of this capability.
To enable caching, use the
specifying the directory where cache files are to be saved:
Make sure that directory is writable by the user under which httpd is running.
The CacheSize directive sets the desired space usage in kilobytes; this example allows about 50 MB:
CacheSize 50000
Garbage collection, which enforces the cache size, is set in hours by CacheGcInterval. If unspecified, the cache size will grow until disk space runs out. This setting tells mod_proxy to check every hour that your cache doesn't exceed the maximum size:
CacheGcInterval 1
CacheMaxExpire specifies the maximum number of
hours for which cached documents will be retained without checking
the origin server:
If the origin server for a document did not send an expiry date in the form of an Expires header, then the CacheLastModifiedFactor will be used to estimate one by multiplying the factor by the time elapsed since the document was last modified, as supplied in the Last-Modified header:
CacheLastModifiedFactor 0.1
If the content was modified 10 hours ago, mod_proxy will assume an expiration time of 10 × 0.1 = 1 hour. You should set this according to how often your content is updated.
If neither Last-Modified nor Expires is present, the CacheDefaultExpire directive specifies the number of hours until the document is expired from the cache:
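Taken together, these directives determine how long a document stays in the cache before mod_proxy checks the origin server again. The following Python sketch models that decision. The default values (24 hours for CacheMaxExpire, 1 hour for CacheDefaultExpire) and the assumption that CacheMaxExpire also caps an explicit Expires date are ours, based on our reading of the Apache 1.3 documentation; check the documentation for your version.

```python
from typing import Optional


def cache_lifetime_hours(expires_in: Optional[float],
                         hours_since_modified: Optional[float],
                         lm_factor: float = 0.1,       # CacheLastModifiedFactor
                         max_expire: float = 24.0,     # CacheMaxExpire (assumed default)
                         default_expire: float = 1.0,  # CacheDefaultExpire (assumed default)
                         ) -> float:
    """Estimate how long (in hours) a proxied document stays fresh.

    expires_in: hours until the Expires date, or None if there is no Expires header.
    hours_since_modified: age according to Last-Modified, or None if absent.
    """
    if expires_in is not None:
        lifetime = expires_in
    elif hours_since_modified is not None:
        lifetime = hours_since_modified * lm_factor
    else:
        return default_expire  # neither header present
    # Assumption: CacheMaxExpire caps the lifetime in both cases above.
    return min(lifetime, max_expire)


print(cache_lifetime_hours(None, 10))    # 1.0  (10 hours old x 0.1)
print(cache_lifetime_hours(None, 500))   # 24.0 (50 hours, capped)
print(cache_lifetime_hours(None, None))  # 1.0  (default)
```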
To build mod_proxy into Apache, just add --enable-module=proxy during the Apache ./configure stage. Since you will probably need mod_rewrite's capability as well, enable it with --enable-module=rewrite.