The Apache HTTP Server is extremely versatile. Among its many features
are a proxy server and, as of version 2.1, a load balancer. See
Recipe 10.9 in Apache Cookbook, Second Edition (O’Reilly). In this
exercise, we will create our load balancer on a machine with a domain
name of couch-proxy.example.com.
On couch-proxy.example.com, install Apache 2:
sudo aptitude install apache2
On couch-proxy.example.com, install mod_proxy:
sudo aptitude install libapache2-mod-proxy-html
On couch-proxy.example.com, enable mod_proxy:
sudo a2enmod proxy
On couch-proxy.example.com, enable mod_proxy_http:
sudo a2enmod proxy_http
On couch-proxy.example.com, enable mod_proxy_balancer:
sudo a2enmod proxy_balancer
We will also need mod_headers enabled:
sudo a2enmod headers
Finally, we will need mod_rewrite enabled:
sudo a2enmod rewrite
On couch-proxy.example.com, edit /etc/apache2/httpd.conf and add the
following (it is likely that the file will be empty to start with):
Header append Vary Accept
Header add Set-Cookie "NODE=%{BALANCER_WORKER_ROUTE}e; path=/api" \
    env=BALANCER_ROUTE_CHANGED
<Proxy balancer://couch-slave>
    BalancerMember http://couch-a.example.com:5984/api route=couch-a max=4
    BalancerMember http://couch-b.example.com:5984/api route=couch-b max=4
    BalancerMember http://couch-c.example.com:5984/api route=couch-c max=4
    ProxySet stickysession=NODE
    ProxySet timeout=5
</Proxy>
RewriteEngine On
RewriteCond %{REQUEST_METHOD} ^(POST|PUT|DELETE|MOVE|COPY)$
RewriteRule ^/api(.*)$ http://couch-master.example.com:5984/api$1 [P]
RewriteCond %{REQUEST_METHOD} ^(GET|HEAD|OPTIONS)$
RewriteRule ^/api(.*)$ balancer://couch-slave$1 [P]
ProxyPassReverse /api http://couch-master:5984/api
ProxyPassReverse /api balancer://couch-slave
Note
Apache allows for three possible load balancer scheduler algorithms.
Traffic can be balanced based on the number of requests
(lbmethod=byrequests), the number of bytes transferred
(lbmethod=bytraffic), or the number of currently pending requests
(lbmethod=bybusyness). The default is to balance by requests. To
balance by busyness instead, add a ProxySet lbmethod=bybusyness
directive inside the <Proxy> directive group, for example after
ProxySet timeout=5 and before </Proxy>. The order of the ProxySet
directives doesn’t matter.
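As a rough illustration of how the three scheduler methods differ, the following Python sketch picks the member with the lowest value of the relevant metric. It is a simplified model, not Apache's implementation: the Member class, the SELECTORS table, and the sample counters are all hypothetical, and the real schedulers also apply per-member weighting, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Hypothetical per-member counters a balancer might track."""
    route: str
    requests: int = 0    # requests handled so far
    bytes_sent: int = 0  # bytes transferred so far
    busy: int = 0        # requests currently pending


# One selection key per lbmethod value (simplified, no weighting).
SELECTORS = {
    "byrequests": lambda m: m.requests,
    "bytraffic": lambda m: m.bytes_sent,
    "bybusyness": lambda m: m.busy,
}


def pick_member(members, lbmethod="byrequests"):
    """Return the member with the lowest value of the chosen metric."""
    return min(members, key=SELECTORS[lbmethod])


# Made-up sample state for the three members in this chapter.
members = [
    Member("couch-a", requests=10, bytes_sent=9000, busy=3),
    Member("couch-b", requests=12, bytes_sent=2000, busy=1),
    Member("couch-c", requests=11, bytes_sent=5000, busy=0),
]
```

With this sample state, each method would favor a different member, which is why the choice of lbmethod matters when request sizes or response times are uneven.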
You will also need to configure your virtual host to enable the
rewrite engine and inherit the rewrite options from the server
configuration above. Edit /etc/apache2/sites-enabled/000-default (or the
configuration file for the appropriate virtual host) and add the following
before the closing </VirtualHost> tag:
RewriteEngine On
RewriteOptions inherit
Let’s take a look at each line of the /etc/apache2/httpd.conf
configuration file. The Header append Vary Accept line appends the value
Accept to the Vary HTTP header. If you have mod_deflate enabled, that
module will add a Vary HTTP header with a value of Accept-Encoding. A
Vary HTTP header tells caches which request header fields determine
whether a cached response may be reused for a later request. Since
mod_deflate may be adding this header, and CouchDB uses the Accept header
to vary the media type (reflected in a Content-Type header with a value of
either text/plain or application/json), it’s a good idea to make sure
that clients know to vary their caching based on the Accept HTTP header as
well, and not just the Accept-Encoding HTTP header.
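To see why the extra Vary value matters, here is a minimal Python sketch of how a cache might build its lookup key from the Vary header. The cache_key function and the sample URL are hypothetical and only model the idea; real HTTP caches follow more detailed rules.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus every request header named in Vary."""
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    headers = {k.lower(): v for k, v in request_headers.items()}
    return (url,) + tuple((n, headers.get(n, "")) for n in names)


# Two requests for the same (made-up) URL that differ only in Accept.
url = "http://couch-proxy.example.com/api/mydb/doc1"
as_json = {"Accept": "application/json", "Accept-Encoding": "gzip"}
as_text = {"Accept": "text/plain", "Accept-Encoding": "gzip"}

# Vary on Accept-Encoding alone: both requests share one cache entry.
same = cache_key(url, as_json, "Accept-Encoding") == cache_key(url, as_text, "Accept-Encoding")

# Vary on both headers: the two media types are cached separately.
different = cache_key(url, as_json, "Accept-Encoding, Accept") != cache_key(url, as_text, "Accept-Encoding, Accept")
```

In this model, leaving Accept out of Vary would let a cache hand a text/plain response to a client that asked for application/json.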
The line beginning with Header add Set-Cookie sets a cookie named NODE
on the client. The value of this cookie will be the route name associated
with the load balancer member that served the request. This allows for
sticky sessions: once a client has been routed to a specific load balancer
member, that client’s requests will continue to be routed to that same
member. This provides more consistency to the client. The path=/api part
indicates to the client the URL path for which the cookie is valid. The
env=BALANCER_ROUTE_CHANGED part indicates that the cookie should only be
sent if the load balancer route has changed.
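The sticky-session behavior described above can be modeled with a short Python sketch. This is an assumption-laden simplification rather than Apache's code: the choose_route function is hypothetical, and plain round-robin stands in for whichever scheduler the balancer actually uses.

```python
import itertools

# Route names taken from the BalancerMember lines in the configuration.
ROUTES = ["couch-a", "couch-b", "couch-c"]

# Hypothetical stand-in for the balancer's scheduler: plain round-robin.
_round_robin = itertools.cycle(ROUTES)


def choose_route(cookies):
    """Return (route, set_cookie) for a request carrying the given cookies.

    set_cookie is None when the client's NODE cookie already names a known
    route, mirroring the idea that the cookie is only sent when the route
    has changed.
    """
    route = cookies.get("NODE")
    if route in ROUTES:
        return route, None
    route = next(_round_robin)
    return route, f"NODE={route}; path=/api"
```

A first request with no cookie is assigned a member and receives a Set-Cookie value; later requests that present NODE=couch-b stay on couch-b and receive no new cookie.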
The <Proxy balancer://couch-slave> directive group defines a load
balancer named couch-slave. A later configuration directive will define
which requests should be sent to this load balancer. The three
BalancerMember lines each add a member to the load balancer. The route=
parameters (e.g., route=couch-a) give each member a route name, which is
used as the value of the NODE cookie. The max=4 parameters set the maximum
number of connections that the proxy will allow to each backend server.
The ProxySet stickysession=NODE directive tells the load balancer which
cookie name to use (in this case NODE) when determining which route to
use. The ProxySet timeout=5 directive instructs the proxy server to wait
5 seconds before timing out connections to the backend server.
Note
Keep the maximum number of connections to each CouchDB node low
(e.g., max=4). This will prevent each node from getting overloaded. While
4 may seem like a very low number, CouchDB will respond to each request
very quickly and allow for a high level of throughput. If the proxy server
has enough memory and is configured to allow enough concurrent clients
itself, then it can effectively queue requests for the backend servers.
If we didn’t need to proxy requests based on the HTTP method, we
could have used the ProxyPass directive. However, for this added
flexibility we need to use mod_rewrite with the proxy ([P]) flag. The
RewriteEngine On line enables the rewrite engine. The next line sets up a
rewrite condition that says to only run the subsequent rewrite rule if the
request HTTP method is POST, PUT, DELETE, MOVE, or COPY:
RewriteCond %{REQUEST_METHOD} ^(POST|PUT|DELETE|MOVE|COPY)$
The subsequent rewrite rule then proxies all requests to URIs
starting with /api to the equivalent URI on
http://couch-master.example.com:5984 (again, only if the previous rewrite
condition has been met):
RewriteRule ^/api(.*)$ http://couch-master.example.com:5984/api$1 [P]
The next line contains another rewrite condition. This one says to
only run the subsequent rewrite rule if the request HTTP method is GET,
HEAD, or OPTIONS:
RewriteCond %{REQUEST_METHOD} ^(GET|HEAD|OPTIONS)$
The subsequent rewrite rule then proxies all requests to URIs
starting with /api to the equivalent URI on the couch-slave load balancer
(again, only if the previous rewrite condition has been met):
RewriteRule ^/api(.*)$ balancer://couch-slave$1 [P]
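The combined effect of the two condition and rule pairs can be sketched in Python using the same regular expressions. The proxy_target function is a hypothetical model of the routing decision only; it does not perform any proxying and is not how mod_rewrite is implemented.

```python
import re

# The same patterns used in the RewriteCond and RewriteRule lines.
WRITE_METHODS = re.compile(r"^(POST|PUT|DELETE|MOVE|COPY)$")
READ_METHODS = re.compile(r"^(GET|HEAD|OPTIONS)$")
API_PATH = re.compile(r"^/api(.*)$")


def proxy_target(method, path):
    """Return the URL a request would be proxied to, or None if no rule applies."""
    match = API_PATH.match(path)
    if match is None:
        return None  # neither RewriteRule pattern matches this path
    rest = match.group(1)
    if WRITE_METHODS.match(method):
        # Writes go to the master.
        return "http://couch-master.example.com:5984/api" + rest
    if READ_METHODS.match(method):
        # Reads go to the balancer; each member URL already ends in /api.
        return "balancer://couch-slave" + rest
    return None  # other methods are not covered by either condition
```

For example, a PUT to /api/mydb/doc1 maps to the master, while a GET for the same path maps to balancer://couch-slave/mydb/doc1, which the balancer then resolves to one of its members.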
The following ProxyPassReverse directives instruct Apache to adjust the
URLs in the HTTP response headers to match that of the proxy server,
instead of the reverse proxied server. This is mainly useful for the
Location header that is sent when CouchDB creates a new document:
ProxyPassReverse /api http://couch-master:5984/api
ProxyPassReverse /api balancer://couch-slave
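Conceptually, this header adjustment is a prefix substitution. The Python sketch below models it with a hypothetical reverse_location function. It assumes the proxy is reached at http://couch-proxy.example.com/api, which is an illustrative choice based on the host name used in this exercise.

```python
def reverse_location(location, backend_prefix, proxy_prefix):
    """Swap the backend URL prefix for the proxy's prefix, if it matches."""
    if location.startswith(backend_prefix):
        return proxy_prefix + location[len(backend_prefix):]
    return location  # headers that do not match the prefix are left alone


# Prefixes modeled on the first ProxyPassReverse line; the proxy URL is assumed.
BACKEND = "http://couch-master:5984/api"
PROXY = "http://couch-proxy.example.com/api"

rewritten = reverse_location("http://couch-master:5984/api/mydb/doc1", BACKEND, PROXY)
```

Here rewritten points at the proxy rather than the backend. Because the substitution only applies when the header starts with the configured prefix, the backend URL in the directive has to match what the backend actually sends.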
Open /etc/apache2/apache2.conf and look for the ServerLimit,
ThreadsPerChild, and MaxClients directives. Apache caps MaxClients at
ServerLimit multiplied by ThreadsPerChild. These directives are intended
to prevent your server from running out of memory and swapping, which
would significantly decrease performance. Following is an example
configuration with MaxClients increased to 5,000 (this is from a machine
with 1 GB of RAM):
ServerLimit 200
ThreadsPerChild 25
MaxClients 5000
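The relationship between the three directives is simple arithmetic, shown here as a small Python check. The max_clients_ceiling helper is hypothetical and exists only to make the calculation explicit.

```python
def max_clients_ceiling(server_limit, threads_per_child):
    """Largest MaxClients value permitted by the other two directives."""
    return server_limit * threads_per_child


# Values from the example configuration.
ceiling = max_clients_ceiling(server_limit=200, threads_per_child=25)
fits = 5000 <= ceiling
```

With ServerLimit 200 and ThreadsPerChild 25 the ceiling is 5,000, so the MaxClients value in the example sits exactly at the limit. Raising MaxClients further would require raising one of the other two directives as well.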
On couch-proxy.example.com, restart Apache:
sudo /etc/init.d/apache2 restart