Proxy Server Configuration

The Apache HTTP server is extremely versatile. Its many features include a proxy server and a load balancer (as of version 2.1). See Recipe 10.9 in Apache Cookbook, Second Edition (O’Reilly). In this exercise, we will create our load balancer on a machine with a domain name of couch-proxy.example.com.

On couch-proxy.example.com, install Apache 2:

sudo aptitude install apache2

On couch-proxy.example.com, install the mod_proxy_html package (mod_proxy itself ships with Apache 2):

sudo aptitude install libapache2-mod-proxy-html

On couch-proxy.example.com, enable mod_proxy:

sudo a2enmod proxy

On couch-proxy.example.com, enable mod_proxy_http:

sudo a2enmod proxy_http

On couch-proxy.example.com, enable mod_proxy_balancer:

sudo a2enmod proxy_balancer

We will also need mod_headers enabled:

sudo a2enmod headers

Finally, we will need mod_rewrite enabled:

sudo a2enmod rewrite

On couch-proxy.example.com, edit /etc/apache2/httpd.conf and add the following (it is likely that the file will be empty to start with):

Header append Vary Accept
Header add Set-Cookie "NODE=%{BALANCER_WORKER_ROUTE}e; path=/api" \
env=BALANCER_ROUTE_CHANGED

<Proxy balancer://couch-slave>
    BalancerMember http://couch-a.example.com:5984/api route=couch-a max=4
    BalancerMember http://couch-b.example.com:5984/api route=couch-b max=4
    BalancerMember http://couch-c.example.com:5984/api route=couch-c max=4
    ProxySet stickysession=NODE
    ProxySet timeout=5
</Proxy>

RewriteEngine On
RewriteCond %{REQUEST_METHOD} ^(POST|PUT|DELETE|MOVE|COPY)$
RewriteRule ^/api(.*)$ http://couch-master.example.com:5984/api$1 [P]
RewriteCond %{REQUEST_METHOD} ^(GET|HEAD|OPTIONS)$
RewriteRule ^/api(.*)$ balancer://couch-slave$1 [P]
ProxyPassReverse /api http://couch-master.example.com:5984/api
ProxyPassReverse /api balancer://couch-slave

Note

Apache offers three load balancer scheduler algorithms. Traffic can be balanced by the number of requests (lbmethod=byrequests), by the number of bytes transferred (lbmethod=bytraffic), or by the number of currently pending requests (lbmethod=bybusyness). The default is to balance by requests. To balance by busyness instead, add a ProxySet lbmethod=bybusyness directive inside the <Proxy> directive group, for example after ProxySet timeout=5 and before </Proxy>; the order of the ProxySet directives within the group doesn’t matter.
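As a sketch, the balancer group with the busyness scheduler selected might look like the following. The member definitions are copied unchanged from the configuration shown earlier, and only the final ProxySet line is new. This assumes your Apache build provides the bybusyness method; on Apache 2.4 the scheduler lives in a separate module, mod_lbmethod_bybusyness, which would also need to be enabled.

```apache
<Proxy balancer://couch-slave>
    BalancerMember http://couch-a.example.com:5984/api route=couch-a max=4
    BalancerMember http://couch-b.example.com:5984/api route=couch-b max=4
    BalancerMember http://couch-c.example.com:5984/api route=couch-c max=4
    ProxySet stickysession=NODE
    ProxySet timeout=5
    # Prefer the member with the fewest pending requests instead of
    # the default request-count scheduler.
    ProxySet lbmethod=bybusyness
</Proxy>
```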

You will also need to configure your virtual host to enable the rewrite engine and inherit the rewrite rules from the server configuration above. Edit /etc/apache2/sites-enabled/000-default (or the configuration file for the appropriate virtual host) and add the following before the closing </VirtualHost> tag:

    RewriteEngine On
    RewriteOptions inherit
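
In context, the addition might sit inside the virtual host roughly as follows. The *:80 address and the placeholder comment are assumptions for illustration only; keep whatever your existing file already contains.

```apache
<VirtualHost *:80>
    # ... existing directives for this virtual host stay as they are ...

    # Turn on rewriting for this host and pull in the server-level
    # RewriteCond/RewriteRule lines defined in httpd.conf.
    RewriteEngine On
    RewriteOptions inherit
</VirtualHost>
```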

Let’s take a look at each line of the /etc/apache2/httpd.conf configuration file. The Header append Vary Accept line appends the value Accept to the Vary HTTP header. If you have mod_deflate enabled, that module will add a Vary HTTP header with a value of Accept-Encoding. A Vary HTTP header tells clients and caches which request headers a response depends on, and therefore which request headers must match before a cached response can be reused. Since mod_deflate may be adding this header, and CouchDB uses the Accept header to vary the media type (reflected in a Content-Type header with a value of either text/plain or application/json), it’s a good idea to make sure that clients also vary their caching based on the Accept HTTP header, and not just the Accept-Encoding HTTP header.

The line beginning with Header add Set-Cookie sets a cookie named NODE on the client. The value of this cookie will be the route name associated with the load balancer member that served the request (for example, NODE=couch-a). This allows for sticky sessions, meaning that once a client has been routed to a specific load balancer member, that client’s requests will continue to be routed to that same member. This gives the client a more consistent view of the data, since it keeps reading from the same node. The path=/api part tells the client the URL path for which the cookie is valid. The env=BALANCER_ROUTE_CHANGED part indicates that the cookie should only be sent if the load balancer route has changed.

The <Proxy balancer://couch-slave> directive group defines a load balancer named couch-slave. A later configuration directive will define which requests should be sent to this load balancer. The three BalancerMember lines each add a member to the load balancer. The route= parameters (e.g., route=couch-a) give each member a route name, which is used as the value of the NODE cookie. The max=4 parameters set the maximum number of connections that the proxy will allow to each backend server. The ProxySet stickysession=NODE directive tells the load balancer which cookie name to use (in this case NODE) when determining which route to use. The ProxySet timeout=5 directive instructs the proxy server to wait 5 seconds before timing out connections to the backend server.

Note

Keep the maximum number of connections to each CouchDB node low (e.g., max=4). This will prevent each node from getting overloaded. While 4 may seem like a very low number, CouchDB will respond to each request very quickly and allow for a high level of throughput. If the proxy server has enough memory and is configured to allow enough concurrent clients itself, then it can effectively queue requests for the backend servers.

If we didn’t need to proxy requests based on the HTTP method, we could have used the ProxyPass directive. However, for this added flexibility we need to use mod_rewrite with the proxy ([P]) flag. The RewriteEngine On line enables the rewrite engine. The next line sets up a rewrite condition that says to only run the subsequent rewrite rule if the request HTTP method is POST, PUT, DELETE, MOVE, or COPY:

RewriteCond %{REQUEST_METHOD} ^(POST|PUT|DELETE|MOVE|COPY)$

The subsequent rewrite rule then proxies all requests to URIs starting with /api to the equivalent URI on http://couch-master.example.com:5984 (again, only if the previous rewrite condition has been met):

RewriteRule ^/api(.*)$ http://couch-master.example.com:5984/api$1 [P]

The next line contains another rewrite condition. This one says to only run the subsequent rewrite rule if the request HTTP method is GET, HEAD, or OPTIONS:

RewriteCond %{REQUEST_METHOD} ^(GET|HEAD|OPTIONS)$

The subsequent rewrite rule then proxies all requests to URIs starting with /api to the equivalent URI on the couch-slave load balancer (again, only if the previous rewrite condition has been met):

RewriteRule ^/api(.*)$ balancer://couch-slave$1 [P]

The following ProxyPassReverse directives instruct Apache to adjust the URLs in the HTTP response headers to match those of the proxy server, instead of the reverse proxied server. This is mainly useful for the Location header that is sent when CouchDB creates a new document:

ProxyPassReverse /api http://couch-master.example.com:5984/api
ProxyPassReverse /api balancer://couch-slave
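
For comparison, if every request could go to the same pool regardless of HTTP method, the ProxyPass approach mentioned earlier would reduce to something like the sketch below. This is a hypothetical simplification, not part of this chapter's setup: it would send writes to the slave nodes as well, which is exactly what the method-based rewrite rules are there to avoid.

```apache
# Hypothetical simpler setup with no read/write split, so
# mod_rewrite is not needed. Requests under /api are handed to
# the couch-slave balancer defined in the <Proxy> group.
ProxyPass        /api balancer://couch-slave
ProxyPassReverse /api balancer://couch-slave
```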

Open /etc/apache2/apache2.conf and look for the ServerLimit, ThreadsPerChild, and MaxClients directives. Apache caps MaxClients at ServerLimit multiplied by ThreadsPerChild. These limits are intended to prevent your server from running out of memory and swapping, which would significantly decrease performance. The following example configuration raises MaxClients to 5,000, which is 200 × 25 (this is from a machine with 1 GB of RAM):

ServerLimit         200
ThreadsPerChild      25
MaxClients         5000

On couch-proxy.example.com, restart Apache:

sudo /etc/init.d/apache2 restart
