Chapter 7. Using Mason with mod_perl
While Mason can be used in any text generation context, it is most frequently used to create dynamic web sites. As you probably know, executing Perl (or anything else for that matter) as a CGI can be very slow. Mason, because it is not a small amount of code, can be sluggish when run as a CGI under heavy loads.
To that end, Mason has been designed to play nice when run under mod_perl. In fact, Mason has quite a number of features that make it nicely suited to running under mod_perl.
This chapter assumes that you are familiar with Apache, particularly Apache’s configuration files, and with mod_perl. If you’re not, here’s a teaser: mod_perl embeds a Perl interpreter inside the Apache web server. Because Perl is already loaded, no external processes need to be launched to serve Perl-generated content. mod_perl also allows many server tasks to be configured and executed using Perl, which can be a great convenience.
More information on Apache can be found via the Apache web site at http://httpd.apache.org/, as well as in O’Reilly’s Apache: The Definitive Guide, 3rd Edition (Ben and Peter Laurie, 2003).
For more information on mod_perl, the mod_perl site at http://perl.apache.org/ is useful, as is Stas Bekman’s fabulous mod_perl guide, which can be found at the same location. Also useful is Writing Apache Modules with Perl and C (the “Eagle Book”) by Lincoln Stein and Doug MacEachern, also published by O’Reilly.[16] Despite the title, it is really primarily about mod_perl.
A recent book from Sams Publishing, The mod_perl Developer’s Cookbook by Geoffrey Young, Paul Lindner, and Randy Kobes, is also an extremely valuable resource for anyone who’s going to spend a significant amount of time working with mod_perl. It fills a different niche in the developer’s mental toolkit.
With Apache 2.0 and mod_perl 2.0 on the horizon as this is being written, please note that this chapter assumes that you are using Apache 1.3.x and mod_perl 1.22 or greater. In addition, your mod_perl should have been compiled with PERL_METHOD_HANDLERS=1 and PERL_TABLE_API=1, or with EVERYTHING=1.
We expect Mason to work immediately under the 1.x compatibility layer that mod_perl 2.0 will provide. And of course, once mod_perl and Apache 2.0 are out, we hope to find new features for Mason to exploit.
Configuring Mason
Mason can be configured under mod_perl in two different ways. The easier of the two merely requires that you add a few directives to Apache’s configuration files. This method is very easy to use and is appropriate for most uses of Mason. It’s commonly called “configuration via httpd.conf,” though many configuration directives can be placed anywhere Apache will see them, such as in an .htaccess file.
The other way is to write a custom piece of Perl code to bind Mason and mod_perl together, which you instruct mod_perl to use when handling requests. This method is very flexible but is a bit more complicated. It is not usually necessary, but it can be useful for a particularly complex or dynamic configuration. This configuration method is commonly called “configuration via a handler.pl,” though the handler.pl file can be called anything you like.
For simplicity’s sake, we always refer to the httpd.conf and handler.pl files throughout the book.
Configuration via httpd.conf
To make Mason work under mod_perl, we need to set up a few Mason configuration variables and then tell mod_perl to use Mason as a PerlContentHandler. Here is the simplest possible configuration:
SetHandler perl-script
PerlHandler HTML::Mason::ApacheHandler
The SetHandler directive just tells Apache to use mod_perl for this request. The PerlHandler directive is provided by mod_perl, and it tells Apache that the given module is a content handler. This means that the module will respond to the request and generate content to be sent to the client.
Putting the previous snippet in your configuration file will cause every file your web server processes to be handled by Mason. This is probably not what you want most of the time, so let’s narrow it down a bit:
<Location /mason>
    PerlSetVar MasonCompRoot /path/to/doc/root/mason
    SetHandler perl-script
    PerlHandler HTML::Mason::ApacheHandler
</Location>
This tells Apache that only requests that have a path starting with /mason will be handled by Mason. We’ve narrowed down the component root correspondingly, though this is not required. In fact, it’s important to realize that component root and document root are not the same thing. There will be more on this later.
Alternately, we might want to specify that only certain file extensions will be handled by Mason:
AddType text/html .mhtml
<FilesMatch "\.mhtml$">
    SetHandler perl-script
    PerlHandler HTML::Mason::ApacheHandler
</FilesMatch>
The first directive tells Apache that files ending with .mhtml have a content-type of text/html. The FilesMatch section says that files ending with .mhtml will be handled by Mason. This configuration is convenient if you want to intermix Mason components with other types of content, such as static HTML or image files, in the same directory. You want Mason to process only the Mason components, as having it process images or CSS is both a waste of time and a possible source of errors. Who knows what Mason will make of an image’s binary data? You probably don’t want to find out.
By default Mason will use the server’s document root for the resolver’s comp_root parameter. Mason also needs a data directory to store things like compiled components and cache files. By default, this will be a subdirectory called mason under your server’s ServerRoot. It is important that this directory be writable by the user or group ID that the Apache children run as, though the ApacheHandler will ensure that this happens if your server is started as the root user.
Both of these defaults can easily be overridden.
PerlSetVar MasonCompRoot /var/www/comps
PerlSetVar MasonDataDir /var/mason-data-dir
The PerlSetVar directive sets variables that are accessible by Perl modules via the Apache API. Mason uses this API internally to get at these settings.
All of the Interp, Compiler, and Lexer parameters that were discussed in Chapter 6 can be set from the configuration file. A full listing of all the variables that can be set via PerlSetVar directives can be found in Appendix B.
You also may have multiple Mason configurations for different parts of your web server:
<VirtualHost 1.2.3.4>
    ServerName www.example.com
    DocumentRoot /home/example/htdocs/
    PerlSetVar MasonCompRoot /home/example/htdocs
    PerlSetVar MasonDataDir /home/example/mason-data
    <FilesMatch "\.mhtml$">
        SetHandler perl-script
        PerlHandler HTML::Mason::ApacheHandler
    </FilesMatch>
</VirtualHost>

<VirtualHost 1.2.3.4>
    ServerName hello-kitty-heaven.example.com
    DocumentRoot /home/hello-kitty/htdocs/
    PerlSetVar MasonCompRoot /home/hello-kitty/htdocs/mason
    PerlSetVar MasonDataDir /home/hello-kitty/mason-data
    <FilesMatch "\.mhtml$">
        SetHandler perl-script
        PerlHandler HTML::Mason::ApacheHandler
    </FilesMatch>
</VirtualHost>
In this case, Mason will find the relevant configuration directives when asked to handle a request.
When you have only a single Mason configuration for your server, Mason will attempt to create the objects it needs as early as possible, during the initial server startup.
Doing this increases the amount of shared memory between Apache processes on most systems. The reason is that memory that is not modified after a process forks can be shared between a parent and any children it spawns, at least with some operating systems.
Configuration via Custom Code
When simple configuration variables aren’t enough, when you simply must do it the hard way, Mason has an alternative. Write your own code. This method gives you complete control over how Mason handles requests at the cost of a bit of extra code to maintain.
The simplest external script that would work might look something like this:
package MyMason::MyApp;

use strict;
use HTML::Mason::ApacheHandler;
use Apache::Request;

my $ah = HTML::Mason::ApacheHandler->new(
    comp_root => '/home/httpd/html',
    data_dir  => '/home/httpd/mason',
);

sub handler {
    my $r = shift;    # Apache request object
    return $ah->handle_request($r);
}
Assume that this file is saved in the Apache configuration directory as handler.pl.
Then you’d add a few configuration directives to your Apache configuration file:
PerlRequire handler.pl
<FilesMatch "\.mhtml$">
    SetHandler perl-script
    PerlHandler MyMason::MyApp
</FilesMatch>
Notice the lack of PerlSetVar directives this time. Also note that the value given to the PerlHandler directive is now the package you declared in the handler.pl file. This combination of script and Apache configuration would give us the exact same results as in the previous section.
Let’s go through this in more detail to understand exactly what it is doing. Starting with the Apache configuration piece, we see that we set PerlHandler to MyMason::MyApp. This tells mod_perl to look for a subroutine called handler() in the MyMason::MyApp namespace. Mason does not include any such thing, so we have to write it ourselves, which is what the script does.
The choice of naming it MyMason::MyApp is completely arbitrary. You might prefer something that identifies the project you’re working on, like GooberCorp::WebEmail::Mason or something like that. It doesn’t even need to have the word Mason in it, though it will probably improve the clarity of your httpd.conf file if it does.
Why are we declaring ourselves as being in the MyMason::MyApp namespace? Look at our PerlHandler directive. It indicates that the handler subroutine will be found in that same namespace.
The first few lines of the script are simple. The only module that must be loaded is HTML::Mason::ApacheHandler.
To save some memory, we load Apache::Request in this file. Mason would load this for us when it was needed, but we want to make sure it gets loaded during the server startup so memory can be shared.
Then we create the HTML::Mason::ApacheHandler object. This object takes an Apache request object and figures out how to dispatch it to Mason.
This object contains an HTML::Mason::Interp object. As we discussed in the previous chapter, when a Mason object contains another Mason object, you can pass parameters to the containing object’s constructor that are intended for the contained object(s).
This means that parameters that are intended for the Interpreter object’s constructor can be passed to the ApacheHandler’s new() method. In addition, since the Interpreter contains a Resolver, Compiler, and so forth, you can also pass parameters for those objects to the ApacheHandler constructor.
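As a rough sketch of this pass-through, the handler.pl shown earlier could hand a few parameters meant for contained objects straight to the ApacheHandler constructor. The particular parameters and values here are illustrative assumptions (in particular, the $dbh global is hypothetical), so check Appendix B for what your Mason version accepts:

```perl
package MyMason::MyApp;

use strict;
use HTML::Mason::ApacheHandler;
use Apache::Request;

my $ah = HTML::Mason::ApacheHandler->new(
    # Resolver and Interp parameters
    comp_root => '/home/httpd/html',
    data_dir  => '/home/httpd/mason',

    # Compiler parameters, passed through the ApacheHandler
    allow_globals        => [ '$dbh' ],   # hypothetical global for components
    default_escape_flags => 'h',          # HTML-escape <% %> output by default

    # Request parameter
    autoflush => 0,
);

sub handler {
    my $r = shift;    # Apache request object
    return $ah->handle_request($r);
}

1;
```

None of these parameters belongs to the ApacheHandler itself; each one is routed to the contained object that understands it.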
The handler() subroutine itself is quite simple. The Apache request object is always passed to any handler subroutine by mod_perl. This object is then passed to the ApacheHandler object’s handle_request() method. The handle_request() method does all the real work and makes sure that content is sent to the client. Its return value is a status code for the request, and the handler() subroutine simply returns this status code to mod_perl, which passes it on to Apache, which handles it however it is configured to do so.
If this were all we did with a handler subroutine it would be awfully pointless. Let’s examine a more complicated scenario.
We can rewrite the earlier virtual hosting example to use an external script:
PerlRequire handler.pl

<VirtualHost 1.2.3.4>
    ServerName www.example.com
    <FilesMatch "\.mhtml$">
        SetHandler perl-script
        PerlHandler MyMason::MyApp
    </FilesMatch>
</VirtualHost>

<VirtualHost 1.2.3.4>
    ServerName hello-kitty-heaven.example.com
    <FilesMatch "\.mhtml$">
        SetHandler perl-script
        PerlHandler MyMason::MyApp
    </FilesMatch>
</VirtualHost>
That takes care of the Apache configuration file; now the script:
package MyMason::MyApp;

use strict;
use HTML::Mason::ApacheHandler;
use Apache::Request;

my %host_to_comp_root = (
    'www.example.com'                => '/home/example/htdocs',
    'hello-kitty-heaven.example.com' => '/home/hello-kitty/htdocs',
);

my %ah;

sub handler {
    my $r = shift;              # Apache request object
    my $host = $r->hostname;    # tells us what server was requested
    my $comp_root = $host_to_comp_root{$host};

    # create a new object for this host if none exists yet
    $ah{$host} ||= HTML::Mason::ApacheHandler->new( comp_root => $comp_root );

    return $ah{$host}->handle_request($r);
}
This is a rather simple example and doesn’t necessarily justify writing a script rather than just configuring via the Apache configuration file. However, let’s imagine that we also had the script check in each home directory for extra Mason configuration directives, which could be stored either as pure Perl or in a specified format.
How about if you had to do virtual hosting for 200 domain names? Then some sort of scripted solution becomes more appealing. Of course, you could always write a script to generate the Apache configuration directives too. It really depends on what your needs are. But Mason gives you the flexibility to handle it in the way you think best.
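For the many-domains case, one possible approach is to derive each component root from the hostname instead of listing every host by hand. The following is only a sketch under an assumed directory layout: the /home/vhosts base directory and the comp_root_for() helper are hypothetical names, not part of Mason.

```perl
use strict;
use warnings;

# Hypothetical layout: each site lives in /home/vhosts/<hostname>/htdocs.
my $VHOST_BASE = '/home/vhosts';

# Map a requested hostname to a component root, or return undef
# (in scalar context) if the name does not look like a plain hostname.
sub comp_root_for {
    my ($host) = @_;
    return unless defined $host;
    $host = lc $host;

    # Allow only letters, digits, dots, and hyphens, so that a crafted
    # Host header cannot point outside the base directory.
    return unless $host =~ /\A[a-z0-9](?:[a-z0-9.-]*[a-z0-9])?\z/;

    return "$VHOST_BASE/$host/htdocs";
}
```

Inside handler(), the lookup in %host_to_comp_root would then become a call to comp_root_for($host), with the request declined or refused when it returns undef.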
Document Root Versus the Component Root
Apache’s document root is what defines the top-level web directory of your Apache configuration. For example purposes, let’s assume a document root of /home/httpd/htdocs. If you request the document /index.html via your web browser, Apache will look for the file /home/httpd/htdocs/index.html. If index.html contains an HREF to /some/file.html, you would have to place a file at /home/httpd/htdocs/some/file.html for the link to be resolved properly.
Mason has a component root, which is somewhat similar. If Mason’s component root is /home/httpd/htdocs/mason, and a component makes a component call with an absolute path of /some/component, Mason will look for a file at /home/httpd/htdocs/mason/some/component.
It can be confusing when the component root and the document root are not the same because this means that the path for an HREF and a component path, though they may appear to be the same, can point to two different files.
For example, with the preceding configuration, we have the following:
<a href="/some/file.html">resolves to /home/httpd/htdocs/some/file.html</a>.
<& /some/file.html &> resolves to /home/httpd/htdocs/mason/some/file.html.
Do you see the difference?
Be sure to keep this in mind while working on your components. To avoid dealing with this problem, you could simply make your document root and component root the same directory and decide whether or not something is a component based on its file extension.
This is generally a bit easier on the brain and is definitely what we recommend for first-time Mason users.
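The difference between the two roots can be sketched in a few lines of plain Perl. This is not how Apache or Mason resolve paths internally; resolve_under() is a hypothetical helper that simply joins a root directory and an absolute path, using the example directories from this section:

```perl
use strict;
use warnings;

my $document_root  = '/home/httpd/htdocs';
my $component_root = '/home/httpd/htdocs/mason';

# Join a root directory and an absolute, slash-separated path.
sub resolve_under {
    my ($root, $path) = @_;
    $root =~ s{/+\z}{};     # drop any trailing slashes from the root
    $path =~ s{\A/*}{/};    # make sure the path starts with exactly one slash
    return $root . $path;
}

my $path = '/some/file.html';
print "HREF resolves to:           ", resolve_under($document_root,  $path), "\n";
print "component call resolves to: ", resolve_under($component_root, $path), "\n";
```

The same path string yields two different files, which is exactly the situation that making the two roots identical avoids.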
Not OK
By default, if a component does not give an explicit return code, the ApacheHandler object will assume that the request was error free and that the status it should return is OK. But sometimes things are just not OK.
For example, we may want to give an authorization error or a document not found error. There are several ways of doing this.
The first is to have the component that is called return the desired status code. Inside the handle_request() method, the ApacheHandler object checks to see if the component that it called returned a value. If so, it uses this as the status code for the request.
If you try to do this, remember that with autohandler wrapping, the last component executed is not necessarily the first one called. For example, let’s assume a component called /give_up.html:
<%init>
# I give up!
use Apache::Constants qw(NOT_FOUND);
return NOT_FOUND;
</%init>
This component could be wrapped by an /autohandler like this:
<html>
<head>
<title>My wonderful site</title>
</head>
<body>
% $m->call_next(%ARGS);
</body>
</html>
In this case the return code from the /give_up.html component ends up being ignored.
A better way to do this is to use the Mason request object’s abort() method, which we covered in Chapter 4. Using the abort() method, we could rewrite /give_up.html like this:
<%init>
# I give up!
use Apache::Constants qw(NOT_FOUND);
$m->abort(NOT_FOUND);
</%init>
Any value passed to abort() will eventually be passed to the client. But this still might not work. The problem is the text content in the /autohandler that is generated before /give_up.html is called. Mason sees this before abort() is called and will try to send it to the client. This may be a problem for some non-OK codes, particularly for redirects. We need to clear Mason’s buffer in order to make sure that the client doesn’t see any output before the error is generated.
<%init>
# I really give up!
use Apache::Constants qw(NOT_FOUND);
$m->clear_buffer;
$m->abort(NOT_FOUND);
</%init>
This will work just fine for all return codes, though some may need additional manipulation of the Apache object, $r, depending on the status code being returned.
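A redirect is one case where $r needs extra handling, because the client must be told where to go. As a minimal sketch, assuming the mod_perl 1.x API, autoflushing turned off, and a hypothetical destination of /login.html, a component might do something like this:

```mason
<%init>
use Apache::Constants qw(REDIRECT);

# Hypothetical destination; substitute a real URL for your site.
my $url = '/login.html';

$m->clear_buffer;                      # discard any output generated so far
$r->header_out( Location => $url );    # tell the client where to go
$m->abort(REDIRECT);                   # stop processing with a redirect status
</%init>
```

The Location header is set on the Apache request object rather than through Mason, since Mason itself only supplies the status code.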
$r
Every component that is run under Apache via the ApacheHandler module has access to a global variable called $r. This variable is the Apache request object for the current request. Using this variable gives you access to the full Apache API, including the ability to set HTTP headers, send messages to the Apache logs, access Apache configuration information, and much more.
If you use the Apache::Request module to process incoming arguments, which is Mason’s default, then $r will actually be an Apache::Request object.
Documenting what you can do with this object is outside the scope of the book, but do not despair. The mod_perl resources mentioned at the beginning of this chapter, as well as the Apache object’s documentation (run perldoc Apache, and if you set args_method to mod_perl, also perldoc Apache::Request), can tell you everything you need to know. It’s worth looking at the documentation to get an idea of what kinds of things it’s capable of doing.
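To give a flavor of what $r offers, here is a small sketch of a component that reads a configuration value, sets a response header, and writes to the error log. It assumes the mod_perl 1.x API; the SiteName variable and the X-Site-Name header are hypothetical names chosen for illustration:

```mason
<%init>
# Read a value set in httpd.conf with "PerlSetVar SiteName ..."
# (SiteName is a hypothetical variable, not something Mason defines).
my $site = $r->dir_config('SiteName') || 'unnamed site';

# Add a custom header to the response.
$r->header_out( 'X-Site-Name' => $site );

# Send a message to the Apache error log.
$r->log_error( "Serving " . $r->uri . " for $site" );
</%init>
Welcome to <% $site |h %>.
```

Because the header is set in the <%init> block, it is in place before Mason sends the HTTP headers, provided autoflushing is off.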
ApacheHandler Parameters
The ApacheHandler object can take several parameters to its constructor; all of them are optional:
- args_method => 'mod_perl' or 'CGI'
This tells the object what module you would like it to use for parsing incoming query string and POST parameters. CGI indicates that you want to use CGI.pm and mod_perl indicates that you want to use Apache::Request. Apache::Request is faster, uses less memory, and is the default.
You may choose to use CGI.pm if you want to take advantage of its form element generation features or if you cannot use Apache::Request on your operating system.
- decline_dirs => $boolean
By default, requests that match directories under a Location or Directory section served by Mason are declined, returning a status code of DECLINED (-1) so that Apache will handle directory requests as it normally does. If you would like to handle these requests with Mason, presumably via a dhandler, you should set this to false.
Obviously, if you told Apache to serve Mason requests based only on a file extension, this parameter is not likely to be meaningful.
- apache_status_title => $string
The ApacheHandler object will register itself with mod_perl’s Apache::Status module if possible. This registration involves giving Apache::Status a unique title for the registered object. This defaults to “HTML::Mason status” but if you have multiple ApacheHandler objects you may want to give each one a unique title. Otherwise, only one will be visible under the Apache::Status display.
The ApacheHandler module provides a special subclass of the Request object $m. This object has an additional constructor parameter besides those available to normal requests:
- auto_send_headers => $boolean
This tells Mason whether or not you’d like it to automatically send the HTTP headers before sending content to a client. By default, this is true, and Mason will call $r->send_http_header() before sending output to the client. If you turn this off, you will need to send the headers yourself.
If you do call the send_http_header() method yourself before Mason has a chance to do so, Mason will not send extra headers, regardless of the value of this variable.
Remember, you can simply pass this value to the ApacheHandler object when you create it, or you can set MasonAutoSendHeaders in your httpd.conf file.
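Put together, a handler.pl that sets all of the parameters from this section might look roughly like the following. The values are illustrative assumptions rather than recommendations, and the paths are the ones used in the earlier handler.pl example:

```perl
package MyMason::MyApp;

use strict;
use HTML::Mason::ApacheHandler;
use Apache::Request;

my $ah = HTML::Mason::ApacheHandler->new(
    comp_root => '/home/httpd/html',
    data_dir  => '/home/httpd/mason',

    args_method         => 'mod_perl',    # parse arguments with Apache::Request
    decline_dirs        => 0,             # let Mason (a dhandler) serve directories
    apache_status_title => 'Mason status: main site',    # arbitrary unique title

    # Request parameter, passed through to the request object
    auto_send_headers   => 1,             # the default, shown for completeness
);

sub handler {
    my $r = shift;    # Apache request object
    return $ah->handle_request($r);
}

1;
```

Setting decline_dirs to false only makes sense when Mason is attached to a Location or Directory section, as noted above.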
To Autoflush or Not to Autoflush
In Chapter 4 we saw that autoflushing can be turned on and off for a request. Whether or not autoflushing is turned on has a big impact on what kind of things you can do while running under Apache.
With autoflush off, you can easily start generating content, have your code throw it away halfway through, and then issue a redirect. This will simply not work with autoflushing on.
For a redirect to work, it has to have a chance to set the headers. Since content is sent as soon as it is created when autoflushing, any redirects that happen after content is generated will happen after the headers have already been sent. This makes it harder to have a flexible application with autoflushing on, and for this reason most people do not use it.
Turning autoflush on can make the response time appear quicker, since the initial output gets to the client sooner. To get the best of both worlds, leave autoflushing off and send quick status reports with $m->flush_buffer on the pages that need it.
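As a sketch of that approach, a slow page might flush a short status message before starting its expensive work. This assumes autoflushing is off; build_report() is a hypothetical helper standing in for whatever takes the time:

```mason
<p>Working on your report; this may take a moment...</p>
% $m->flush_buffer;    # send the text above to the client right away
% my @rows = build_report();    # hypothetical slow helper, defined elsewhere
<ul>
% foreach my $row (@rows) {
<li><% $row |h %></li>
% }
</ul>
```

Once the buffer has been flushed, the headers are gone too, so a redirect or error status is no longer possible for the rest of that request.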
Generating Something Besides HTML
Eventually you may want to have Mason generate things besides HTML, such as plain text pages, MP3 playlists, or even images. This is quite easy to do. Here’s a simple component that generates plain text:
I am a piece of plain text. So boring.
This will not be <b>bold</b>.

<%init>
$r->content_type('text/plain');
</%init>
If you want to generate binary data, you have to be careful to make sure that no extraneous snippets of text sneak into it:
<%args>
$type => 'jpeg'
</%args>

<%init>
use Apache::Constants qw(OK);

$m->clear_buffer;    # avoid extra output (but it only works when autoflush is off)

my $img = make_image( type => $type );    # magic hand-waving ...

$r->content_type("image/$type");
$r->send_http_header;
$m->print($img);

$m->abort(OK);    # make sure nothing else gets sent
</%init>
This component does two things to ensure that nothing corrupts the image’s binary data. First, it clears the buffer, because if this component was wrapped by an autohandler there could be some text in the buffer when it is called. Of course, if you’ve turned on autoflushing, the clear_buffer() method doesn’t actually do anything, so you’d have to be extra careful in that situation.
Then, after sending the image, the component flushes the buffer to make sure that output gets sent and then aborts to make sure that nothing gets sent afterward. By passing the OK status code to the abort() method, we make sure that the correct status code makes its way to the client. The abort() method does not prevent output from being sent to the client, so the image is sent as we’d expect.
We put all this code in an <%init> block to make sure that it gets executed right away, before any whitespace from the rest of the component could be processed as output.
Note that Mason’s templating capabilities aren’t exactly taking center stage in this example. You may ask why Mason is being used in this situation at all. Indeed, without context, it’s difficult to see a good reason; however, people have done just this kind of thing in order to take advantage of Mason’s other features like dhandlers or to integrate the dynamically generated image into an existing Mason site.
Apache::Status and Mason
As was mentioned earlier, Mason can cooperate with the Apache::Status module to display information about itself. Enabling this module is relatively simple. For example, if you’d like the module to be accessible at the URL /perl-status, you could add this to your Apache configuration:
<Location /perl-status>
    SetHandler perl-script
    PerlHandler Apache::Status
</Location>
Apache::Status provides information about mod_perl in general and allows other modules to provide their own status hooks. Mason provides a basic status report on the ApacheHandler and Interp objects, as well as a list of which components are currently in the code cache.