Chapter 9. Mason and CGI
Although mod_perl is pretty cool, it’s not the only way to use Mason to build a web site. In fact, plenty of times it’s more advisable to use CGI than mod_perl, as we describe in this chapter. If you find yourself in such a situation, you’re in luck: Mason works just fine under CGI, and special care has gone into making sure the cooperation is smooth. The HTML::Mason::CGIHandler module provides the glue necessary to use Mason in most common CGI environments.
CGI-Appropriate Situations
Before we get into the details of how to set up Mason under CGI, let’s think about why you might want to use this setup. After all, isn’t mod_perl supposed to be better than CGI? Well, yes and no. As in most things, context is everything. The following factors may conspire to make you choose clunky old CGI over clunky new mod_perl in a particular situation:
- Need instant gratification
Installing mod_perl can be somewhat difficult if you’ve never done it before (heck, even if you have done it before), and it can take a while to get used to the peculiarities of developing in a mod_perl environment. If you want to try Mason out but don’t want to spend time installing and configuring mod_perl (or you don’t want to wait for the person who’s going to come install it for you), you may be interested in using HTML::Mason::CGIHandler to start development, then switching over to mod_perl and HTML::Mason::ApacheHandler once you’ve gotten comfortable with mod_perl.
- Must share hosting environments
Many organizations simply don’t have the money to pay for their own server and staff to administer it, so they sign up with a cheap virtual hosting service that lets them run CGI scripts. The key word “virtual” means that several organizations, inevitably of varying scruples, share the same web server on the same machine. Although some of these services say they offer mod_perl, you should not use it, because it is very insecure and very prone to catastrophic development errors.
It is insecure because all your code will run in the web server process, along with any other hooligan’s code on your shared server. Unless you trust all those hooligans not to steal your passwords, harass your clients, delete your files, and plunder your village, you should avoid using mod_perl offered in a virtual hosting environment.
It is prone to development errors for the same reason: your code runs in the web server process, so if your Mason code accidentally gets into an infinite loop or hangs the server process, you bring the server down with you. Hosting services tend to dislike that. If you had enough money, you’d handle this problem by running separate servers for development and production, but you clearly don’t have enough money for that, since you’re using cheap virtual hosting.
Good old CGI, unpleasant as it is in other ways, provides a solution. Apache’s ExecCGI mechanism (and its equivalent in other servers) can be configured to use a “setuid” execution mechanism to make sure that your CGI scripts run as the user that owns them — you. This means that you can make all your sensitive data files accessible only by you, that any files your scripts create are owned by you, and that if you make a big mistake, you don’t anger the other people who share your server.
Of course, this argument is moot if your web hosting service doesn’t support the ExecCGI model. Most good full-featured services do, and most crappy ones don’t. Make sure you do the proper research.
- Speed not critical
Alas, all the claims of the mod_perl crowd are true: CGI is slower than mod_perl, and it doesn’t provide nearly as much control over the server process. However, sometimes you don’t care. If request speed doesn’t mean too much on your site, and you don’t need to do anything fancy with mod_perl’s various request phases and content management, then there are few, if any, reasons to use mod_perl. mod_perl itself isn’t (necessarily) all that complicated, but the environment you deploy it in can be.
A strong factor in your decision should be rigorous benchmarking; if your site running under CGI can keep up with the amount of traffic you’ll need to handle, then HTML::Mason::CGIHandler holds promise for you. As always, do the proper research.
- Special memory usage situations
One of the particular constraints of mod_perl is that it can use a lot of memory. This is mainly due to the persistent nature of the embedded Perl interpreter; memory you allocate during one request may not get freed until many more requests are served and the child process is terminated. Even if you explicitly free the memory when you’re done with it, using Perl’s undef( ) function, most operating systems won’t actually return the memory block to the general pool of free system memory; they’ll just mark it as reusable within that same process. Because of this, mod_perl developers are often quite miserly with memory and will sometimes do convoluted things just to keep memory usage at a minimum.
The persistence of memory creates a problem when you need to have a large chunk of data resident in memory for processing. One of the most common instances of this is HTTP file uploads: if the user uploads a large file, that file will often end up in memory, creating a real problem in a mod_perl environment. However, if the user is uploading a large file, he’ll typically have to wait around for the file to transfer over the network, which means that he won’t really care (or notice) if the receiving script takes an extra half-second to execute. CGI can be useful in this situation, because any memory used during the request will be freed up immediately when the request is over.
- Web server isn’t Apache
Although Apache is a great and flexible web server with a huge support team and developer community, it’s not the only web server on the planet. If you find yourself needing to use a server other than Apache, of course you won’t be able to use mod_perl either. Since most web servers support a CGI mechanism of some sort, CGI may be the best way to use Mason in an environment like this.
In fact, even when your web server is Apache, you may want to use a different execution model like FastCGI. Mason’s CGI support extends well into situations like these.
CGI-Inappropriate Situations
In some situations, CGI just won’t do. Depending on who you ask, these situations might be characterized with terms ranging from “always” to “never.” It’s beyond the scope of this book to make all the arguments germane to the CGI versus mod_perl debate, but these factors might make choosing CGI impossible:
- Startup cost too great
The most commonly encountered argument in favor of mod_perl is that it reduces the startup cost of each request by putting a Perl interpreter in resident memory, allowing various resources to be allocated once per server child rather than once per request. This is true, and important.
This resource allocation scheme can produce tremendous speedups in several areas, most notably database connection time. Many modern dynamic sites rely on a database connection, and if you’re using an industrial-strength database like Oracle that has to perform lots of tasks every time you connect, connections can take so long to obtain that connecting on every request is simply unacceptable. Other resources may suffer from this same constraint, so try to determine your needs before running full speed into the CGI camp.
- Advanced mod_perl features too tantalizing
Let’s face it, mod_perl is cool. It’s a window into the most advanced web server in the world, using the most fun and versatile language in the world. If you simply can’t live without some of the more advanced mod_perl features like content negotiation, server-side subrequests, and multiple request phase hooks, you’re forever going to feel fettered by CGI’s inherent limitations.
Creating a CGI-Based Site in Mason
You can get Mason and CGI to work together in several different ways. One model is to write traditional CGI scripts that use Mason as a templating language, executing Mason components from inside the CGI program. See Section 9.4 for how to set this up.
A better approach to building a Mason site under CGI is to let the components drive the site. You can configure your web server to invoke a CGI script of your choosing for certain requests, and that script can begin Mason processing on those files. In other words, you can have the same set of Mason components in your site you would have under mod_perl, but those components get executed under the CGI paradigm.
Your comrade in this endeavor is the HTML::Mason::CGIHandler module. Its role is similar to the HTML::Mason::ApacheHandler module, but since CGI is a bit clunkier than mod_perl and the CGIHandler is a bit younger than ApacheHandler, a bit more configuration is necessary. You’ll need to combine four ingredients: directives in the server’s configuration files (httpd.conf or .htaccess under Apache), a Mason wrapper CGI script, the Mason components themselves, and the HTML::Mason::CGIHandler module.
The necessary configuration directives are fairly straightforward. Here’s an example for Apache:
Action html-mason /cgi-bin/mason_handler.cgi
<FilesMatch "\.html$">
  SetHandler html-mason
</FilesMatch>
Here, the mason_handler.cgi script can be located wherever you want, provided it’s set up by the server to be run as a CGI script. The /cgi-bin directory is already configured on most systems using the ScriptAlias directive, so that’s a reasonable place to put the handler script, though it’s certainly not the only place.
Instead of passing all .html files through Mason as in the previous example, you might configure the server to Masonize all files in a certain directory (use a <Directory> block for this or an .htaccess file in that directory), only certain specific files (use a <Files> block or a different <FilesMatch> pattern to select those files), or some more complicated scheme. See your server’s documentation for more configuration help. Remember, each CGI request will take a highly nonzero time to execute, so don’t process a file with Mason unless it’s actually a Mason component. In particular, make sure you don’t accidentally pass image files to Mason, because each web page typically contains many images, and the extra processing time for those images will be a big waste if you invoke Mason unnecessarily, not to mention that Mason may mangle those images when processing them.
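As a rough sketch of the directory-based alternative, the fragment below hands every file under one directory to the wrapper script. The path /var/www/html/dynamic is a made-up example, and the fragment assumes Apache with the Action directive available; adjust both for your own server.

```apache
# Hypothetical layout: only files under /var/www/html/dynamic are Mason components
Action html-mason /cgi-bin/mason_handler.cgi

<Directory "/var/www/html/dynamic">
    # Hand every file in this directory tree to the Mason wrapper script
    SetHandler html-mason
</Directory>
```

With a setup like this, images and other static files should live outside the Masonized directory so they never reach the wrapper script.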
Next, you need to create your mason_handler.cgi script. It should be located wherever the Action directive indicates in the server configuration. Here’s a mason_handler.cgi that will serve nicely for most sites. It’s fairly simple, since most of the real work is done inside the HTML::Mason::CGIHandler module.
#!/usr/bin/perl -w
use strict;
use HTML::Mason::CGIHandler;

my $h = HTML::Mason::CGIHandler->new
  (
   data_dir      => "$ENV{DOCUMENT_ROOT}/../mason-data",
   allow_globals => [qw(%session $user)],
  );

$h->handle_request;
The data_dir and allow_globals parameters should look familiar; they’re just passed along to the Interpreter and Compiler, respectively. Note that the data_dir we use here may need to be changed for your setup. The main consideration is that your data_dir is somewhere outside the document root, so feel free to put it wherever makes sense for you.
Note that we didn’t pass a comp_root parameter. If no comp_root is specified, HTML::Mason::CGIHandler will use $ENV{DOCUMENT_ROOT} as the component root.
With the server configuration and handler script in place, you’re ready to use Mason. You can create a hierarchy of components for your site just as you would under a mod_perl setup.
Using Mason Templates Inside Regular CGI Scripts
We have argued several times against the traditional CGI model, in which the response to each web request is driven primarily by a Perl script (or other executable program[20]) that focuses on making all the logical decisions necessary for fulfilling that request. We tend to prefer template-based solutions driven by the content of the request, using concise sprinklings of programming to control the dynamic elements of the request. In other words, we prefer Mason components to CGI scripts.
However, the world is a strange place. For some odd reason, managers may not always be persuaded by the well-reasoned arguments their programmers make in favor of using Mason in its traditional way. They may even want to take an existing functional site based on badly written CGI scripts and use some basic Mason-based templating techniques to achieve the timeless goal of separating logic from presentation. In these situations, you may be called upon to use Mason as if it were one of the lightweight solutions mentioned in Chapter 1.
Luckily, you won’t be the first person to want such a thing. This path has been trodden often enough that it’s fairly easy to use Mason as a standalone templating language. To do this, you create a Mason Interpreter, then call the Interpreter’s exec( ) method, passing it either a component path or component object as the first argument.
The CGI script in Example 9-1 is sort of the “Hello, World” of dynamic web programming. It lets the user enter text in an HTML form, submit the form, and see the resultant text in the server’s response.
#!/usr/bin/perl -w

use strict;
use CGI;
use HTML::Mason;

# Create a new query object, and print the standard header
my $q = CGI->new;
print $q->header;

# Create a Mason Interpreter
my $interp = HTML::Mason::Interp->new( );

# Generate a Component object from the given text
my $component = $interp->make_component(comp_source => <<'EOF');
<%args>
$user_input => '(no input)'
</%args>

<HTML>
<HEAD><TITLE>You said '<% $user_input |h %>'</TITLE></HEAD>
<BODY>
  You said '<% $user_input |h %>'. Type some text below and submit the form.<BR>
  <FORM ACTION="" METHOD="GET">
  <INPUT NAME="user_input" value=""><br>
  <INPUT TYPE="submit" VALUE="Submit">
  </FORM>
</BODY>
</HTML>
EOF

my %vars = $q->Vars;

# Trim leading and trailing whitespace. Skip this when the form has not
# been submitted, so the component's default value still applies.
$vars{user_input} =~ s/^\s+|\s+$//g if defined $vars{user_input};

# Execute the component, with output going to STDOUT
$interp->exec($component, %vars);
Notice a couple of things about the code. First, the Mason component is located in the middle of the code, surrounded by some fairly generic Perl code to fetch the query parameters and pass them to the component. Second, the Mason Interpreter is the main point of entry for most of the tasks performed. First we create an Interpreter, then we use the Interpreter’s make_component( ) method to create a new Component object (see Chapter 5 for more on the make_component( ) method), then we call the Interpreter’s exec( ) method to set the Mason wheels in motion.
Also, notice that the example code calls the CGI method Vars( ) to get at the query parameters. This is relatively convenient but doesn’t properly handle multiple key/value pairs with the same key. To do this better, we’d either have to use the CGI param( ) method and parse out the multiple keys ourselves or split the Vars( ) values on ASCII \0 (thus disallowing \0 in our data). You’re probably not jumping for joy at the prospect of dealing with these kinds of minutiae, but this is the kind of thing you’ll find yourself dealing with in CGI environments.
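To make the second option concrete, here is a minimal sketch of splitting Vars( )-style values on \0. The helper name expand_multi_values is hypothetical (it is not part of CGI.pm or Mason), and the %vars hash below merely simulates what Vars( ) would return for a query such as ?color=red&color=blue&size=L.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Expand a hash shaped like the one CGI's Vars( ) returns, in which
# repeated keys arrive as a single string joined by "\0", into a hash
# whose values are array references.
# (expand_multi_values is a hypothetical helper name.)
sub expand_multi_values {
    my (%packed) = @_;
    my %expanded;
    for my $key (keys %packed) {
        my $value = defined $packed{$key} ? $packed{$key} : '';
        # The -1 limit keeps trailing empty values, as in "a\0"
        my @values = split /\0/, $value, -1;
        # An empty string splits to an empty list; keep one empty value
        @values = ('') unless @values;
        $expanded{$key} = \@values;
    }
    return %expanded;
}

# Simulated Vars( ) output; in a real script this would be: my %vars = $q->Vars;
my %vars = (color => "red\0blue", size => 'L');
my %args = expand_multi_values(%vars);

print "$_: @{ $args{$_} }\n" for sort keys %args;
```

Every value becomes an array reference here, even for single-valued keys, which keeps the calling code uniform at the cost of a little extra dereferencing.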
If you don’t actually need to examine or alter the query parameters yourself before invoking the Mason template, you can take advantage of the HTML::Mason::CGIHandler handle_comp( ) method, which will create a CGI object and parse out the query parameters, then invoke the component you pass it. Example 9-2 shows the previous example rewritten using the handle_comp( ) method.
#!/usr/bin/perl -w

use strict;
use HTML::Mason::CGIHandler;

# Create a new CGIHandler object
my $h = HTML::Mason::CGIHandler->new( );

# Generate a Component object from the given text
my $component = $h->interp->make_component(comp_source => <<'EOF');
<%args>
$user_input => '(no input)'
</%args>

<HTML>
<HEAD><TITLE>You said '<% $user_input |h %>'</TITLE></HEAD>
<BODY>
  You said '<% $user_input |h %>'. Type some text below and submit the form.<BR>
  <FORM ACTION="" METHOD="GET">
  <INPUT NAME="user_input" value=""><br>
  <INPUT TYPE="submit" VALUE="Submit">
  </FORM>
</BODY>
</HTML>
EOF

# Invoke the component, with output going to STDOUT
$h->handle_comp($component);
As you can see, this hides all the CGI argument processing, ensuring that you don’t make a silly mistake (or get lazy) in handling the query parameters. It also handles sending the HTTP headers. This approach is usually preferable to the one shown in Example 9-1. Of course, if you’re letting Mason handle all the details of the request, you have to wonder why you don’t just use the Action directive with a generic CGI wrapper, as covered in Section 9.3.
Design Considerations
If you start building a site in this way, with each CGI script invoking Mason as a templating engine, you’re going to face some design decisions. For instance, if your code needs to do some argument processing or other decision making that alters the output, should those decisions happen inside or outside the Mason template? If you do a bunch of important stuff outside the template that alters the behavior inside the template, you can create lots of nonobvious logical dependencies that can be a nightmare to maintain. It’s somewhat better to put this stuff inside the template, but you run the risk of obscuring the template’s real purpose, which is to generate HTML output.
To really make the right kinds of decisions, direct yourself to Chapter 10, in which we try to convince you to use Mason for what Mason is good for and Perl modules for what Perl modules are good for. These design issues don’t have much to do with the CGI approach per se, but as you can see from our example script, the flow is already a little convoluted even in the simplest of cases. Anything you can do to keep things tidy may save you a lot of pain later.
Differences Between Mason Under CGI and mod_perl
The main functional difference between the environments provided by HTML::Mason::CGIHandler and HTML::Mason::ApacheHandler is that $r, the Apache request object, is much more limited in functionality under CGI. In fact, under CGI it’s not a real Apache request object at all; it just emulates a few of the more useful methods. It can’t emulate some methods because they make sense only in a mod_perl environment. For example, you won’t be able to access the Apache subrequest mechanism through lookup_uri( ) or lookup_file( ), you won’t be able to get at the client connection through the connection( ) method, and you can’t get configuration parameters via dir_config( ).
However, $r does have methods to help you set headers in the outgoing response, including Location and Content-Type headers. This makes it relatively straightforward to send client-side redirects and to use Mason to generate plain text, XML, image data, or other formats besides the default HTML.
To set outgoing headers, you can use the $r->header_out( ) and $r->content_type( ) methods in your components. They are very similar to their mod_perl counterparts of the same names. The header_out( ) method takes two arguments, the name of a header and the value it should be set to. If you pass only one argument, the header’s value won’t be set, but the method will return the current value of the header, as set by a previous call to header_out( ).
The content_type( ) method is the “official” way to set the content type of the outgoing response. It’s essentially just an abbreviation for passing Content-Type as the first argument to the header_out( ) method. If you pass an argument to content_type( ), you’ll set the outgoing content type. If you don’t set the content type during the request, the CGI module will set the content type to text/html.
Under normal circumstances, header_out( ) and content_type( ) just pass along any headers you set to the CGI module’s header( ) method. If you previously set a header that you want to unset, you can pass undef as the new value to header_out( ) or content_type( ). Instead of setting the header’s value to undef (which wouldn’t make a lot of sense in the HTTP context), the header will be unset (i.e., removed from the table of headers to send to the client).
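The header methods described above might be used in a component along these lines. This is only a sketch: the header names X-Greeting-Source and X-Debug and the $name argument are invented for illustration, and the component assumes it is running under HTML::Mason::CGIHandler so that $r is available.

```mason
<%args>
$name => 'world'
</%args>
<%init>
# Serve plain text instead of the default text/html
$r->content_type('text/plain');

# Set an arbitrary response header (X-Greeting-Source is a made-up name)
$r->header_out('X-Greeting-Source' => 'mason-cgi');

# With a single argument, header_out( ) returns the value set earlier
my $source = $r->header_out('X-Greeting-Source');

# Passing undef removes a header that was set earlier in the request
$r->header_out('X-Debug' => 'on');
$r->header_out('X-Debug' => undef);
</%init>
Hello, <% $name %>! This response was produced by <% $source %>.
```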
Like its cousin ApacheHandler, CGIHandler adds an $m->redirect( ) method to the request object $m, so you can redirect browsers to a URL of your choosing in the same way you would under mod_perl.
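As an illustration, a component might redirect before producing any output. The $user_id argument and the /login.html path in this sketch are hypothetical; only the $m->redirect( ) call itself comes from the text above.

```mason
<%args>
$user_id => undef
</%args>
<%init>
# Hypothetical rule: visitors who arrive without a user_id are sent to
# /login.html (a made-up path) before any output is generated
$m->redirect('/login.html') unless defined $user_id;
</%init>
Welcome back, user <% $user_id |h %>.
```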
Finally, if you want to access the CGI query object for the current request, you may do so by calling the $m->cgi_object method. In general it’s best to avoid using the query object directly, because doing so will lead to nonportable code and you most likely won’t be taking advantage of Mason’s argument-processing and content-generation techniques. However, as with most things Perl, you can always get enough rope, even if it means you might end up in a hopelessly tangled mess, dangling by an ankle from the gallows pole of your own code.
See the documentation for HTML::Mason::CGIHandler for more details.