Writing Apache Modules with Perl and C
The Apache API and mod_perl
By Lincoln Stein & Doug MacEachern
1st Edition March 1999
1-56592-567-X, Order Number: 567X
746 pages, $34.95
Chapter 4.
Content Handlers
In this chapter:
Content Handlers as File Processors
Virtual Documents
Redirection
Processing Input
Apache::Registry
Handling Errors
Chaining Content Handlers
Method Handlers
This chapter is about writing content handlers for the Apache response phase, when the contents of the page are actually produced. In this chapter you'll learn how to produce dynamic pages from thin air, how to modify real documents on the fly to produce effects like server-side includes, and how Apache interacts with the MIME-typing system to select which handler to invoke.
Starting with this chapter we shift to using the Apache Perl API exclusively for code examples and function prototypes. The Perl API covers the majority of what C programmers need to use the C-language API. What's missing are various memory management functions that are essential to C programmers but irrelevant in Perl. If you are a C programmer, just have patience and the missing pieces will be filled in eventually. In the meantime, follow along with the Perl examples and enjoy yourself. Maybe you'll even become a convert.
Content Handlers as File Processors
Early web servers were designed as engines for transmitting physical files from the host machine to the browser. Even though Apache does much more, the file-oriented legacy still remains. Files can be sent to the browser unmodified or passed through content handlers to transform them in various ways before sending them on to the browser. Even though many of the documents that you produce with modules have no corresponding physical files, some parts of Apache still behave as if they did.
When Apache receives a request, the URI is passed through any URI translation handlers that may be installed (see Chapter 7, Other Request Phases, for information on how to roll your own), transforming it into a file path. The mod_alias translation handler (compiled in by default) will first process any Alias, ScriptAlias, Redirect, or other mod_alias directives. If none applies, the http_core default translator will simply prepend the DocumentRoot directory to the beginning of the URI.
Next, Apache attempts to divide the file path into two parts: a "filename" part which usually (but not always) corresponds to a physical file on the host's filesystem, and an "additional path information" part corresponding to additional stuff that follows the filename. Apache divides the path using a very simple-minded algorithm. It steps through the path components from left to right until it finds something that doesn't correspond to a directory on the host machine. The part of the path up to and including this component becomes the filename, and everything that's left over becomes the additional path information.
Consider a site with a document root of /home/www that has just received a request for URI /abc/def/ghi. The way Apache splits the file path into filename and path information parts depends on what directories it finds in the document root:
Physical Directory       Translated Filename      Additional Path Information
/home/www                /home/www/abc            /def/ghi
/home/www/abc            /home/www/abc/def        /ghi
/home/www/abc/def        /home/www/abc/def/ghi    empty
/home/www/abc/def/ghi    /home/www/abc/def/ghi    empty
Note that the presence of any actual files in the path is irrelevant to this process. The division between the filename and the path information depends only on what directories are present.
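To make the splitting rule concrete, here is a minimal sketch of the walk in plain Perl. It is not Apache's actual implementation: the split_path( ) name and the $is_dir callback are our own inventions for illustration, and the callback stands in for the filesystem so that the sketch can be tried without creating any directories.

```perl
use strict;
use warnings;

# Split a translated path into (filename, path_info).
# $is_dir is a hypothetical callback that reports whether a path is a
# directory; a real implementation would consult the filesystem instead.
sub split_path {
    my ($docroot, $uri, $is_dir) = @_;
    my @parts    = grep { length } split m{/}, $uri;
    my $filename = $docroot;
    while (@parts) {
        $filename .= '/' . shift @parts;
        # The first component that is not a directory ends the walk.
        last unless $is_dir->($filename);
    }
    my $path_info = @parts ? '/' . join('/', @parts) : '';
    return ($filename, $path_info);
}

# Pretend that only /home/www and /home/www/abc exist as directories.
my %dirs = map { $_ => 1 } ('/home/www', '/home/www/abc');
my ($file, $info) = split_path('/home/www', '/abc/def/ghi',
                               sub { $dirs{ $_[0] } });
print "filename=$file path_info=$info\n";
```

With the two directories assumed here, the result corresponds to the second row of the table. Changing the contents of %dirs reproduces the other rows.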
Once Apache has decided where the file is in the path, it determines what MIME type it might be. This is again one of the places where you can intervene to alter the process with a custom type handler. The default type handler (mod_mime) just compares the filename's extension to a table of MIME types. If there's a match, this becomes the MIME type. If no match is found, then the MIME type is undefined. Again, note that this mapping from filename to MIME type occurs even when there's no actual file there.
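The extension lookup can be pictured with a few lines of Perl. This is only a sketch of the idea, not mod_mime's code: the %MIME_TYPES table and the mime_type_for( ) function are hypothetical, and the table holds just three entries chosen for illustration.

```perl
use strict;
use warnings;

# A tiny, made-up extension table; a real server's table is far larger.
my %MIME_TYPES = (
    html => 'text/html',
    txt  => 'text/plain',
    gif  => 'image/gif',
);

# Return the MIME type for a filename based on its extension alone,
# or undef when the extension is missing or unknown. The file itself
# is never opened, so it does not need to exist.
sub mime_type_for {
    my $filename = shift;
    return undef unless $filename =~ /\.([^.\/]+)$/;
    return $MIME_TYPES{$1};
}

for my $name ('/home/www/index.html', '/home/www/README') {
    my $type = mime_type_for($name);
    print "$name => ", (defined $type ? $type : 'undefined'), "\n";
}
```

Because the lookup works purely on the name, it behaves the same way for virtual documents that have no file behind them.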
There are two special cases. If the last component of the filename happens to be a physical directory, then Apache internally assigns it a "magic" MIME type, defined by the DIR_MAGIC_TYPE constant as httpd/unix-directory. This is used by the directory module to generate automatic directory listings. The second special case occurs when you have the optional mod_mime_magic module installed and the file actually exists. In this case Apache will peek at the first few bytes of the file's contents to determine what type of file it might be. Chapter 7 shows you how to write your own MIME type checker handlers to implement more sophisticated MIME type determination schemes.
After Apache has determined the name and type of the file referenced by the URI, it decides what to do about it. One way is to use information hard-wired into the module's static data structures. The module's handler_rec table, which we describe in detail in Chapter 10, C API Reference Guide, Part I, declares the module's willingness to handle one or more magic MIME types and associates a content handler with each one. For example, the mod_cgi module associates MIME type application/x-httpd-cgi with its cgi_handler( ) handler subroutine. When Apache detects that a filename is of type application/x-httpd-cgi it invokes cgi_handler( ) and passes it information about the file. A module can also declare its desire to handle an ordinary MIME type, such as video/quicktime, or even a wildcard type, such as video/*. In this case, all requests for URIs with matching MIME types will be passed through the module's content handler unless some other module registers a more specific type.
Newer modules use a more flexible method in which content handlers are associated with files at runtime using explicit names. When this method is used, the module declares one or more content handler names in its handler_rec array instead of, or in addition to, MIME types. Some examples of content handler names you might have seen include cgi-script, server-info, server-parsed, imap-file, and perl-script. Handler names can be associated with files using either AddHandler or SetHandler directives. AddHandler associates a handler with a particular file extension. For example, a typical configuration file will contain this line to associate .shtml files with the server-side include handler:
    AddHandler server-parsed .shtml
Now, the server-parsed handler defined by mod_include will be called on to process all files ending in ".shtml" regardless of their MIME type.
SetHandler is used within <Directory>, <Location>, and <Files> sections to associate a particular handler with an entire section of the site's URI space. In the two examples that follow, the <Location> section attaches the server-parsed handler to all files within the virtual directory /shtml, while the <Files> section attaches imap-file to all files that begin with the prefix "map-":
    <Location /shtml>
      SetHandler server-parsed
    </Location>
    <Files map-*>
      SetHandler imap-file
    </Files>
Surprisingly, the AddHandler and SetHandler directives are not actually implemented in the Apache core. They are implemented by the standard mod_mime module, which is compiled into the server by default. In Chapter 7, we show how to reimplement mod_mime using the Perl API.
You'll probably want to use explicitly named content handlers in your modules rather than hardcoded MIME types. Explicit handler names make configuration files cleaner and easier to understand. Plus, you don't have to invent a new magic MIME type every time you add a handler.
Things are slightly different for mod_perl users because two directives are needed to assign a content handler to a directory or file. The reason for this is that the only real content handler defined by mod_perl is its internal perl-script handler. You use SetHandler to assign perl-script the responsibility for a directory or partial URI, and then use a PerlHandler directive to tell the perl-script handler which Perl module to execute. Directories supervised by Perl API content handlers will look something like this:
    <Location /graph>
      SetHandler perl-script
      PerlHandler Apache::Graph
    </Location>
Don't try to assign perl-script to a file extension using something like AddHandler perl-script .pl; this is generally useless because you'd need to set PerlHandler too. If you'd like to associate a Perl content handler with an extension, you should use the <Files> directive. Here's an example:
    <Files ~ "\.graph$">
      SetHandler perl-script
      PerlHandler Apache::Graph
    </Files>
There is no UnSetHandler directive to undo the effects of SetHandler. However, should you ever need to restore a subdirectory's handler to the default, you can do it with the directive SetHandler default-handler, as follows:
    <Location /graph/tutorial>
      SetHandler default-handler
    </Location>
Adding a Canned Footer to Pages
To show you how content handlers work, we'll develop a module with the Perl API that adds a canned footer to all pages in a particular directory. You could use this, for example, to automatically add copyright information and a link back to the home page. Later on, we'll turn this module into a full-featured navigation bar.
Figure 4-1. The footer on this page was generated automatically by Apache::Footer.
Example 4-1 gives the code for Apache::Footer, and Figure 4-1 shows a screenshot of it in action. Since this is our first substantial module, we'll step through the code section by section.
    package Apache::Footer;
    use strict;
    use Apache::Constants qw(:common);
    use Apache::File ();
The code begins by declaring its package name and loading various Perl modules that it depends on. The use strict pragma activates Perl checks that prevent us from using global variables before declaring them, disallows the use of function calls without the parentheses, and prevents other unsafe practices. The Apache::Constants module defines constants for the various Apache and HTTP result codes; we bring in only those constants that belong to the frequently used :common set. Apache::File defines methods that are useful for manipulating files.
    sub handler {
        my $r = shift;
        return DECLINED unless $r->content_type() eq 'text/html';
The handler( ) subroutine does all the work of generating the content. It is roughly divided into three parts. In the first part, it fetches information about the requested file and decides whether it wants to handle it. In the second part, it creates the canned footer dynamically from information that it gleans about the file. In the third part, it rewrites the file to include the footer.
In the first part of the process, the handler retrieves the Apache request object and stores it in $r. Next it calls the request's content_type( ) method to retrieve its MIME type. Unless the document is of type text/html, the handler stops here and returns a DECLINED result code to the server. This tells Apache to pass the document on to any other handlers that have declared their willingness to handle this type of document. In most cases, this means that the document or image will be passed through to the browser in the usual way.
        my $file = $r->filename;
        unless (-e $r->finfo) {
            $r->log_error("File does not exist: $file");
            return NOT_FOUND;
        }
        unless (-r _) {
            $r->log_error("File permissions deny access: $file");
            return FORBIDDEN;
        }
At this point we go ahead and recover the file path, by calling the request object's filename( ) method. Just because Apache has assigned the document a MIME type doesn't mean that it actually exists or, if it exists, that its permissions allow it to be read by the current process. The next two blocks of code check for these cases. Using the Perl -e file test, we check whether the file exists. If not, we log an error to the server log using the request object's log_error( ) method and return a result code of NOT_FOUND. This will cause the server to return a page displaying the 404 "Not Found" error (exactly what's displayed is under the control of the ErrorDocument directive).
There are several ways to perform file status checks in the Perl API. The simplest way is to recover the file's pathname using the request object's filename( ) method, and pass the result to the Perl -e file test:
        unless (-e $r->filename) {
            $r->log_error("File does not exist: $file");
            return NOT_FOUND;
        }
A more efficient way, however, is to take advantage of the fact that during its path walking operation Apache already performed a system stat( ) call to collect filesystem information on the file. The resulting status structure is stored in the request object and can be retrieved with the object's finfo( ) method. So the more efficient idiom is to use the test -e $r->finfo.
Once finfo( ) is called, the stat( ) information is stored into the magic Perl filehandle _ and can be used for subsequent file testing and stat( ) operations, saving even more CPU time. Using the _ filehandle, we next test that the file is readable by the current process and return FORBIDDEN if this isn't the case. This displays a 403 "Forbidden" error.
        my $modtime = localtime((stat _)[9]);
After performing these tests, we get the file modification time by calling stat( ). We can use the _ filehandle here too, avoiding the overhead of repeating the stat( ) system call. The modification time is passed to the built-in Perl localtime( ) function to convert it into a human-readable string.
        my $fh;
        unless ($fh = Apache::File->new($file)) {
            $r->log_error("Couldn't open $file for reading: $!");
            return SERVER_ERROR;
        }
At this point, we attempt to open the file for reading using Apache::File's new( ) method. For the most part, Apache::File acts just like Perl's IO::File object-oriented I/O package, returning a filehandle on success or undef on failure. Since we've already handled the two failure modes that we know how to deal with, we return a result code of SERVER_ERROR if the open is unsuccessful. This immediately aborts all processing of the document and causes Apache to display a 500 "Internal Server Error" message.
        my $footer = <<END;
    <hr>
    © 2001 <a href="http://www.ora.com/">O'Reilly & Associates</a><br>
    <em>Last Modified: $modtime</em>
    END
Having successfully opened the file, we build the footer. The footer in this example script is entirely static, except for the document modification date that is computed on the fly.
        $r->send_http_header;
        while (<$fh>) {
            s!(</BODY>)!$footer$1!oi;
        } continue {
            $r->print($_);
        }
The last phase is to rewrite the document. First we tell Apache to send the HTTP header. There's no need to set the content type first because it already has the appropriate value. We then loop through the document looking for the closing </BODY> tag. When we find it, we use a substitution statement to insert the footer in front of it. The possibly modified line is now sent to the browser using the request object's print( ) method.
        return OK;
    }
    1;
At the end, we return an OK result code to Apache and end the handler subroutine definition. Like any other .pm file, the module itself must end by returning a true value (usually 1) to signal Perl that it compiled correctly.
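If you would like to experiment with the rewriting step outside the server, the following sketch applies the same substitution to a string held in memory. The add_footer( ) function and the sample page are our own hypothetical stand-ins; inside Apache::Footer the lines come from the opened file and go out through the request object's print( ) method instead.

```perl
use strict;
use warnings;

# Insert $footer in front of the closing </BODY> tag, line by line,
# using the same case-insensitive substitution as the handler.
sub add_footer {
    my ($html, $footer) = @_;
    my $out = '';
    for my $line (split /^/, $html) {    # split into lines, keeping newlines
        $line =~ s!(</BODY>)!$footer$1!i;
        $out .= $line;                    # the handler would print here
    }
    return $out;
}

my $page   = "<html><body>\n<p>Hello</p>\n</body></html>\n";
my $footer = "<hr>\n<em>Last Modified: (some date)</em>\n";
print add_footer($page, $footer);
```

A page that has no closing </BODY> tag passes through unchanged, which is also how the handler behaves.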
If all this checking for the existence and readability of the file before processing seems a bit pedantic, don't worry. It's actually unnecessary for you to do this. Instead of explicitly checking the file, we could have simply returned DECLINED if the attempt to open the file failed. Apache would then pass the URI to the default file handler which will perform its own checks and display the appropriate error messages. Therefore we could have replaced the file tests with the single line:
        my $fh = Apache::File->new($file) || return DECLINED;
Doing the tests inside the module this way makes the checks explicit and gives us a chance to intervene to rescue the situation. For example, we might choose to search for a text file of the same name and present it instead. The explicit tests also improve module performance slightly, since the system wastes a small amount of CPU time when it attempts to open a nonexistent file. If most of the files the module serves do exist, however, this penalty won't be significant.
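The explicit tests rely on Perl's habit of remembering the result of the last stat( ) call in the _ filehandle. The standalone sketch below shows the same idiom without Apache. The check_file( ) function and its string return values are hypothetical stand-ins for the handler's result codes, and the temporary file exists only to give the tests something to look at.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Check a path the way the handler does, but with only one stat() call:
# -e performs the stat, and the later tests reuse its result through _.
sub check_file {
    my $path = shift;
    return 'NOT_FOUND' unless -e $path;    # the only real stat() call
    return 'FORBIDDEN' unless -r _;        # reuses the cached information
    my $mtime = (stat _)[9];               # still no new system call
    return scalar localtime $mtime;
}

my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "hello\n";
close $tmp;

print check_file($path), "\n";             # a modification time string
print check_file("$path.missing"), "\n";   # NOT_FOUND
```

In the handler the first test is -e $r->finfo rather than -e $path, so even that stat( ) call is avoided.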
There are several ways to install and use the Apache::Footer content handler. If all the files that needed footers were gathered in one place in the directory tree, you would probably want to attach Apache::Footer to that location:
    <Location /footer>
      SetHandler perl-script
      PerlHandler Apache::Footer
    </Location>
If the files were scattered about the document tree, it might be more convenient to map Apache::Footer to a unique filename extension, such as .footer. To achieve this, the following directives would suffice:
    AddType text/html .footer
    <Files ~ "\.footer$">
      SetHandler perl-script
      PerlHandler Apache::Footer
    </Files>
Note that it's important to associate MIME type text/html with the new extension; otherwise, Apache won't be able to determine its content type during the MIME type checking phase.
If your server is set up to allow per-directory access control files to include file information directives, you can place any of these handler directives inside a .htaccess file. This allows you to change handlers without restarting the server. For example, you could replace the <Location> section shown earlier with a .htaccess file in the directory where you want the footer module to be active:
    SetHandler perl-script
    PerlHandler Apache::Footer
A Server-Side Include System
The obvious limitation of the Apache::Footer example is that the footer text is hardcoded into the code. Changing the footer becomes a nontrivial task, and using different footers for various parts of the site becomes impractical. A much more flexible solution is provided by Vivek Khera's Apache::Sandwich module. This module "sandwiches" HTML pages between canned headers and footers that are determined by runtime configuration directives. The Apache::Sandwich module also avoids the overhead of parsing the request document; it simply uses the subrequest mechanism to send the header, body, and footer files in sequence.
We can provide more power than Apache::Sandwich by using server-side includes. Server-side includes are small snippets of code embedded within HTML comments. For example, in the standard server-side includes that are implemented in Apache, you can insert the current time and date into the page with a comment that looks like this:
    Today is <!--#echo var="DATE_LOCAL"-->.
In this section, we use mod_perl to develop our own system of server-side includes, using a simple but extensible scheme that lets you add new types of includes at a moment's whim. The basic idea is that HTML authors will create files that contain comments of this form:
    <!--#DIRECTIVE PARAM1 PARAM2 PARAM3 PARAM4...-->
A directive name consists of any sequence of alphanumeric characters or underscores. This is followed by a series of optional parameters, separated by spaces or commas. Parameters that contain whitespace must be enclosed in single or double quotes in shell command style. Backslash escapes also work in the expected manner.
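The comment syntax can be tried out with a short standalone script. The sketch below uses the quotewords( ) function from the standard Text::ParseWords module for the shell-style parameter parsing. The parse_directive( ) function is a hypothetical helper written for this illustration and is not part of the module developed in this section.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Split one include comment into its directive name and parameters.
# Returns an empty list if the text is not a well-formed directive.
sub parse_directive {
    my $comment = shift;
    my ($name, $args) = $comment =~ /<!--\s*\#(\w+)\s*(.*?)\s*-->/s
        or return;
    # Split on a space or a comma; quotes protect embedded whitespace.
    return ($name, quotewords('[ ,]', 0, $args));
}

for my $comment ('<!--#HELLO-->',
                 '<!--#HTTP_HEADER User-Agent-->',
                 '<!--#INCLUDE "my file.html" 1-->') {
    my ($name, @params) = parse_directive($comment);
    print "$name: [", join('] [', @params), "]\n";
}
```

The second argument to quotewords( ) is false, so the quote marks themselves are stripped from the returned parameters.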
The directives themselves are not hardcoded into the module but are instead dynamically loaded from one or more configuration files created by the site administrator. This allows the administrator to create a standard menu of includes that are available to the site's HTML authors. Each directive is a short Perl subroutine. A simple directive looks like this one:
    sub HELLO { "Hello World!"; }
This defines a subroutine named HELLO( ) that returns the string "Hello World!" A document can now include the string in its text with a comment formatted like this one:
    I said <!--#HELLO-->
A more complex subroutine will need access to the Apache object and the server-side include parameters. To accommodate this, the Apache object is passed as the first function argument, and the server-side include parameters, if any, follow. Here's a function definition that returns any field from the incoming request's HTTP header, using the Apache object's header_in( ) method:
    sub HTTP_HEADER {
        my ($r,$field) = @_;
        $r->header_in($field);
    }
With this subroutine definition in place, HTML authors can insert the User-Agent field into their document using a comment like this one:
    You are using the browser <!-- #HTTP_HEADER User-Agent -->.
Example 4-2 shows an HTML file that uses a few of these includes, and Figure 4-2 shows what the page looks like after processing.
Figure 4-2. A page generated by Apache::ESSI
Implementing this type of server-side include system might seem to be something of a challenge, but in fact the code is surprisingly compact (Example 4-3). This module is named Apache::ESSI, for "extensible server-side includes."
Again, we'll step through the code one section at a time.
    package Apache::ESSI;
    use strict;
    use Apache::Constants qw(:common);
    use Apache::File ();
    use Text::ParseWords qw(quotewords);
    my (%MODIFIED, %SUBSTITUTION);
We start as before by declaring the package name and loading various Perl library modules. In addition to the modules that we loaded in the Apache::Footer example, we import the quotewords( ) function from the standard Perl Text::ParseWords module. This routine provides command shell-like parsing of strings that contain quote marks and backslash escapes. We also define two lexical variables, %MODIFIED and %SUBSTITUTION, which are global to the package.
    sub handler {
        my $r = shift;
        $r->content_type() eq 'text/html' || return DECLINED;
        my $fh = Apache::File->new($r->filename) || return DECLINED;
        my $sub = read_definitions($r) || return SERVER_ERROR;
        $r->send_http_header;
        $r->print($sub->($r, $fh));
        return OK;
    }
The handler( ) subroutine is quite short. As in the Apache::Footer example, handler( ) starts by examining the content type of the document being requested and declines to handle requests for non-HTML documents. The handler recovers the file's physical path by calling the request object's filename( ) method and attempts to open it. If the file open fails, the handler again returns an error code of DECLINED. This avoids Apache::Footer's tedious checking of the file's existence and access permissions, at the cost of some efficiency every time a nonexistent file is requested.
Once the file is opened, we call an internal function named read_definitions( ). This function reads the server-side includes configuration file and generates an anonymous subroutine to do the actual processing of the document. If an error occurs while processing the configuration file, read_definitions( ) returns undef and we return SERVER_ERROR in order to abort the transaction. Otherwise, we send the HTTP header and invoke the anonymous subroutine to perform the substitutions on the contents of the file. The result of invoking the subroutine is sent to the client using the request object's print( ) method, and we return a result code of OK to indicate that everything went smoothly.
    sub read_definitions {
        my $r = shift;
        my $def = $r->dir_config('ESSIDefs');
        return unless $def;
        return unless -e ($def = $r->server_root_relative($def));
Most of the interesting work occurs in read_definitions( ). The idea here is to read the server-side include definitions, compile them, and then use them to generate an anonymous subroutine that does the actual substitutions. In order to avoid recompiling this subroutine unnecessarily, we cache its code reference in the %SUBSTITUTION hash and reuse it if we can.
The read_definitions( ) subroutine begins by retrieving the path to the file that contains the server-side include definitions. This information is contained in a per-directory configuration variable named ESSIDefs, which is set in the configuration file using the PerlSetVar directive and retrieved within the handler with the request object's dir_config( ) method (see the end of the example for a representative configuration file entry). If, for some reason, this variable isn't present, we return undef. Like other Apache configuration files, we allow this file to be specified as either an absolute path or a partial path relative to the server root. We pass the path to the request object's server_root_relative( ) method. This convenient function prepends the server root to relative paths and leaves absolute paths alone. We next check that the file exists using the -e file test operator and return undef if not.
        return $SUBSTITUTION{$def} if $MODIFIED{$def} && $MODIFIED{$def} <= -M _;
Having recovered the name of the definitions file, we next check the cache to see whether the subroutine definitions are already cached and, if so, whether the file has remained unchanged since the code was compiled and cached. We use two hashes for this purpose. The %SUBSTITUTION hash holds the compiled code and %MODIFIED contains the age of the definition file, as reported by Perl's -M file test, at the time it was last compiled. Both hashes are indexed by the definition file's path, allowing the module to handle the case in which several server-side include definition files are used for different parts of the document tree. Because -M measures the file's age relative to the time the Perl interpreter started, its value stays the same for an untouched file and becomes smaller when the file is modified. So if the age recorded in %MODIFIED is less than or equal to the definition file's current age, the file has not changed and we return the cached subroutine.
        my $package = join "::", __PACKAGE__, $def;
        $package =~ tr/a-zA-Z0-9_/_/c;
The next two lines are concerned with finding a unique namespace in which to compile the server-side include functions. Putting the functions in their own namespace decreases the chance that function side effects will have unwanted effects elsewhere in the module. We take the easy way out here by using the path to the definition file to synthesize a package name, which we store in a variable named $package.
        eval "package $package; do '$def'";
        if($@) {
            $r->log_error("Eval of $def did not return true: $@");
            return;
        }
We then invoke eval( ) to compile the subroutine definitions into the newly chosen namespace. We use the package declaration to set the namespace and do to load and run the definitions file. We use do here rather than the more common require because do unconditionally recompiles code files even if they have been loaded previously. If the eval was unsuccessful, we log an error and return undef.
        $SUBSTITUTION{$def} = sub {
            do_substitutions($package, @_);
        };
        $MODIFIED{$def} = -M $def; # store modification date
        return $SUBSTITUTION{$def};
    }
Before we exit read_definitions( ), we create a new anonymous subroutine that invokes the do_substitutions( ) function, store this subroutine in %SUBSTITUTION, and update %MODIFIED with the current -M value of the definitions file. We then return the code reference to our caller. We interpose a new anonymous subroutine here so that we can add the contents of the $package variable to the list of variables passed to the do_substitutions( ) function.
    sub do_substitutions {
        my $package = shift;
        my($r, $fh) = @_;
        # Make sure that eval() errors aren't trapped.
        local $SIG{__WARN__};
        local $SIG{__DIE__};
        local $/; #slurp $fh
        my $data = <$fh>;
        $data =~ s/<!--\s*\#(\w+) # start of a function name
                   \s*(.*?)       # optional parameters
                   \s*-->         # end of comment
                  /call_sub($package, $1, $r, $2)/xseg;
        $data;
    }
When handler( ) invokes the anonymous subroutine, it calls do_substitutions( ) to do the replacement of the server-side include directives with the output of their corresponding routines. We start off by localizing the $SIG{__WARN__} and $SIG{__DIE__} handlers and setting them back to the default Perl CORE::warn( ) and CORE::die( ) subroutines. This is a paranoid precaution against the use of CGI::Carp, which some mod_perl users load into Apache during the startup phase in order to produce nicely formatted server error log messages. The subroutine continues by fetching the lines of the page to be processed and joining them in a single scalar value named $data.
We then invoke a string substitution function to replace properly formatted comment strings with the results of invoking the corresponding server-side include function. The substitution uses the e flag to treat the replacement part as a Perl expression to be evaluated and the g flag to perform the search and replace globally. The search half of the function looks like this:
    /<!--\s*\#(\w+)\s*(.*?)\s*-->/
This detects the server-side include comments while capturing the directive name in $1 and its optional arguments in $2.
The replacement half of the function looks like this:
    /call_sub($package, $1, $r, $2)/
This just invokes another utility function, call_sub( ), passing it the package name, the directive name, the request object, and the list of parameters.
    sub call_sub {
        my($package, $name, $r, $args) = @_;
        my $sub = \&{join '::', $package, $name};
        $r->chdir_file;
        my $res = eval { $sub->($r, quotewords('[ ,]',0,$args)) };
        return "<em>[$@]</em>" if $@;
        return $res;
    }
The call_sub( ) routine starts off by obtaining a reference to the subroutine using its fully qualified name. It does this by joining the package name to the subroutine name and then using the funky Perl \&{...} syntax to turn this string into a subroutine reference. As a convenience to the HTML author, before invoking the subroutine we call the request object's chdir_file( ) method. This simply makes the current directory the same as the requested file, which in this case is the HTML file containing the server-side includes.
The server-side include function is now invoked, passing it the request object and the optional arguments. We call quotewords( ) to split up the arguments on commas or whitespace. In order to trap fatal runtime errors that might occur during the function's execution, the call is done inside an eval{} block. If the called function fails, we return the error message it died with, which is captured in $@. Otherwise, we return the value of the called function.
At the bottom of Example 4-3 is an example entry for perl.conf (or httpd.conf if you prefer). The idea here is to make Apache::ESSI the content handler for all files ending with the extension .ehtml. We do this with a <Files> configuration section that contains the appropriate SetHandler and PerlHandler directives. We use the PerlSetVar directive to point the module to the server-relative definitions file, conf/essi.defs.
In addition to the <Files> section, we need to ensure that Apache knows that .ehtml files are just a special type of HTML file. We use AddType to tell Apache to treat .ehtml files as MIME type text/html.
You could also use <Location> or <Directory> to assign the Apache::ESSI content handler to a section of the document tree, or a different <Files> directive to make Apache::ESSI the content handler for all HTML files.
Here are some perl.conf directives to go with Apache::ESSI :
    <Files ~ "\.ehtml$">
      SetHandler perl-script
      PerlHandler Apache::ESSI
      PerlSetVar ESSIDefs conf/essi.defs
    </Files>
    AddType text/html .ehtml
At this point you'd probably like a complete server-side include definitions file to go with the module. Example 4-4 gives a short file that defines a core set of functions that you can build on top of. Among the functions defined here are ones for inserting the size and modification date of the current file, the date, fields from the browser's HTTP request header, and a function that acts like the C preprocessor #include macro to insert the contents of a file into the current document. There's also an include called OOPS which divides the number 10 by the argument you provide. Pass it an argument of zero to see how runtime errors are handled.
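If you don't have a mod_perl server handy, you can watch the error trapping at work in a cut-down, standalone version of the substitution engine. This is a sketch under simplifying assumptions, not the module itself: there is no request object, so the directive functions receive only their parameters, and the functions live in a hypothetical %DIRECTIVES hash rather than in a package compiled from a definitions file. The call_directive( ) and expand( ) names are likewise our own.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical directive table standing in for the definitions file.
my %DIRECTIVES = (
    HELLO => sub { "Hello World!" },
    OOPS  => sub { 10 / $_[0] },
);

# Run one directive, trapping fatal errors the way call_sub() does.
sub call_directive {
    my ($name, $args) = @_;
    my $sub = $DIRECTIVES{$name}
        or return "<em>[unknown directive $name]</em>";   # our addition
    my $res = eval { $sub->(quotewords('[ ,]', 0, $args)) };
    return "<em>[$@]</em>" if $@;
    return $res;
}

# Replace every include comment in $data with its directive's output.
sub expand {
    my $data = shift;
    $data =~ s/<!--\s*\#(\w+)\s*(.*?)\s*-->/call_directive($1, $2)/seg;
    return $data;
}

print expand("I said <!--#HELLO-->\n");
print expand("Ten over two is <!--#OOPS 2-->\n");
print expand("Ten over zero is <!--#OOPS 0-->\n");
```

The last line prints Perl's division-by-zero message wrapped in <em> tags instead of aborting the script, which is the same treatment a failing include receives on a processed page.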
The INCLUDE( ) function inserts whole files into the current document. It accepts either a physical pathname or a "virtual" path in URI space. A physical path is only allowed if it lives in or below the current directory. This is to avoid exposing sensitive files such as /etc/passwd.
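The restriction on physical paths is enforced by the small is_below_only( ) function in Example 4-4. The standalone script below copies that function and runs it against a few sample paths so that you can see which ones are accepted. The sample paths are made up for illustration.

```perl
use strict;
use warnings;

# Copied from Example 4-4: true only if the path is relative and
# has no ".." component anywhere in it.
sub is_below_only { $_[0] !~ m:(^/|(^|/)\.\.(/|$)): }

for my $path ('inc/footer.html', 'notes..txt', '/etc/passwd',
              '../secret.html', 'inc/../../secret.html') {
    printf "%-24s %s\n", $path,
        is_below_only($path) ? 'allowed' : 'rejected';
}
```

Note that the test looks only at the text of the path. It does not follow symbolic links, so a link inside the current directory that points elsewhere would still be accepted.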
If the $virtual flag is passed, the function translates from URI space to a physical path name using the lookup_uri( ) and filename( ) methods:
    $file = $r->lookup_uri($path)->filename;
The request object's lookup_uri( ) method creates an Apache subrequest for the specified URI. During the subrequest, Apache does all the processing that it ordinarily would on a real incoming request up to, but not including, activating the content handler. lookup_uri( ) returns an Apache::SubRequest object, which inherits all its behavior from the Apache request class. We then call this object's filename( ) method in order to retrieve its translated physical file name.
Example 4-4: Server-Side Include Function Definitions
    # Definitions for server-side includes.
    # This file is require'd, and therefore must end with
    # a true value.

    use Apache::File ();
    use Apache::Util qw(ht_time size_string);

    # insert the string "Hello World!"
    sub HELLO {
        my $r = shift;
        "Hello World!";
    }

    # insert today's date possibly modified by a strftime() format
    # string
    sub DATE {
        my ($r,$format) = @_;
        return scalar(localtime) unless $format;
        return ht_time(time, $format, 0);
    }

    # insert the modification time of the document, possibly modified
    # by a strftime() format string.
    sub MODTIME {
        my ($r,$format) = @_;
        my $mtime = (stat $r->finfo)[9];
        # force scalar context so we always get a date string
        return scalar(localtime($mtime)) unless $format;
        return ht_time($mtime, $format, 0);
    }

    # insert the size of the current document
    sub FSIZE {
        my $r = shift;
        return size_string -s $r->finfo;
    }

    # divide 10 by the argument (used to test runtime error trapping)
    sub OOPS { 10/$_[1]; }

    # insert a canned footer
    sub FOOTER {
        my $r = shift;
        my $modtime = MODTIME($r);
        return <<END;
    <hr>
    © 2001 <a href="http://www.ora.com/">O'Reilly & Associates</a><br>
    <em>Last Modified: $modtime</em>
    END
    }

    # insert the named field from the incoming request
    sub HTTP_HEADER {
        my ($r,$h) = @_;
        $r->header_in($h);
    }

    # ensure that path is relative, and does not contain ".."
    sub is_below_only { $_[0] !~ m:(^/|(^|/)\.\.(/|$)): }

    # Insert the contents of a file.  If the $virtual flag is set
    # does a document-root lookup, otherwise treats filename as a
    # physical path.
    sub INCLUDE {
        my ($r,$path,$virtual) = @_;
        my $file;
        if($virtual) {
            $file = $r->lookup_uri($path)->filename;
        } else {
            unless(is_below_only($path)) {
                die "Can't include $path\n";
            }
            $file = $path;
        }
        my $fh = Apache::File->new($file) || die "Couldn't open $file: $!\n";
        local $/;
        return <$fh>;
    }

    1;

If you're a fan of server-side includes, you should also check out the Apache Embperl and ePerl packages. Both packages, along with several others available from the CPAN, build on mod_perl to create a Perl-like programming language embedded entirely within server-side includes.
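To see what the is_below_only( ) check in Example 4-4 accepts and refuses, here is a small standalone sketch that runs the same regular expression over a handful of paths. The sample paths are invented for illustration, and the script needs nothing from Apache.

```perl
use strict;
use warnings;

# Same test as in Example 4-4: refuse absolute paths and any path
# that contains a ".." component.
sub is_below_only { $_[0] !~ m:(^/|(^|/)\.\.(/|$)): }

# Hypothetical sample paths, chosen only to exercise each case.
my @samples = (
    'footer.html',          # plain relative file
    'parts/footer.html',    # file in a subdirectory
    'notes..txt',           # ".." inside a name is not a path component
    '/etc/passwd',          # absolute path
    '../secret.html',       # climbs out of the current directory
    'parts/../../secret',   # ".." buried in the middle
    'parts/..',             # ".." at the very end
);

for my $path (@samples) {
    printf "%-22s %s\n", $path, is_below_only($path) ? 'allowed' : 'refused';
}
```

The check is purely textual: it looks only at the characters of the path and does not follow symbolic links, so it should be read as a first line of defense rather than a complete one.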
Converting Image Formats
Another useful application of Apache content handlers is converting file formats on the fly. For example, with a little help from the Aladdin Ghostscript interpreter, you can dynamically convert Adobe Acrobat (PDF) files into GIF images when dealing with a browser that doesn't have the Acrobat plug-in installed.[[1]]
In this section, we show a content handler that converts image files on the fly. It takes advantage of Kyle Shorter's Image::Magick package, the Perl interface to John Cristy's ImageMagick library. Image::Magick interconverts a large number of image formats, including JPEG, PNG, TIFF, GIF, MPEG, PPM, and even PostScript. It can also transform images in various ways, such as cropping, rotating, solarizing, sharpening, sampling, and blurring.
The Apache::Magick content handler accepts URIs in this form:
    /path/to/image.ext/Filter1/Filter2?arg=value&arg=value...

In its simplest form, the handler can be used to perform image format conversions on the fly. For example, if the actual file is named bluebird.gif and you request bluebird.jpg, the content handler automatically converts the GIF into a JPEG file and returns it. You can also pass arguments to the converter in the query string. For example, to specify a progressive JPEG image (interlace="Line") with a quality of 50 percent, you can fetch the file by requesting a URI like this one:

    /images/bluebird.jpg?interlace=Line&quality=50

You can also run one or more filters on the image prior to the conversion. For example, to apply the "Charcoal" filter (which makes the image look like a charcoal sketch) and then put a decorative border around it (the "Frame" filter), you can request the image like this:

    /images/bluebird.jpg/Charcoal/Frame?quality=75

Any named arguments that need to be passed to the filter can be appended to the query string, along with the conversion arguments. In the last example, we can specify a gold-colored frame this way:

    /images/bluebird.jpg/Charcoal/Frame?quality=75&color=gold

This API doesn't allow you to direct arguments to specific filters. Fortunately, most of the filters that you might want to apply together don't have overlapping argument names, and filters ignore any arguments that don't apply to them. The full list of filters and conversion operations can be found at the PerlMagick web site, located at http://www.wizards.dupont.com/cristy/www/perl.html. You'll find pointers to the latest ImageMagick code library there as well.
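The URI layout above splits naturally into a list of filters (the extra path information) and a set of named arguments (the query string). The following standalone sketch shows that split on plain strings. The helper name split_request( ) is hypothetical, and the sketch does no URL decoding; inside a handler these values would come from the request object instead.

```perl
use strict;
use warnings;

# Hypothetical helper: break the extra path information and the query
# string into a filter list and an argument hash.
sub split_request {
    my ($path_info, $query) = @_;

    # "/charcoal/Frame" -> ("Charcoal", "Frame").  The leading slash
    # yields an empty first element, which we discard.
    my @filters = map  { ucfirst($_) }
                  grep { length($_) }
                  split m{/}, $path_info;

    # "quality=75&color=gold" -> (quality => 75, color => 'gold')
    my %arguments = map {
        my ($key, $value) = split /=/, $_, 2;
        ($key => defined $value ? $value : '');
    } split /&/, $query;

    return (\@filters, \%arguments);
}

my ($filters, $arguments) =
    split_request('/charcoal/Frame', 'quality=75&color=gold');

print "filters: @$filters\n";
print "$_ = $arguments->{$_}\n" for sort keys %$arguments;
```

Because every filter sees the same argument hash, an argument such as color is offered to both Charcoal and Frame; this is the limitation described above.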
One warning before you use this Apache module on your system: some of the operations can be very CPU-intensive, particularly when converting an image with many colors, such as JPEG, to one that has few colors, such as GIF. You should also be prepared for Image::Magick's memory consumption, which is nothing short of voracious.
Example 4-5 shows the code for Apache::Magick.
    package Apache::Magick;

    use strict;
    use Apache::Constants qw(:common);
    use Image::Magick ();
    use Apache::File ();
    use File::Basename qw(fileparse);
    use DirHandle ();

We begin as usual by bringing in the modules we need. We bring in Apache::Constants, Apache::File for its temporary file support, File::Basename for its file path parsing utilities, DirHandle for an object-oriented interface to the directory reading functions, and the Image::Magick module itself.
    my %LegalArguments = map { $_ => 1 }
    qw (adjoin background bordercolor colormap colorspace
        colors compress density dispose delay dither
        display font format iterations interlace
        loop magick mattecolor monochrome page pointsize
        preview_type quality scene subimage subrange
        size tile texture treedepth undercolor);

    my %LegalFilters = map { $_ => 1 }
    qw(AddNoise Blur Border Charcoal Chop
       Contrast Crop Colorize Comment CycleColormap
       Despeckle Draw Edge Emboss Enhance Equalize Flip Flop
       Frame Gamma Implode Label Layer Magnify Map Minify
       Modulate Negate Normalize OilPaint Opaque Quantize
       Raise ReduceNoise Rotate Sample Scale Segment Shade
       Sharpen Shear Solarize Spread Swirl Texture Transparent
       Threshold Trim Wave Zoom);

We then define two hashes, one for all the filter and conversion arguments recognized by Image::Magick and the other for the various filter operations that are available. These lists were cut and pasted from the Image::Magick documentation. We tried to exclude the ones that were not relevant to this module, such as ones that create multiframe animations, but a few may have slipped through.
    sub handler {
        my $r = shift;

        # get the name of the requested file
        my $file = $r->filename;

        # If the file exists and there are no transformation arguments
        # just decline the transaction.  It will be handled as usual.
        return DECLINED unless $r->args || $r->path_info || !-r $r->finfo;

The handler( ) routine begins as usual by fetching the name of the requested file. We decline to handle the transaction if the file exists, the query string is empty, and the additional path information is empty as well. This is just the common case of the browser trying to fetch an unmodified existing file.
        my $source;
        my ($base, $directory, $extension) = fileparse($file, '\.\w+');
        if (-r $r->finfo) { # file exists, so it becomes the source
            $source = $file;
        }
        else {              # file doesn't exist, so we search for it
            return DECLINED unless -r $directory;
            $source = find_image($r, $directory, $base);
        }

        unless ($source) {
            $r->log_error("Couldn't find a replacement for $file");
            return NOT_FOUND;
        }

We now use File::Basename's fileparse( ) function to parse the requested file into its basename (the filename without the extension), the directory name, and the extension. We check again whether we can read the file, and if so it becomes the source for the conversion. Otherwise, we search the directory for another image file to convert into the format of the requested file. For example, if the URI requested is bluebird.jpeg and we find a file named bluebird.gif, we invoke Image::Magick to do the conversion. The search is done by an internal subroutine named find_image( ), which we'll examine later. If successful, the name of the source image is stored in $source. If unsuccessful, we log the error with the log_error( ) function and return a NOT_FOUND result code.

        $r->send_http_header;
        return OK if $r->header_only;

At this point, we send the HTTP header using send_http_header( ). The next line represents an optimization that we haven't seen before. It may be that the client isn't interested in the content of the image file, but just in its meta-information, such as its length and MIME type. In this case, the browser sends an HTTP HEAD request rather than the usual GET. When Apache receives a HEAD request, it sets header_only( ) to true. If we see that this has happened, we return from the handler immediately with an OK status code. Although it wouldn't hurt to send the document body anyway, respecting the HEAD request results in a slight savings in processing efficiency and makes the module compliant with the HTTP protocol.

        my $q = Image::Magick->new;
        my $err = $q->Read($source);

Otherwise, it's time to read the source image into memory. We create a new Image::Magick object, store it in a variable named $q, and then load the source image file by calling its Read( ) method. Any error message returned by Read( ) is stored into a variable called $err.

        my %arguments = $r->args;

        # Run the filters
        for (split '/', $r->path_info) {
            my $filter = ucfirst $_;
            next unless $LegalFilters{$filter};
            $err ||= $q->$filter(%arguments);
        }

        # Remove invalid arguments before the conversion
        for (keys %arguments) {
            delete $arguments{$_} unless $LegalArguments{$_};
        }

The next phase of the process is to prepare for the image manipulation. The first thing we do is tidy up the input parameters. We retrieve the query string parameters by calling the request object's args( ) method and store them in a hash named %arguments.

We then call the request object's path_info( ) method to retrieve the additional path information. We split the path info into a series of filter names and canonicalize them by capitalizing their initial letters using the Perl built-in operator ucfirst( ). Each of the filters is applied in turn, skipping over any that aren't on the list of filters that Image::Magick accepts. We do an OR assignment into $err, so that we maintain the first non-null error message, if any. Having run the filters, we remove from the %arguments hash any arguments that aren't valid in Image::Magick's file format conversion calls.

        # Create a temporary file name to use for conversion
        my($tmpnam, $fh) = Apache::File->tmpfile;

Image::Magick needs to write the image to a temporary file. We call the Apache::File tmpfile( ) method to create a suitable temporary file name. If successful, tmpfile( ) returns the name of the temporary file, which we store in the variable $tmpnam, and a filehandle open for writing into the file, which we store in the variable $fh. The tmpfile( ) method is specially written to avoid a "race condition" in which the temporary file name appears to be unused when the module first checks for it but is created by someone else before it can be opened.

        # Write out the modified image
        open(STDOUT, ">&=" . fileno($fh));

The next task is to have Image::Magick perform the requested conversion and write it to the temporary file. The safest way to do this would be to pass it the temporary file's already opened filehandle. Unfortunately, Image::Magick doesn't accept filehandles; its Write( ) method expects a filename, or the special filename - to write to standard output. However, we can trick it into writing to the filehandle by reopening standard output on the filehandle, which we do by passing the filehandle's numeric file descriptor to open( ) using the rarely seen >&= notation. See the open( ) entry in the perlfunc manual page for complete details.

Since STDOUT gets reset before every Perl API transaction, there's no need to save and restore its original value.
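The >&= trick can be tried outside of Apache. The sketch below is a standalone illustration, not part of Apache::Magick: it uses the core File::Temp module as a stand-in for Apache::File's tmpfile( ), and, because nothing resets STDOUT for an ordinary script, it saves and restores the original STDOUT itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for Apache::File->tmpfile: a temporary file already open
# for writing.  (File::Temp returns the handle first, the name second.)
my ($fh, $tmpnam) = tempfile(UNLINK => 1);

# Outside mod_perl nothing resets STDOUT for us, so keep a duplicate
# of the original to restore afterward.
open(my $saved, '>&', \*STDOUT) or die "Can't save STDOUT: $!";

# Point STDOUT at the temporary file's descriptor.
open(STDOUT, '>&=' . fileno($fh)) or die "Can't reopen STDOUT: $!";

# Anything written to standard output now lands in the file.
print "written through STDOUT\n";

# Put the original STDOUT back; reopening flushes the pending output.
open(STDOUT, '>&', $saved) or die "Can't restore STDOUT: $!";
close $fh;

# Read the temporary file back to confirm where the text went.
open(my $in, '<', $tmpnam) or die "Can't read $tmpnam: $!";
my $captured = do { local $/; <$in> };
close $in;

print "captured: $captured";
```

The same redirection is what lets a library that only knows how to write to standard output deposit its results in a file that was opened safely beforehand.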
        $extension =~ s/^\.//;
        $err ||= $q->Write('filename' => "\U$extension\L:-", %arguments);
        if ($err) {
            unlink $tmpnam;
            $r->log_error($err);
            return SERVER_ERROR;
        }
        close $fh;

We now call Image::Magick's Write( ) method with the argument 'filename'=>EXTENSION:- where EXTENSION is the uppercased extension of the document that the remote user requested. We also tack on any conversion arguments that were requested. For example, if the remote user requested bluebird.jpg?quality=75, the call to Write( ) ends up looking like this:

    $q->Write('filename'=>'JPG:-','quality'=>75);

If any errors occurred during this step or the previous ones, we delete the temporary file, log the errors, and return a SERVER_ERROR status code.

        # At this point the conversion is all done!
        # reopen for reading
        $fh = Apache::File->new($tmpnam);
        unless ($fh) {
            $r->log_error("Couldn't open $tmpnam: $!");
            return SERVER_ERROR;
        }

        # send the file
        $r->send_fd($fh);

        # clean up and go
        unlink $tmpnam;
        return OK;
    }

If the call to Write( ) was successful, we need to send the contents of the temporary file to the waiting browser. We could open the file, read its contents, and send it off using a series of print( ) calls, as we've done previously, but in this case there's a slightly easier way. After reopening the file with Apache::File's new( ) method, we call the request object's send_fd( ) method to transmit the contents of the filehandle in one step. The send_fd( ) method accepts all the same filehandle data types as the Perl built-in I/O operators. After sending off the file, we clean up by unlinking the temporary file and returning an OK status.

We'll now turn our attention to the find_image( ) subroutine, which is responsible for searching the directory for a suitable file to use as the image source if the requested file can't be found:
    sub find_image {
        my ($r, $directory, $base) = @_;
        my $dh = DirHandle->new($directory) or return;

The find_image( ) utility subroutine is straightforward. It takes the request object, the parsed directory name, and the basename of the requested file and attempts to search this directory for an image file that shares the same basename. The routine opens a directory handle with DirHandle->new( ) and iterates over its entries.
        my $source;
        for my $entry ($dh->read) {
            my $candidate = fileparse($entry, '\.\w+');
            if ($base eq $candidate) {
                # determine whether this is an image file
                $source = join '', $directory, $entry;
                my $subr = $r->lookup_file($source);
                last if $subr->content_type =~ m:^image/:;
                undef $source;
            }
        }

For each entry in the directory listing, we parse out the basename using fileparse( ). If the basename is identical to the one we're searching for, we call the request object's lookup_file( ) method to activate an Apache subrequest. lookup_file( ) is similar to lookup_uri( ), which we saw earlier in the context of server-side includes, except that it accepts a physical pathname rather than a URI. Because of this, lookup_file( ) will skip the URI translation phase, but it will still cause Apache to trigger all the various handlers up to, but not including, the content handler.
In this case, we're using the subrequest for the sole purpose of getting at the MIME type of the file. If the file is indeed an image of one sort or another, then we keep its pathname in $source and exit the loop. Otherwise, we reset $source to undef and keep searching.
        $dh->close;
        return $source;
    }

At the end of the loop, $source will be undefined if no suitable image file was found, or it will contain the full pathname to the image file if we were successful. We close the directory handle, and return $source.

Here is a perl.conf entry to go with Apache::Magick:

    <Location /images>
      SetHandler  perl-script
      PerlHandler Apache::Magick
    </Location>

A Dynamic Navigation Bar
Many large web sites use a navigation bar to help users find their way around the main subdivisions of the site. Simple navigation bars are composed entirely of link text, while fancier ones use inline images to create the illusion of a series of buttons. Some sites use client-side Java, JavaScript, or frames to achieve special effects like button "rollover," in which the button image changes when the mouse passes over it. Regardless of the technology used to display the navigation bar, they can be troublesome to maintain. Every time you add a new page to the site, you have to remember to insert the correct HTML into the page to display the correct version of the navigation bar. If the structure of the site changes, you might have to manually update dozens or hundreds of HTML files.
Apache content handlers to the rescue. In this section, we develop a navigation bar module called Apache::NavBar. When activated, this module automatically adds a navigation bar to the tops and bottoms of all HTML pages on the site. Each major content area of the site is displayed as a hypertext link. When an area is "active" (the user is viewing one of the pages contained within it), its link is replaced with highlighted text (see Figure 4-3).
Figure 4-3. The navigation bar at the top of this page was generated dynamically by Apache::NavBar.
In this design, the navigation bar is built dynamically from a configuration file. Here's the one that Lincoln uses at his laboratory's site at http://stein.cshl.org:

    # Configuration file for the navigation bar
    /index.html             Home
    /jade/                  Jade
    /AcePerl/               AcePerl
    /software/boulder/      BoulderIO
    /software/WWW/          WWW
    /linux/                 Linux

The right column of this configuration file defines six areas named "Home," "Jade," "AcePerl," "BoulderIO," "WWW," and "Linux" (the odd names correspond to various software packages). The left column defines the URI that each link corresponds to. For example, selecting the "Home" link takes the user to /index.html. These URIs are also used by the navigation bar generation code to decide when to display an area as active. In the example above, any page that starts with /linux/ is considered to be part of the "Linux" area and its label will be appropriately highlighted. In contrast, since /index.html refers to a file rather than a partial path, only the home page itself is considered to be contained within the "Home" area.
Example 4-6 gives the complete code for Apache::NavBar. At the end of the example is a sample entry for perl.conf (or httpd.conf if you prefer) which activates the navigation bar for the entire site.
    package Apache::NavBar;
    # file Apache/NavBar.pm

    use strict;
    use Apache::Constants qw(:common);
    use Apache::File ();

    my %BARS = ();
    my $TABLEATTS   = 'WIDTH="100%" BORDER=1';
    my $TABLECOLOR  = '#C8FFFF';
    my $ACTIVECOLOR = '#FF0000';

The preamble brings in the usual modules and defines some constants that will be used later in the code. Among the constants are ones that control the color and size of the navigation bar.
    sub handler {
        my $r = shift;
        my $bar = read_configuration($r) || return DECLINED;

The handler( ) function starts by calling an internal function named read_configuration( ), which, as its name implies, parses the navigation bar configuration file. If successful, the function returns a custom-designed NavBar object that implements the methods we need to build the navigation bar on the fly. As in the server-side includes example, we cache NavBar objects in the file-scoped hash %BARS and only re-create them when the configuration file changes. The cache logic is all handled internally by read_configuration( ).

If, for some reason, read_configuration( ) returns an undefined value, we decline the transaction by returning DECLINED. Apache will display the page, but the navigation bar will be missing.

        $r->content_type eq 'text/html' || return DECLINED;
        my $fh = Apache::File->new($r->filename) || return DECLINED;

As in the server-side include example, we check the MIME type of the requested file. If it isn't of type text/html, then we can't add a navigation bar to it and we return DECLINED to let Apache take its default actions. Otherwise, we attempt to open the file by calling Apache::File's new( ) method. If this fails, we again return DECLINED to let Apache generate the appropriate error message.

        my $navbar = make_bar($r, $bar);

Having successfully processed the configuration file and opened the requested file, we call an internal subroutine named make_bar( ) to create the HTML text for the navigation bar. We'll look at this subroutine momentarily. This fragment of HTML is stored in a variable named $navbar.

        $r->send_http_header;
        return OK if $r->header_only;

        local $/ = "";
        while (<$fh>) {
            s:(</BODY>):$navbar$1:i;
            s:(<BODY.*?>):$1$navbar:si;
        } continue {
            $r->print($_);
        }

        return OK;
    }

The remaining code should look familiar. We send the HTTP header and loop through the text in paragraph-style chunks looking for all instances of the <BODY> and </BODY> tags. When we find either tag, we insert the navigation bar just below the opening tag or just above the closing one. We use paragraph mode (by setting $/ to the empty string) in order to catch documents that have spread the initial <BODY> tag among multiple lines.

    sub make_bar {
        my($r, $bar) = @_;
        # create the navigation bar
        my $current_url = $r->uri;
        my @cells;

The make_bar( ) function is responsible for generating the navigation bar HTML code. First, it recovers the current document's URI by calling the Apache request object's uri( ) method. Next, it calls $bar->urls( ) to fetch the list of partial URIs for the site's major areas and iterates over the areas in a for( ) loop:
        for my $url ($bar->urls) {
            my $label = $bar->label($url);
            # \Q...\E keeps "." and other metacharacters in the URI literal
            my $is_current = $current_url =~ /^\Q$url\E/;
            my $cell = $is_current ?
                qq(<FONT COLOR="$ACTIVECOLOR">$label</FONT>)
                : qq(<A HREF="$url">$label</A>);
            push @cells,
                qq(<TD CLASS="navbar" ALIGN=CENTER BGCOLOR="$TABLECOLOR">$cell</TD>\n);
        }

For each URI, the code fetches its human-readable label by calling $bar->label( ) and determines whether the current document is part of the area using a pattern match anchored at the start of the current URI. What happens next depends on whether the current document is part of the area or not. In the former case, the code generates a label enclosed within a <FONT> tag with the COLOR attribute set to red. In the latter case, the code generates a hypertext link. The label or link is then pushed onto a growing array of HTML table cells.
        return qq(<TABLE $TABLEATTS><TR>@cells</TR></TABLE>\n);
    }

At the end of the loop, the code incorporates the table cells into a one-row table and returns the HTML to the caller.
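The rule that decides whether an area is active can be exercised on its own. The following standalone sketch applies a prefix match to a few request URIs, using the areas from the sample configuration file. The function name active_areas( ) and the test URIs are hypothetical, and the sketch quotes each configured URI with \Q...\E so that "." and similar characters match literally.

```perl
use strict;
use warnings;

# Areas from the sample configuration file, in display order.
my @urls = ('/index.html', '/jade/', '/AcePerl/',
            '/software/boulder/', '/software/WWW/', '/linux/');

# Return the configured URIs whose area contains the requested URI,
# that is, those that are a literal prefix of it.
sub active_areas {
    my ($current_url, @areas) = @_;
    return grep { $current_url =~ /^\Q$_\E/ } @areas;
}

for my $uri ('/index.html', '/linux/kernel/notes.html',
             '/indexXhtml', '/about.html') {
    my @active = active_areas($uri, @urls);
    printf "%-26s %s\n", $uri, @active ? "@active" : '(no active area)';
}
```

A page such as /about.html falls outside every configured area, so all six entries would be rendered as ordinary links.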
We next look at the read_configuration( ) function:
    sub read_configuration {
        my $r = shift;
        my $conf_file;
        return unless $conf_file = $r->dir_config('NavConf');
        return unless -e ($conf_file = $r->server_root_relative($conf_file));

Potentially there can be several configuration files, each one for a different part of the site. The path to the configuration file is specified by a per-directory Perl configuration variable named NavConf. We retrieve the path to the configuration file with dir_config( ), convert it into an absolute path name with server_root_relative( ), and test that the file exists with the -e operator.
        my $mod_time = (stat _)[9];
        return $BARS{$conf_file} if $BARS{$conf_file}
            && $BARS{$conf_file}->modified >= $mod_time;
        return $BARS{$conf_file} = NavBar->new($conf_file);
    }

Because we don't want to reparse the configuration each time we need it, we cache the NavBar object in much the same way we did with the server-side include example. Each NavBar object has a modified( ) method that returns the time that its configuration file was modified. The NavBar objects are held in a global cache named %BARS and indexed by the name of the configuration files. The next bit of code calls stat( ) to return the configuration file's modification time. Notice that we can stat( ) the _ filehandle because the foregoing -e operation will have cached its results. We then check whether there is already a ready-made NavBar object in the cache, and if so, whether its modification date is not older than the configuration file. If both tests are true, we return the cached object; otherwise, we create a new one by calling the NavBar new( ) method.

You'll notice that we use a different technique for finding the modification date here than we did in Apache::ESSI (Example 4-3). In the previous example, we used the -M file test flag, which returns the relative age of the file in days since the Perl interpreter was launched. In this example, we use stat( ) to determine the absolute age of the file from the filesystem timestamp. The reason for this will become clear later, when we modify the module to handle If-Modified-Since caching.
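The caching pattern used by read_configuration( ) does not depend on Apache, so it can be tried in isolation. In the sketch below, parse_file( ) is a hypothetical stand-in for NavBar->new( ), the temporary file comes from the core File::Temp module, and utime( ) is used to simulate an edit to the configuration file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my %CACHE;             # parsed results, keyed by file name
my $parse_count = 0;   # how many times we really parsed the file

# Hypothetical stand-in for NavBar->new: "parse" the file and remember
# its modification time.
sub parse_file {
    my $file = shift;
    $parse_count++;
    open(my $fh, '<', $file) or return;
    my @lines = <$fh>;
    return { lines => \@lines, modified => (stat $file)[9] };
}

# Reparse only when the file on disk is newer than the cached copy.
sub cached_parse {
    my $file = shift;
    return unless -e $file;
    my $mod_time = (stat _)[9];     # reuse the stat done by -e
    my $hit = $CACHE{$file};
    return $hit if $hit && $hit->{modified} >= $mod_time;
    return $CACHE{$file} = parse_file($file);
}

my ($out, $file) = tempfile(UNLINK => 1);
print $out "/index.html Home\n";
close $out;

cached_parse($file) for 1 .. 3;     # parsed once, then served from cache
print "parses after three lookups: $parse_count\n";

my $later = time + 60;              # pretend the file was edited
utime $later, $later, $file;
cached_parse($file);
print "parses after the file changed: $parse_count\n";
```

Storing the absolute modification time alongside the parsed data is what makes the later comparison possible; a relative age such as the one returned by -M would not be directly comparable to a timestamp sent by a browser.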
Toward the bottom of the example is the definition for the NavBar class. It defines four methods named new( ), urls( ), label( ), and modified( ):
    package NavBar;

    # create a new NavBar object
    sub new {
        my ($class,$conf_file) = @_;
        my (@c,%c);
        my $fh = Apache::File->new($conf_file) || return;
        while (<$fh>) {
            chomp;
            s/^\s+//; s/\s+$//;   # strip leading and trailing whitespace
            next if /^#/ || /^$/; # skip comments and empty lines
            next unless my($url, $label) = /^(\S+)\s+(.+)/;
            push @c, $url;        # keep the url in an ordered array
            $c{$url} = $label;    # keep its label in a hash
        }
        return bless {'urls'     => \@c,
                      'labels'   => \%c,
                      'modified' => (stat $conf_file)[9]}, $class;
    }

The new( ) method is called to parse a configuration file and return a new NavBar object. It opens up the indicated configuration file, splits each row into the URI and label parts, and stores the two parts into a hash. Since the order in which the various areas appear in the navigation bar is significant, this method also saves the URIs to an ordered array.
    # return ordered list of all the URIs in the navigation bar
    sub urls { return @{shift->{'urls'}}; }

    # return the label for a particular URI in the navigation bar
    sub label { return $_[0]->{'labels'}->{$_[1]} || $_[1]; }

    # return the modification date of the configuration file
    sub modified { return $_[0]->{'modified'}; }

    1;

The urls( ) method returns the ordered list of areas, and the label( ) method uses the NavBar object's hash to return the human-readable label for the given URI. If none is defined, it just returns the URL. modified( ) returns the modification time of the configuration file.
A configuration file section to go with Apache::NavBar might read:
    <Location />
      SetHandler  perl-script
      PerlHandler Apache::NavBar
      PerlSetVar  NavConf conf/navigation.conf
    </Location>

Because so much of what Apache::NavBar and Apache::ESSI do is similar, you might want to merge the navigation bar and server-side include examples. This is just a matter of cutting and pasting the navigation bar code into the server-side function definitions file and then writing a small stub function named NAVBAR( ). This stub function will call the subroutines that read the configuration file and generate the navigation bar table. You can then incorporate the appropriate navigation bar into your pages anywhere you like with an include like this one:

    <!--#NAVBAR-->

Handling If-Modified-Since
One of us (Lincoln) thought the virtual navigation bar was so neat that he immediately ran out and used it for all documents on his site. Unfortunately, he had some pretty large (>400 MB) files there, and he soon noticed something interesting. Before installing the navigation bar handler, browsers would cache the large HTML files locally and only download them again when they had changed. After installing the handler, however, the files were always downloaded. What happened?
When a browser is asked to display a document that it has cached locally, it sends the remote server a GET request with an additional header field named If-Modified-Since. The request looks something like this:
    GET /index.html HTTP/1.0
    If-Modified-Since: Tue, 24 Feb 1998 11:19:03 GMT
    User-Agent: (etc. etc. etc.)

The server will compare the document's current modification date to the time given in the request. If the document is more recent than that, it will return the whole document. Otherwise, the server will respond with a 304 "not modified" message and the browser will display its cached copy. This reduces network bandwidth usage dramatically.
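The comparison the server makes can be sketched in a few lines of standalone Perl. This is only an illustration of the decision, not Apache's implementation: the function names parse_http_date( ) and status_for( ) are hypothetical, and the parser understands only the date format shown in the request above.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
             Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Parse a date such as "Tue, 24 Feb 1998 11:19:03 GMT" into seconds
# since the epoch; returns undef for anything it does not recognize.
sub parse_http_date {
    my $date = shift;
    my ($mday, $mon, $year, $h, $m, $s) =
        $date =~ /^\w{3}, (\d{2}) (\w{3}) (\d{4}) (\d{2}):(\d{2}):(\d{2}) GMT$/
        or return;
    return unless exists $MONTH{$mon};
    # timegm() dies on out-of-range values, so trap that and return undef
    return eval { timegm($s, $m, $h, $mday, $MONTH{$mon}, $year) };
}

# Decide which status a server would send for a (possibly conditional) GET.
sub status_for {
    my ($mtime, $if_modified_since) = @_;
    my $since = defined $if_modified_since
              ? parse_http_date($if_modified_since)
              : undef;
    return 200 unless defined $since;     # no usable header: send it all
    return $mtime > $since ? 200 : 304;   # newer than the cached copy?
}

my $header = 'Tue, 24 Feb 1998 11:19:03 GMT';
my $cached = parse_http_date($header);

print status_for($cached - 3600, $header), "\n";   # file older than the cache
print status_for($cached + 3600, $header), "\n";   # file changed since then
print status_for($cached, undef), "\n";            # unconditional request
```

A header that cannot be parsed is treated the same as a missing one, which errs on the side of sending the full document.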
When you install a custom content handler, the If-Modified-Since mechanism no longer works unless you implement it. In fact, you can generally ignore If-Modified-Since because content handlers usually generate dynamic documents that change from access to access. However, in some cases the content you provide is sufficiently static that it pays to cache the documents. The navigation bar is one such case because even though the bar is generated dynamically, it rarely changes from day to day.
In order to handle If-Modified-Since caching, you have to settle on a definition for the document's most recent modification date. In the case of a static document, this is simply the modification time of the file. In the case of composite documents that consist equally of static file content and a dynamically generated navigation bar, the modification date is either the time that the HTML file was last changed or the time that the navigation bar configuration file was changed, whichever happens to be more recent. Fortunately for us, we're already storing the configuration file's modification date in the NavBar object, so finding this aggregate modification time is relatively simple.
To use the request object's update_mtime( ), set_last_modified( ), and meets_conditions( ) routines, simply add the following just before the call to $r->send_http_header in the handler( ) subroutine:

    $r->update_mtime($bar->modified);
    $r->set_last_modified;
    my $rc = $r->meets_conditions;
    return $rc unless $rc == OK;

We first call the update_mtime( ) function with the navigation bar's modification date. This function will compare the specified date with the modification date of the request document and update the request's internal mtime field to the most recent of the two. We then call set_last_modified( ) to copy the mtime field into the outgoing Last-Modified header. Finally, meets_conditions( ) checks the request's conditional headers, such as If-Modified-Since, against that date; if it returns anything other than OK, we return that status code right away instead of sending the document. If a synthesized document depends on several configuration files, you should call update_mtime( ) once for each configuration file, followed by set_last_modified( ) at the very end.

The complete code for the new and improved Apache::NavBar, with the If-Modified-Since improvements, can be found at this book's companion web site.
If you think carefully about this module, you'll see that it still isn't strictly correct. There's a third modification date that we should take into account, that of the module source code itself. Changes to the source code may affect the appearance of the document without changing the modification date of either the configuration file or the HTML file. We could add a new update_mtime( ) with the modification time of the Apache::NavBar module, but then we'd have to worry about modification times of libraries that Apache::NavBar depends on, such as Apache::File. This gets hairy very quickly, which is why caching becomes a moot issue for any dynamic document much more complicated than this one. See "The Apache::File Class" in Chapter 9, Perl API Reference Guide, for a complete rundown of the methods that are available to you for controlling HTTP/1.1 caching.
Sending Static Files
If you want your content handler to send a file through without modifying it, the easiest way is to let Apache do all the work for you. Simply return DECLINED from your handler (before you send the HTTP header or the body) and the request will fall through to Apache's default handler. This is a lot easier, not to mention faster, than opening up the file, reading it line by line, and transmitting it unchanged. In addition, Apache will automatically handle a lot of the details for you, first and foremost of which is handling the If-Modified-Since header and other aspects of client-side caching.

If you have a compelling reason to send static files manually, see "Using Apache::File to Send Static Files" in Chapter 9 for a full description of the technique. Also see "Redirection," later in this chapter, for details on how to direct the browser to request a different URI or to make Apache send the browser a different document from the one that was specifically requested.
Virtual Documents
The previous sections of this chapter have been concerned with transforming existing files. Now we turn our attention to spinning documents out of thin air. Despite the fact that these two operations seem very different, Apache content handlers are responsible for them both. A content handler is free to ignore the translation of the URI that is passed to it. Apache neither knows nor cares that the document produced by the content handler has no correspondence to a physical file.
We've already seen an Apache content handler that produces a virtual document. Chapter 2, A First Module, gave the code for Apache::Hello, an Apache Perl module that produces a short HTML document. For convenience, we show it again in Example 4-7. This content handler is essentially identical to the previous content handlers we've seen. The main difference is that the content handler sets the MIME content type itself, calling the request object's content_type( ) method to set the MIME type to text/html. This is in contrast to the idiom we used earlier, where the handler allowed Apache to choose the content type for it. After this, the process of emitting the HTTP header and the document itself is the same as we've seen before.
After setting the content type, the handler calls send_http_header( ) to send the HTTP header to the browser, and immediately exits with an OK status code if header_only( ) returns true (this is a slight improvement over the original Chapter 2 version of the program). We call get_remote_host( ) to get the DNS name of the remote host machine, and incorporate the name into a short HTML document that we transmit using the request object's print( ) method. At the end of the handler, we return OK.
There's no reason to be limited to producing virtual HTML documents. You can just as easily produce images, sounds, and other types of multimedia, provided of course that you know how to produce the file format that goes along with the MIME type.
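The sequence of calls just described for the virtual HTML document can be sketched as follows. This is a reconstruction from the prose, not the book's listing: the package name Apache::HelloSketch and the wording of the HTML are made up, and the OK constant and StubRequest class are inline stand-ins (assumptions) so that the code runs under plain Perl. Under mod_perl, OK would come from Apache::Constants and $r would be the real request object.

```perl
package Apache::HelloSketch;   # hypothetical name; follows the description above
use strict;
use warnings;

use constant OK => 0;          # stand-in for the OK exported by Apache::Constants

sub handler {
    my $r = shift;
    $r->content_type('text/html');   # the handler picks the MIME type itself
    $r->send_http_header;
    return OK if $r->header_only;    # HEAD request: send the header, skip the body
    my $host = $r->get_remote_host;
    $r->print(<<END);
<HTML>
<HEAD><TITLE>Hello There</TITLE></HEAD>
<BODY><H1>Hello $host</H1></BODY>
</HTML>
END
    return OK;
}

# Minimal stand-in for the Apache request object (an assumption made
# for this sketch), so the handler can be tried outside the server.
package StubRequest;
sub new              { my ($class, %args) = @_; return bless { out => '', %args }, $class }
sub content_type     { $_[0]{content_type} = $_[1] if @_ > 1; $_[0]{content_type} }
sub send_http_header { $_[0]{header_sent} = 1 }
sub header_only      { $_[0]{header_only} }
sub get_remote_host  { $_[0]{remote_host} }
sub print            { my $self = shift; $self->{out} .= join '', @_ }

package main;
my $r = StubRequest->new(remote_host => 'client.example.com');
Apache::HelloSketch::handler($r);
print $r->{out};
```

Nothing in the handler consults the translated filename, which is what makes the document virtual: the body is assembled entirely from information held in the request object.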
Redirection
Instead of synthesizing a document, a content handler has the option of redirecting the browser to fetch a different URI using the HTTP redirect mechanism. You can use this facility to randomly select a page or picture to display in response to a URI request (many banner ad generators work this way) or to implement a custom navigation system.
Redirection is extremely simple with the Apache API. You need only add a Location field to the HTTP header containing the full or partial URI of the desired destination, and return a REDIRECT result code. A complete functional example using mod_perl is only a few lines (Example 4-8). This module, named Apache::GoHome, redirects users to the hardcoded URI http://www.ora.com/. When the user selects a document or a portion of the document tree that this content handler has been attached to, the browser will immediately jump to that URI.
The module begins by importing the REDIRECT result code from Apache::Constants (REDIRECT isn't among the standard set of result codes imported with :common). The handler( ) method then adds the desired location to the outgoing headers by calling Apache::header_out( ). header_out( ) can take one or two arguments. Called with one argument, it returns the current value of the indicated HTTP header field. Called with two arguments, it sets the field indicated by the first argument to the value indicated by the second argument. In this case, we use the two-argument form to set the HTTP Location field to the desired URI.
The final step is to return the REDIRECT result code. There's no need to generate an HTML body, since most HTTP-compliant browsers will take you directly to the Location URI. However, Apache adds an appropriate body automatically in order to be HTTP-compliant. You can see the header and body message using telnet:
    % telnet localhost 80
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    GET /gohome HTTP/1.0

    HTTP/1.1 302 Moved Temporarily
    Date: Mon, 05 Oct 1998 22:15:17 GMT
    Server: Apache/1.3.3-dev (Unix) mod_perl/1.16
    Location: http://www.ora.com/
    Connection: close
    Content-Type: text/html

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <HTML><HEAD>
    <TITLE>302 Moved Temporarily</TITLE>
    </HEAD><BODY>
    <H1>Moved Temporarily</H1>
    The document has moved <A HREF="http://www.ora.com/">here</A>.<P>
    </BODY></HTML>
    Connection closed by foreign host.
You'll notice from this example that the REDIRECT status causes a "Moved Temporarily" message to be issued. This is appropriate in most cases because it makes no promise to the browser that it will be redirected to the same location the next time it tries to fetch the desired URI. If you wish to redirect permanently, you should use the MOVED status code instead, which results in a "301 Moved Permanently" message. A smart browser might remember the redirected URI and fetch it directly from its new location the next time it's needed.
As a more substantial example of redirection in action, consider Apache::RandPicture (Example 4-9), which randomly selects a different image file to display each time it's called. It works by selecting an image file from among the contents of a designated directory, then redirecting the browser to that file's URI. In addition to demonstrating a useful application of redirection, it again shows off the idiom for interconverting physical file names and URIs.
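Before turning to the details of Apache::RandPicture, here is a minimal sketch of the basic redirect pattern described above for Apache::GoHome: set the Location field, then return the redirect status. It is a reconstruction from the prose rather than the book's listing. The package name Apache::GoHomeSketch is hypothetical, and the constants and the StubRequest class (whose header_out( ) is a simplified imitation of the one- and two-argument forms) are stand-ins so that the code runs under plain Perl. Under mod_perl you would import REDIRECT, or MOVED, explicitly from Apache::Constants.

```perl
package Apache::GoHomeSketch;   # hypothetical name; follows the description above
use strict;
use warnings;

# Stand-ins for constants that mod_perl code imports from Apache::Constants.
use constant REDIRECT => 302;   # "Moved Temporarily"
use constant MOVED    => 301;   # "Moved Permanently"

sub handler {
    my $r = shift;
    $r->content_type('text/html');
    # Two-argument form: set the outgoing Location field.
    $r->header_out(Location => 'http://www.ora.com/');
    return REDIRECT;            # return MOVED instead for a permanent redirect
}

# Minimal stand-in for the Apache request object (an assumption made
# for this sketch), so the handler can be tried outside the server.
package StubRequest;
sub new          { return bless { headers_out => {} }, shift }
sub content_type { $_[0]{content_type} = $_[1] if @_ > 1; $_[0]{content_type} }
sub header_out {
    my ($self, $field, $value) = @_;
    $self->{headers_out}{$field} = $value if @_ > 2;   # two arguments: set
    return $self->{headers_out}{$field};               # one argument: get
}

package main;
my $r      = StubRequest->new;
my $status = Apache::GoHomeSketch::handler($r);
printf "%d %s\n", $status, $r->header_out('Location');
```

The handler never prints a body. As the telnet session shows, Apache supplies one itself when it sees the redirect status.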
The handler begins by fetching the name of a directory to fetch the images from, which is specified in the server configuration file by the Perl variable PictureDir. Because the selected image has to be directly fetchable by the browser, the image directory must be given as a URI rather than as a physical path.
The next task is to convert the directory URI into a physical directory path. The subroutine adds a / to the end of the URI if there isn't one there already (ensuring that Apache treats the URI as a directory), then calls the request object's lookup_uri( ) and filename( ) methods in order to perform the URI translation steps. The code looks like this:
    my $subr = $r->lookup_uri($dir_uri);
    my $dir = $subr->filename;
Now we need to obtain a listing of image files in the directory. The simple way to do this would be to use the Perl glob operator, for instance:
    chdir $dir;
    @files = <*.{jpg,gif}>;
However, this technique is flawed. First off, on many systems the glob operation launches a C subshell, which sends performance plummeting and won't even work on systems without the C shell (like Win32 platforms). Second, it makes assumptions about the extension types of image files. Your site may have defined an alternate extension for image files (or may be using a completely different system for keeping track of image types, such as the Apache MIME magic module), in which case this operation will miss some images.
Instead, we create a DirHandle object using Perl's directory handle object wrapper. We call the directory handle's read( ) method repeatedly to iterate through the contents of the directory. For each item we ask Apache what it thinks the file's MIME type should be, by calling the lookup_uri( ) method to turn the filename into a subrequest and content_type( ) to fetch the MIME type information from the subrequest. We perform a pattern match on the returned type and, if the file is one of the MIME image types, add it to a growing list of image URIs. The subrequest object's uri( ) method is called to return the absolute URI for the image. The whole process looks like this:
    my @files;
    for my $entry ($dh->read) {
        # get the file's MIME type
        my $rr = $subr->lookup_uri($entry);
        my $type = $rr->content_type;
        next unless $type =~ m:^image/:;
        push @files, $rr->uri;
    }
Note that we look up the directory entry's filename by calling the subrequest object's lookup_uri( ) method rather than using the main request object stored in $r. This takes advantage of the fact that subrequests will look up relative paths relative to their own URI.
The next step is to select a member of this list randomly, which we do using this time-tested Perl idiom:
    my $lucky_one = $files[rand @files];
The last step is to set the Location header to point at this file (being sure to express the location as a URI) and to return a REDIRECT result code. If you install the module using the sample configuration file and <IMG> tag shown at the bottom of the listing, a different picture will be displayed every time you load the page.
A configuration section to go with Apache::RandPicture might be:
    <Location /random/picture>
      SetHandler perl-script
      PerlHandler Apache::RandPicture
      PerlSetVar PictureDir /banners
    </Location>
And you'd use it in an HTML document like this:
    <IMG SRC="/random/picture" ALT="[Our Sponsor]">
Although elegant, this technique for selecting a random image file suffers from a bad performance bottleneck. Instead of requiring only a single network operation to get the picture from the server to the browser, it needs two round-trips across the network: one for the browser's initial request and redirect and one to fetch the image itself.
You can eliminate this overhead in several different ways. The more obvious technique is to get rid of the redirection entirely and simply send the image file directly. After selecting the random image and placing it in the variable $lucky_one, we replace the last two lines of the handler( ) subroutine with code like this:
    $subr = $r->lookup_uri($lucky_one);
    $r->content_type($subr->content_type);
    $r->send_http_header;
    return OK if $r->header_only;
    my $fh = Apache::File->new($subr->filename) || return FORBIDDEN;
    $r->send_fd($fh);
We create yet another subrequest, this one for the selected image file, then use information from the subrequest to set the outgoing content type. We then open up the file and send it with the send_fd( ) method.
However, this is still a little wasteful because it requires you to open up the file yourself. A more subtle solution would be to let Apache do the work of sending the file by invoking the subrequest's run( ) method. run( ) invokes the subrequest's content handler to send the body of the document, just as if the browser had made the request itself. The code now looks like this:
    my $subr = $r->lookup_uri($lucky_one);
    unless ($subr->status == DOCUMENT_FOLLOWS) {
        $r->log_error("Can't lookup file $lucky_one: $!");
        return SERVER_ERROR;
    }
    $r->content_type($subr->content_type);
    $r->send_http_header;
    return OK if $r->header_only;
    $subr->run;
    return OK;
We call lookup_uri( ) and check the value returned by its status( ) method in order to make sure that it is DOCUMENT_FOLLOWS (status code 200, the same as HTTP_OK). This constant is not exported by Apache::Constants by default but has to be imported explicitly. We then set the main request's content type to the same as that of the subrequest, and send off the appropriate HTTP header. Finally, we call the subrequest's run( ) method to invoke its content handler and send the contents of the image to the browser.
Internal Redirection
The two Apache::RandPicture optimizations that we showed in the previous section involve a lot of typing, and the resulting code is a bit obscure. A far more elegant solution is to let Apache do all the work for you with its internal redirection mechanism. In this scheme, Apache handles the entire redirection internally. It pretends that the web browser made the request for the new URI and sends the contents of the file, without letting the browser in on the secret. It is functionally equivalent to the solution that we showed at the end of the preceding section.
To invoke the Apache internal redirection system, modify the last two lines of Apache::RandPicture's handler( ) subroutine to read like this:
    $r->internal_redirect($lucky_one);
    return OK;
The request object's internal_redirect( ) method takes a single argument consisting of an absolute local URI (one starting with a /). The method does all the work of translating the URI, invoking its content handler, and returning the file contents, if any. Unfortunately, internal_redirect( ) returns no result code, so there's no way of knowing whether the redirect was successful (you can't do this from a conventional redirect either). However, the call will return in any case, allowing you to do whatever cleanup is needed. You should exit the handler with a result code of OK