O'Reilly logo

Practical mod_perl by Eric Cholet, Stas Bekman

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Exposing Apache::Registry Secrets

Let's start with some simple code and see what can go wrong with it. This simple CGI script initializes a variable $counter to 0 and prints its value to the browser while incrementing it:

#!/usr/bin/perl -w
use strict;

print "Content-type: text/plain\n\n";

my $counter = 0;

for (1..5) {
    increment_counter( );
}

sub increment_counter {
    $counter++;
    print "Counter is equal to $counter !\n";
}

When issuing a request to /perl/counter.pl or a similar script, we would expect to see the following output:

Counter is equal to 1 !
Counter is equal to 2 !
Counter is equal to 3 !
Counter is equal to 4 !
Counter is equal to 5 !

And in fact that's what we see when we execute this script for the first time. But let's reload it a few times.... After a few reloads, the counter suddenly stops counting from 1. As we continue to reload, we see that it keeps on growing, but not steadily, starting almost randomly at 10, 10, 10, 15, 20..., which makes no sense at all!

Counter is equal to 6 !
Counter is equal to 7 !
Counter is equal to 8 !
Counter is equal to 9 !
Counter is equal to 10 !

We saw two anomalies in this very simple script:

  • Unexpected increment of our counter over 5

  • Inconsistent growth over reloads

The reason for this strange behavior is that although $counter is incremented with each request, it is never reset to 0, even though we have this line:

my $counter = 0;

Doesn't this work under mod_perl?

The First Mystery: Why Does the Script Go Beyond 5?

If we look at the error_log file (we did enable warnings), we'll see something like this:

Variable "$counter" will not stay shared 
at /home/httpd/perl/counter.pl line 13.

This warning is generated when a script contains a named (as opposed to an anonymous) nested subroutine that refers to a lexically scoped (with my( )) variable defined outside this nested subroutine.

Do you see a nested named subroutine in our script? We don't! What's going on? Maybe it's a bug in Perl? But wait, maybe the Perl interpreter sees the script in a different way! Maybe the code goes through some changes before it actually gets executed? The easiest way to check what's actually happening is to run the script with a debugger.

Since we must debug the script when it's being executed by the web server, a normal debugger won't help, because the debugger has to be invoked from within the web server. Fortunately, we can use Doug MacEachern's Apache::DB module to debug our script. While Apache::DB allows us to debug the code interactively (as we will show in Chapter 21), we will use it noninteractively in this example.

To enable the debugger, modify the httpd.conf file in the following way:

PerlSetEnv PERLDB_OPTS "NonStop=1 LineInfo=/tmp/db.out AutoTrace=1 frame=2"
PerlModule Apache::DB
<Location /perl>
    PerlFixupHandler Apache::DB
    SetHandler perl-script
    PerlHandler Apache::Registry
    Options ExecCGI
    PerlSendHeader On
</Location>

We have added a debugger configuration setting using the PERLDB_OPTS environment variable, which has the same effect as calling the debugger from the command line. We have also loaded and enabled Apache::DB as a PerlFixupHandler.

In addition, we'll load the Carp module, using <Perl> sections (this could also be done in the startup.pl file):

<Perl>
    use Carp;
</Perl>

After applying the changes, we restart the server and issue a request to /perl/counter.pl, as before. On the surface, nothing has changed; we still see the same output as before. But two things have happened in the background:

  • The file /tmp/db.out was written, with a complete trace of the code that was executed.

  • Since we have loaded the Carp module, the error_log file now contains the real code that was actually executed. This is produced as a side effect of reporting the "Variable "$counter" will not stay shared at..." warning that we saw earlier.

Here is the code that was actually executed:

package Apache::ROOT::perl::counter_2epl;
use Apache qw(exit);
sub handler {
    BEGIN {
        $^W = 1;
    };
    $^W = 1;

    use strict;

    print "Content-type: text/plain\n\n";

    my $counter = 0;

    for (1..5) {
        increment_counter( );
    }

    sub increment_counter {
        $counter++;
        print "Counter is equal to $counter !\n";
    }
}

Note that the code in error_log wasn't indented—we've indented it to make it obvious that the code was wrapped inside the handler( ) subroutine.

From looking at this code, we learn that every Apache::Registry script is cached under a package whose name is formed from the Apache::ROOT:: prefix and the script's URI (/perl/counter.pl) by replacing all occurrences of / with :: and . with _2e. That's how mod_perl knows which script should be fetched from the cache on each request—each script is transformed into a package with a unique name and with a single subroutine named handler( ), which includes all the code that was originally in the script.

Essentially, what's happened is that because increment_counter( ) is a subroutine that refers to a lexical variable defined outside of its scope, it has become a closure. Closures don't normally trigger warnings, but in this case we have a nested subroutine. That means that the first time the enclosing subroutine handler( ) is called, both subroutines are referring to the same variable, but after that, increment_counter( ) will keep its own copy of $counter (which is why $counter is not shared) and increment its own copy. Because of this, the value of $counter keeps increasing and is never reset to 0.

If we were to use the diagnostics pragma in the script, which by default turns terse warnings into verbose warnings, we would see a reference to an inner (nested) subroutine in the text of the warning. By observing the code that gets executed, it is clear that increment_counter( ) is a named nested subroutine since it gets defined inside the handler( ) subroutine.

Any subroutine defined in the body of the script executed under Apache::Registry becomes a nested subroutine. If the code is placed into a library or a module that the script require( )s or use( )s, this effect doesn't occur.

For example, if we move the code from the script into the subroutine run( ), place the subroutines in the mylib.pl file, save it in the same directory as the script itself, and require( ) it, there will be no problem at all.[1] Examples Example 6-1 and Example 6-2 show how we spread the code across the two files.

Example 6-1. mylib.pl

my $counter;
sub run {
    $counter = 0;
    for (1..5) {
        increment_counter( );
    }
}
sub increment_counter {
    $counter++;
    print "Counter is equal to $counter !\n";
}
1;

Example 6-2. counter.pl

use strict;
require "./mylib.pl";
print "Content-type: text/plain\n\n";
run( );

This solution is the easiest and fastest way to solve the nested subroutine problem. All you have to do is to move the code into a separate file, by first wrapping the initial code into some function that you later call from the script, and keeping the lexically scoped variables that could cause the problem out of this function.

As a general rule, it's best to put all the code in external libraries (unless the script is very short) and have only a few lines of code in the main script. Usually the main script simply calls the main function in the library, which is often called init( ) or run( ). This way, you don't have to worry about the effects of named nested subroutines.

As we will show later in this chapter, however, this quick solution might be problematic on a different front. If you have many scripts, you might try to move more than one script's code into a file with a similar filename, like mylib.pl. A much cleaner solution would be to spend a little bit more time on the porting process and use a fully qualified package, as in Examples Example 6-3 and Example 6-4.

Example 6-3. Book/Counter.pm

package Book::Counter;

my $counter = 0;

sub run {
    $counter = 0;
    for (1..5) {
        increment_counter( );
    }
}

sub increment_counter {
    $counter++;
    print "Counter is equal to $counter !<BR>\n";
}

1;
_  _END_  _

Example 6-4. counter-clean.pl

use strict;
use Book::Counter;

print "Content-type: text/plain\n\n";
Book::Counter::run( );

As you can see, the only difference is in the package declaration. As long as the package name is unique, you won't encounter any collisions with other scripts running on the same server.

Another solution to this problem is to change the lexical variables to global variables. There are two ways global variables can be used:

  • Using the vars pragma. With the use strict 'vars' setting, global variables can be used after being declared with vars. For example, this code:

    use strict;
    use vars qw($counter $result);
    # later in the code
    $counter = 0;
    $result  = 1;

    is similar to this code if use strict is not used:

    $counter = 0;
    $result  = 1;

    However, the former style of coding is much cleaner, because it allows you to use global variables by declaring them, while avoiding the problem of misspelled variables being treated as undeclared globals.

    The only drawback to using vars is that each global declared with it consumes more memory than the undeclared but fully qualified globals, as we will see in the next item.

  • Using fully qualified variables. Instead of using $counter, we can use $Foo::counter, which will place the global variable $counter into the package Foo. Note that we don't know which package name Apache::Registry will assign to the script, since it depends on the location from which the script will be called. Remember that globals must always be initialized before they can be used.

Perl 5.6.x also introduces a third way, with the our( ) declaration. our( ) can be used in different scopes, similar to my( ), but it creates global variables.

Finally, it's possible to avoid this problem altogether by always passing the variables as arguments to the functions (see Example 6-5).

Example 6-5. counter2.pl

#!/usr/bin/perl -w
use strict;

print "Content-type: text/plain\n\n";

my $counter = 0;

for (1..5) {
    $counter = increment_counter($counter);
}

sub increment_counter {
    my $counter = shift;

    $counter++;
    print "Counter is equal to $counter !\n";

    return $counter;
}

In this case, there is no variable-sharing problem. The drawback is that this approach adds the overhead of passing and returning the variable from the function. But on the other hand, it ensures that your code is doing the right thing and is not dependent on whether the functions are wrapped in other blocks, which is the case with the Apache::Registry handlers family.

When Stas (one of the authors of this book) had just started using mod_perl and wasn't aware of the nested subroutine problem, he happened to write a pretty complicated registration program that was run under mod_perl. We will reproduce here only the interesting part of that script:

use CGI;
$q = CGI->new;
my $name = $q->param('name');
print_response( );

sub print_response {
    print "Content-type: text/plain\n\n";
    print "Thank you, $name!";
}

Stas and his boss checked the program on the development server and it worked fine, so they decided to put it in production. Everything seemed to be normal, but the boss decided to keep on checking the program by submitting variations of his profile using The Boss as his username. Imagine his surprise when, after a few successful submissions, he saw the response "Thank you, Stas!" instead of "Thank you, The Boss!"

After investigating the problem, they learned that they had been hit by the nested subroutine problem. Why didn't they notice this when they were trying the software on their development server? We'll explain shortly.

To conclude this first mystery, remember to keep the warnings mode On on the development server and to watch the error_log file for warnings.

The Second Mystery—Inconsistent Growth over Reloads

Let's return to our original example and proceed with the second mystery we noticed. Why have we seen inconsistent results over numerous reloads?

What happens is that each time the parent process gets a request for the page, it hands the request over to a child process. Each child process runs its own copy of the script. This means that each child process has its own copy of $counter, which will increment independently of all the others. So not only does the value of each $counter increase independently with each invocation, but because different children handle the requests at different times, the increment seems to grow inconsistently. For example, if there are 10 httpd children, the first 10 reloads might be correct (if each request went to a different child). But once reloads start reinvoking the script from the child processes, strange results will appear.

Moreover, requests can appear at random since child processes don't always run the same requests. At any given moment, one of the children could have served the same script more times than any other, while another child may never have run it.

Stas and his boss didn't discover the aforementioned problem with the user registration system before going into production because the error_log file was too crowded with warnings continuously logged by multiple child processes.

To immediately recognize the problem visually (so you can see incorrect results), you need to run the server as a single process. You can do this by invoking the server with the -X option:

panic% httpd -X

Since there are no other servers (children) running, you will get the problem report on the second reload.

Enabling the warnings mode (as explained earlier in this chapter) and monitoring the error_log file will help you detect most of the possible errors. Some warnings can become errors, as we have just seen. You should check every reported warning and eliminate it, so it won't appear in error_log again. If your error_log file is filled up with hundreds of lines on every script invocation, you will have difficulty noticing and locating real problems, and on a production server you'll soon run out of disk space if your site is popular.



[1] Don't forget the 1; at the end of the library, or the require( ) call might fail.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required