O'Reilly logo

Mastering Perl by brian d foy

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 9. Dynamic Subroutines

For the purposes of this chapter, I’m going to label as “dynamic subroutines” anything I don’t explicitly name by typing sub some_name or that doesn’t exist until runtime. Perl is extremely flexible in letting me figure out the code as I go along, and I can even have code that writes code. I’m going to lump a bunch of different subroutine topics in this chapter just because there’s no good home for them apart from each other.

We first showed anonymous subroutines in Learning Perl when we showed user-defined sorting, although we didn’t tell you that they were anonymous subroutines. In Intermediate Perl we used them to create closures, work with map and grep, and a few other things. I’ll pick up where Intermediate Perl left off to show just how powerful they can be. With any of these tricks, not knowing everything ahead of time can be very liberating.

Subroutines As Data

I can store anonymous subroutines in variables. They don’t actually execute until I tell them to. Instead of storing values, I store behavior. This anonymous subroutine adds its first two arguments and returns the result, but it won’t do that until I execute it. I merely define the subroutine and store it in $add_sub:

my $add_sub = sub { $_[0] + $_[1] };

This way, I can decide what to do simply by choosing the variable that has the behavior that I want. A simple-minded program might do this with a series of if-elsif tests and branches because it needs to hardcode a branch for each possible subroutine call. Here I create a little calculator to handle basic arithmetic. It takes three arguments on the command line and does the calculation. Each operation gets its own branch of code:

#!/usr/bin/perl
# basic-arithmetic.pl

use strict;

while( 1 )
        {
        my( $operator, @operand ) = get_line();

        if(    $operator eq '+' ) { add(      @operand ) }
        elsif( $operator eq '-' ) { subtract( @operand ) }
        elsif( $operator eq '*' ) { multiply( @operand ) }
        elsif( $operator eq '/' ) { divide(   @operand ) }
        else
                {
                print "No such operator [$operator ]!\n";
                last;
                }
        }

print "Done, exiting...\n";

sub get_line
        {
        # This could be a lot more complicated, but this isn't the point
        print "\nprompt> ";

        my $line = <STDIN>;

        $line =~ s/^\s+|\s+$//g;

        ( split /\s+/, $line )[1,0,2];
        }

sub add      { print $_[0] + $_[1] }

sub subtract { print $_[0] - $_[1] }

sub multiply { print $_[0] * $_[1] }

sub divide   { print  $_[1] ? $_[0] / $_[1] : 'NaN' }

Those branches are really just the same thing; they take the two operands, perform a calculation, and print the result. The only thing that differs in each branch is the subroutine name. If I want to add more operations, I have to add more nearly identical branches of code. Not only that, I have to add the code to the while loop, obscuring the intent of the loop. If I decide to do things a bit differently, I have to change every branch. That’s just too much work.

I can turn that on its head so I don’t have a long series of branches to code or maintain. I want to extract the subroutine name from the branches so I can make one block of code that works for all operators. Ideally, the while loop wouldn’t change and would just deal with the basics of getting the data and sending them to the right subroutine:

while( 1 )
        {
        my( $operator, @operand ) = get_line();

        my $some_sub = ....;

        print $some_sub->( @operands );
        }

Now the subroutine is just something stored in the variable $some_sub, so I have to decide how to get the right anonymous subroutine in there. I could use a dispatch table (a hash that stores the anonymous subroutines), and then select the subroutines by their keys. In this case, I use the operator symbol as the key. I can also catch bad input because I know which operators are valid: they are the keys of the hash.

My processing loop stays the same even if I add more operators. I also label the loop REPL (for Read-Evaluate-Print), and I’ll use that label later when I want to control the looping from one of my subroutines:

#!/usr/bin/perl
use strict;

use vars qw( %Operators );
%Operators = (
        '+'  => sub { $_[0] + $_[1] },
        '-'  => sub { $_[0] - $_[1] },
        '*'  => sub { $_[0] * $_[1] },
        '/'  => sub { $_[1] ? eval { $_[0] / $_[1] } : 'NaN' },
        );

while( 1 )
        {
        my( $operator, @operand ) = get_line();

        my $some_sub = $Operators{ $operator };
        unless( defined $some_sub )
                {
                print "Unknown operator [$operator]\n";
                last;
                }

        print $Operators{ $operator }->( @operand );
        }

print "Done, exiting...\n";


sub get_line
        {
        print "\nprompt> ";

        my $line = <STDIN>;

        $line =~ s/^\s+|\s+$//g;

        ( split /\s+/, $line )[1,0,2];
        }

If I want to add more operators, I just add new entries to the hash. I can add completely new operators, such as the % operator for modulus, or the x operator as a synonym for the * multiplication operator:

use vars qw( %Operators );
%Operators = (
        '+' => sub { $_[0] + $_[1] },
        '-' => sub { $_[0] - $_[1] },
        '*' => sub { $_[0] * $_[1] },
        '/' => sub { eval { $_[0] / $_[1] } || 'NaN' },
        '%' => sub { $_[0] % $_[1] },
        );
$Operators{ 'x' } = $Operators{ '*' };

That’s fine and it works, but maybe I have to change my program so that instead of the normal algebraic notation I use Reverse Polish Notation (where the operands come first and the operator comes last). That’s easy to handle because I just change the way I pick the anonymous subroutine. Instead of looking at the middle argument, I look at the last argument. That all happens in my get_line subroutine. I rearrange that a bit and everything else stays the same:

sub get_line
        {
        print "\nprompt> ";

        my $line = <STDIN>;

        $line =~ s/^\s+|\s+$//g;
        my @list = split /\s+/, $line;

        unshift( @list, pop @list );

        @list;
        }

Now that I’ve done that, I can make a little change to handle more than just binary operators. If I want to handle something that takes more than two arguments, I do the same thing I just did: take the last argument and use it as the operator and pass the rest of the arguments to the subroutine. I don’t really have to change anything other than adding a new operator. I define a " operator and use the max function from List::Util to find the maximum value of all the arguments I pass to it. This is similar to the example we showed in Learning Perl to show that Perl doesn’t care how many arguments I pass to a subroutine:

%Operators = (
        # ... same stuff as before

        '"' => sub {
                my $max = shift;
                foreach ( @_ ) { $max = $_ if $_ > $max }
                $max
                },
        );

I can also handle a single operand because my code doesn’t really care how many there are, and a list of one element is just as good as any other list. Here’s the reason that I actually wrote this program. I often need to convert between number bases, or from Unix time to a time I can read:

%Operators = (
        # ... same stuff as before

        'dh' => sub { sprintf "%x",     $_[0]   },
        'hd' => sub { sprintf "%d", hex $_[0]   },
        't'  => sub { scalar localtime( $_[0] ) },
        );

Finally, how about an operator that works with 0 arguments? It’s just a degenerate case of what I already have. My previous programs didn’t have a way to stop the program. If I used those programs, I’d have to interrupt the program. Now I can add my q operator, which really isn’t an operator but a way to stop the program. I cheat a little by using last to break out of the while loop.[36]I could do anything I like, though, including exit straight away. In this case, I use last with the loop label I gave to the while:

%Operators = (
        # ... same stuff as before

        'q' => sub { last REPL },
        );

If I need more operators, I simply add them to the hash with a reference to the subroutine that implements them. I don’t have to add any logic or change the structure of the program. I just have to describe the additional feature (although the description is in code).

Creating and Replacing Named Subroutines

In the last section I stored my anonymous subroutines in a variable, but a subroutine is really just another slot in the typeglob (see Chapter 8). I can store subroutines there, too. When I assign an anonymous subroutine to a typeglob, Perl figures out to put it in the CODE slot. After that, I use the subroutine just as if I had defined it with a name:

print "Foo is defined before\n" if defined( &foo );

*foo = sub { print "Here I am!\n" };
foo();

print "Foo is defined afterward\n" if defined( &foo );

This can be useful if I need to replace some code in another module as I’ll do in Chapter 10. I don’t want to edit the other module. I’ll leave it as it is and replace the single definition I need to change. Since subroutines live in the symbol table, I can just use the full package specification to replace a subroutine:

#!/usr/bin/perl

package Some::Module;
sub bar { print "I'm in " . __PACKAGE__ . "\n" }


package main;

Some::Module::bar();

*Some::Module::bar = sub { print "Now I'm in " . __PACKAGE__ . "\n" };

Some::Module::bar();

If I run this under warnings, Perl catches my suspicious activity and complains because I really shouldn’t be doing this without a good reason:

$ perl -w replace_sub.pl
I'm in Some::Module
Subroutine Some::Module::bar redefined at replace_sub.pl line 11.
Now I'm in main

I change the code a bit to get around that warning. Instead of turning off all warnings, I isolate that bit of code with a naked block and turn off any warnings in the redefine class:

{
no warnings 'redefine';
*Some::Module::bar = sub { print "Now I'm in " . __PACKAGE__ . "\n" };
}

Although I did this with an existing subroutine definition, I can do it without a previous declaration, too. With a little modification my main package defines the new subroutine quux in Some::Module:

package Some::Module;
# has no subroutines

package main;

{
no warnings 'redefine';
*Some::Module::quux = sub { print "Now I'm in " . __PACKAGE__ . "\n" };
}

Some::Module::quux();

See anything familiar? If I change it around it might look a bit more like something you’ve seen before as a trick to import symbols into another namespace. You’ve probably been doing this same thing for quite a while without even knowing about it:

package Some::Module;

sub import
        {
        *main::quux = sub { print "I came from " . __PACKAGE__ . "\n" };
        }


package main;

Some::Module->import();

quux();

This is the same thing that the Exporter module does to take definitions in one package and put them into another. It’s only slightly more complicated than this because Exporter figures out who’s calling it and does some work to look in @EXPORT and @EXPORT_OK. Other than that, it’s a bunch of monkey programming around an assignment to a typeglob.

Symbolic References

In the previous section, I replaced the definition of a valid subroutine name with an anonymous subroutine. I fiddled with the symbol table to make things happen. Now, I’m going to move from fiddling to abuse.

A symbolic reference, or reference to the symbol table, uses a string to choose the name of the variable and what looks like a dereference to access it:

my $name = 'foo';
my $value_in_foo = ${ $name }; # $foo

This normally isn’t a good idea, so much so that strict prohibits it. Adding use strict to my example, I get a fatal error:

use strict;
my $name = 'foo';
my $value_in_foo = ${ $name }; # $foo

It’s the refs portion of strict that causes the problem:

Can't use string ("foo") as a SCALAR ref while "strict refs" in use at program.pl line 3.

I can get around that by turning off the refs portion temporarily:

use strict;

{
no strict 'refs';

my $name = 'foo';
my $value_in_foo = ${ $name }; # $foo
}

I could also just not turn on the refs portion of strict, but it’s better to turn it off only when I need it and let Perl catch unintended uses:

use strict qw(subs vars); # no 'refs'

For dynamic subroutine tricks, I want to store the subroutine name in a variable, and then turn it into a subroutine.

First, I put the name foo into the scalar $good_name. I then dereference it as a typeglob reference so I can assign my anonymous subroutine to it. Since $good_name isn’t a reference, Perl uses it’s value as a symbolic reference. The value becomes the name of the typeglob Perl should look at and affect. When I assign my anonymous subroutine to *{ $good_name }, I’m creating an entry in the symbol table for the current package for a subroutine named &foo. It also works with the full package specification so I can create &Some::Module::foo, too:

#!/usr/bin/perl
use strict;

{
no strict 'refs';

my $good_name = "foo";
*{ $good_name } = sub { print "Hi, how are you?\n" };

my $remote_name = "Some::Module::foo";
*{ $remote_name } = sub { print "Hi, are you from Maine?\n" };
}

foo();  # no problem
Some::Module::foo();  # no problem

I can be even more abusive, though, and this is something that I shouldn’t ever do, at least not in any code that does something useful or important. Save this for an Obfuscated Perl Contest.

By putting the name in a variable I can get around Perl’s variable naming convention. Normally, I have to start a variable name with a letter or an underscore and follow it with letters, underscores, or digits. Now I get around all that to create the subroutine with the name <=> by using a symbolic reference:

{
no strict 'refs';
my $evil_name = "<=>";
*{ $evil_name } = sub { print "How did you ever call me?\n" };

# <=>()  yeah, that's not gonna happen

*{ $evil_name }{CODE}->();

    &{$evil_name}();    # Another way ;-)
    }

I still can’t use my illegal subroutine in the normal way, so I have to look in its typeglob or use another symbolic reference.

Iterating Through Subroutine Lists

In my Data::Constraint module, I needed to provide a way to validate a value in such a way that the user could build up complex requirements easily and without writing code. The validation would be a matter of configuration, not programming.

Instead of applying a validation routine to a set of values, I turned it around to apply a list of subroutines to a value. Each particular value would have its own combination of validation routines, and I’d validate each value separately (although probably still in some sort of loop). Each subroutine is a constraint on the value.

I start by defining some subroutines to check a value. I don’t know ahead of time what the values will represent or which constraints the user will place on it. I’ll make some general subroutines that the programmer can combine in any way she likes. Each subroutine returns true or false:

my %Constraints = (
        is_defined     => sub { defined $_[0] },
        not_empty      => sub { length $_[0] > 0 },
        is_long        => sub { length $_[0] > 8 },
        has_whitespace => sub { $_[0] =~ m/\s/ },
        no_whitespace  => sub { $_[0] =~ m/\s/ },
        has_digit      => sub { $_[0] =~ m/\d/ },
        only_digits    => sub { $_[0] !~ m/\D/ },
        has_special    => sub { $_[0] =~ m/[^a-z0-9]/ },
        );

The %Constraints hash now serves as a library of validation routines that I can use. Once defined, I figure out how I want to use them.

For example, I want to write a password checker that looks for at least eight characters, no whitespace, at least one digit, and at least one special character. Since I’ve stored the subroutines in a hash, I just pull out the ones I need and pass the candidate password to each one:

chomp( my $password = <STDIN> );
my $fails = grep {
        ! $Constraints{ $_ }->( $password )
        } qw( is_long no_whitespace has_digit has_special );

I use grep in scalar context so it returns the number of items for which its block returns true. Since I really want the number of items that return false, I negate the return value of the subroutine call to make false turn into true, and vice versa. If $fails is anything but zero, I know that something didn’t pass.

The benefit comes when I want to apply this to many different values, each of which might have their own constraints. The technique is the same, but I have to generalize it a bit more:

my $fails = grep {
        ! $Constraints{ $_ }->( $input{$key} )
        } @constraint_names;

From there parameter checking is simply configuration:

password      is_long no_whitespace has_digit has_special
employee_id   not_empty only_digits
last_name     not_empty

I specify that configuration however I like and load it into my program. It is especially useful for nonprogrammers who need to change the behavior of the application. They don’t need to touch any code. If I store that in a file, I read in the lines and build a data structure to hold the names and the constraints that go with them. Once I have that set up, I access everything in the right way to do the same thing I did in the previous example:

while( <CONFIG> )
        {
        chomp;
        my( $key, @constraints ) = split;

        $Config{$key} = \@constraints;
        }

my %input = get_input(); # pretend that does something

foreach my $key ( keys %input )
        {
        my $failed = grep {
                ! $Constraints{ $_ }->( $input{$key} )
                } @{ $Config{$key} };

        push @failed, $key if $failed;
        }

print "These values failed: @failed\n";

My code to check them is small and constant no matter how many input parameters I have or the particular requirements for each of them.

This is the basic idea behind Data::Constraint, although it does more work to set up the situation and return a list of the constraints the value did not meet. I could change this up a little to return a list of the constraints that failed:

my @failed = grep {
        $Constraints{ $_ }->( $value ) ? () : $_
        } @constraint_names;

Processing Pipelines

Much in the same way that I went through a list of constraints in the previous example, I might want to build a processing pipeline. I do the same thing: decide which subroutines to include and then iterate through that list, applying in turn each subroutine to the value.

I can normalize a value by deciding which transformations I should perform. I store all of the transformations as subroutines in %Transformations and then list the ones I want to use in @process. After that, I read in lines on input and apply each subroutine to the line:

#!/usr/bin/perl
# sub-pipeline.pl
my %Transformations = (
        lowercase           => sub { $_[0] = lc $_[0] },
        uppercase           => sub { $_[0] = uc $_[0] },
        trim                => sub { $_[0] =~ s/^\s+|\s+$//g },
        collapse_whitespace => sub { $_[0] =~ s/\s+/ /g },
        remove_specials     => sub { $_[0] =~ s/[^a-z0-9\s]//ig },
        );

my @process = qw( remove_specials lowercase collapse_whitespace trim );

while( <STDIN> )
        {
        foreach my $step ( @process )
                {
                $Transformations{ $step }->( $_ );
                print "Processed value is now [$_]\n";
                }
        }

I might even combine this sort of thing with the constraint checking I did in the previous section. I’ll clean up the value before I check its validity. The input and processing code is very short and should stay that way. The complexity is outside of the flow of the data.

Method Lists

This section isn’t really like the previous two, but I always think of it when I talk about these techniques. As we told you in Intermediate Perl, I can use a scalar variable in the place of a method name as long as the value is a simple scalar (so, no references or other oddities). This works just fine as long as the object can respond to the foo method:

my $method_name = 'foo';
$object->$method_name;

If I want to run a chain of methods on an object, I can just go through the list of method names like I did for the anonymous subroutines. It’s not really the same thing to Perl, but for the programmer it’s the same sort of thinking. I go through the method names using map to get all of the values that I want:

my $isbn = Business::ISBN->new( '0596101058' );

my( $country, $publisher, $item ) =
        map { $isbn->$_ }
        qw( country_code publisher_code article_code );

I don’t have parallel code where I have to type the same thing many times. Again, the code to extract the values I need is very short and the complexity of choosing and listing the methods I need happens away from the important parts of the code flow.

Subroutines As Arguments

Because subroutine references are scalars, I can pass them as arguments to other subroutines:

my $nameless_sub = sub { ... };
foo( $nameless_sub );

But I don’t want to pass these things as scalars; I want to do the fancy things that sort, map, and grep do by using inline blocks:

my @odd_numbers = grep { $_ % 2 } 0 .. 100;

my @squares     = map  { $_ * $_ } 0 .. 100;

my @sorted      = sort { $a <=> $b } qw( 1 5 2 0 4 7 );

To work this little bit of magic, I need to use Perl’s subroutine prototypes. Someone may have told you that prototypes are as useless as they are evil, but in this case I need them to tell Perl that the naked block of code represents a subroutine.

As an example, I want to write something that reduces a list to a single value according to the block of code that I give it. Graham Barr does this in List::Util with the reduce function, which takes a list and turns it into a single value according to the subroutine I give it. This snippet turns a list of numbers into its sum:

use List::Util;
my $sum = reduce { $a + $b } @list;

The reduce function is a well-known method to process a list and you’ll see it in many other languages. To seed the operation, it takes the first two arguments off of the list and computes the result according to the inline subroutine. After that, it takes the result and the next element of the list and repeats the computation, doing that until it has gone through all of the elements of the list.

As with map, grep, and sort, I don’t put a comma after the inline subroutine argument to reduce. To get this to work, though, I need to use Perl’s subroutine prototypes to tell the subroutine to expect an inline subroutine.

The List::Util module implements its functions in XS to make them really speedy, but in case I can’t load the XS stuff for some reason, Graham has a pure Perl backup:

package List::Util;

sub reduce (&@) {
  my $code = shift;
  no strict 'refs';

  return shift unless @_ > 1;

  use vars qw($a $b);

  my $caller = caller;
  local(*{$caller."::a"}) = \my $a;
  local(*{$caller."::b"}) = \my $b;

  $a = shift;
  foreach (@_) {
        $b = $_;
        $a = &{$code}();
  }

  $a;
}

In his prototype, Graham specifies (&@). The & tells Perl that the first argument is a subroutine, and the @ says the rest is a list. The perlsub documentation has the list of prototype symbols and their meanings, but this is all I need here.

The rest of reduce works like sort by putting two elements into the package variables $a and $b. Graham defines the lexical variables with those names, and immediately assigns to the typeglobs for $a and $b in the calling package by using symbolic references. After that the values of $a and $b are the lexical versions. When he calls the subroutine argument &{$code}(), that code looks at its package variables, which are the ones in effect when I wrote the subroutine. Got that? Inside reduce, I’m using the lexical versions, but inside $code, I’m using the package versions from the calling package. That’s why Graham made them aliases of each other.

I can get rid of the $a and $b global variables, too. To do that, I can use @_ instead:

my $count = reduce { $_[0] + $_[1] } @list;

Since @_ is one of Perl’s special variables that always live in the main:: package, I don’t have to worry about the calling package. I also don’t have to worry about putting the list elements in variables. I can play with @_ directly. I call the anonymous subroutine with the first two elements in @_ and put the result back into @_. I keep doing that until @_ has only one element, which I finally return:

sub reduce(&@)
        {
        my $sub = shift;

        while( @_ > 1 )
                {
                unshift @_, $sub->( shift, shift );
                }

        return $_[0];
        }

So far this has only worked with flat lists. What if I wanted to do a similar thing with a complex data structure? In my Object::Iterate module, I created versions of map and grep that I can use with arbitrary data structures in objects. I call my versions imap and igrep:[37]

use Object:Iterate;

my @filtered    = igrep {...} $object;

my @transformed = imap  {...} $object;

I use the same prototype magic I used before, although this time the second argument is a scalar because I’m working with an object instead of a list. I use the prototype, (&$):

sub igrep (&$)
        {
        my $sub    = shift;
        my $object = shift;

        $object->_check_object;

        my @output = ();

        while( $object->__more__ )
                {
                local $_ = $object->__next__;

                push @output, $_ if $sub->();
                }

        $object->__final__ if $object->can( __final__ );

        wantarray ? @output : scalar @output;
        }

sub _check_object
        {
        croak( "iterate object has no __next__ method" )
                unless eval { $_[0]->can( '__next__' ) };
        croak( "iterate object has no __more__ method" )
                unless eval { $_[0]->can( '__more__' ) };

        $_[0]->__init__ if eval { $_[0]->isa( '__init__' ) };
        
        return 1;
        }

In igrep, I put the inline subroutine argument into $sub and the object argument into $object. Object::Iterate works by relying on the object to provide methods to get the next elements for the iteration. I ensure that the object can respond to those methods by calling _check_object, which returns true if the object has the right methods.

The __more__ method lets igrep know if there are any more elements to process. If there are more elements to process, igrep uses the __next__ method to get the next element from the object. No matter what I’ve done to store the data in my object, igrep doesn’t worry about it because it makes the object figure it out.

Once I have an element, I assign it to $_, just like the normal versions of map and grep do. Inside my inline, I use $_ as the current element.

Here’s a short example using my Netscape::Bookmarks module. I want to walk through its tree of categories and links to check all of the links. Once I get my $bookmarks object, I use it with igrep. Inside the inline subroutine, I use the check_link function from my HTTP::SimpleLinkChecker module to get the HTTP status of the link. If it’s 200, the link is okay, but since I want the bad links, I igrep for the ones that aren’t 200. Finally, I print the number of bad links along with the list of links:

#!/usr/bin/perl
# bookmark-checker.pl

use HTTP::SimpleLinkChecker qw(check_link);
use Netscape::Bookmarks;
use Object::Iterate qw(igrep);

my $bookmarks = Netscape::Bookmarks->new( $ARGV[0] );
die "Did not get Bookmarks object!" unless ref $bookmarks;

my @bad_links = igrep {
        200 != check_link($_);
        } $bookmarks;

{
local $/ = "\n\t";
print "There are " . @bad_links . " bad links$/@bad_links\n";
}

The magic happens later in the program where I defined the special methods to work with Object::Iterate. I create a scope where I can define some methods in Netscape::Bookmarks::Category and provide a scope for the lexical variable @links. My __more__ method simply returns the number of elements in @links, and __next__ returns the first element in @links. I could have been more fancy to have __next__ walk through the data structure instead of using __init__ to get them all at once, but that would take a lot more room on the page. No matter what I decide to do, I just have to follow the interface for Object::Iterate:

{
package Netscape::Bookmarks::Category;
my @links = ();

sub __more__ { scalar @links }
sub __next__ { shift  @links }

sub __init__
        {
        my $self = shift;

        my @categories = ( $self );

        while( my $category = shift @categories )
                {
                push @categories, $category->categories;
                push @links, map { $_->href } $category->links;
                }

        print "There are " . @links . " links\n";
        }
}

Autoloaded Methods

When Perl can’t find a method on a module or anywhere in its inheritance tree, it goes back to the original class and looks for the special subroutine AUTOLOAD. As a catchall, Perl sets the package variable $AUTOLOAD to the name of the method for which it was looking and passes AUTOLOAD the same parameter list. After that, it’s up to me what I want to do.

To define a method based on AUTOLOAD, I first have to figure out what the method name should be. Perl puts the full package specification in $AUTOLOAD, and I usually only need the last part, which I can extract with a regular expression:

if( $AUTOLOAD =~ m/::(\w+)$/ )
        {
        # stuff with $1
        }

In some code, you’ll also see this as a substitution that discards everything but the method name. This has the disadvantage of destroying the original value of $AUTOLOAD, which I might want later:

$AUTOLOAD =~ s/.*:://;  # destructive, not preferred

Once I have the method name, I can do anything I like. Since I can assign to typeglobs to define a named subroutine (as I promised in Chapter 8), I might as well do that. I use $AUTOLOAD, which has its original with the full package specification still, as a symbolic reference. Since $AUTOLOAD is not a reference, Perl interprets its typeglob dereference to mean that it should define the variable with that name, access the typeglob, and make the assignment:

*{$AUTOLOAD} = sub { ... };

If $AUTOLOAD is Foo::bar, this turns into:

*{'Foo::bar'} = sub { ... };

That one line sets the right package, defines the subroutine name without defining the code that goes with it, and finally assigns the anonymous subroutine. If I were to code that myself ahead of time, my code would look like this:

{
package Foo;

sub bar;

*bar = sub { ... }
}

Once I’ve defined the subroutine, I want to run it with the original arguments I tried to pass to the method name. However, I want to make it look as if AUTOLOAD had nothing to do with it, and I don’t want AUTOLOAD to be in the call stack. This is one of the few places where I should use a goto. This replaces AUTOLOAD in the subroutine stack and runs the new subroutine I’ve just defined. By using an ampersand in front of the name and nothing on the other side, Perl uses the current @_ for the argument list of my subroutine call:[38]

goto &{$AUTOLOAD};

In Chapter 14 of Intermediate Perl, we use AUTOLOAD to define subroutines on the fly. We look in $AUTOLOAD. If the method name is the same as something in @elements, we create an anonymous subroutine to return the value for the hash element with that key. We assign that anonymous subroutine to the typeglob with that name. That’s a symbolic reference so we wrap a naked block around it to limit the scope of our no strict 'refs'. Finally, once we’ve made the typeglob assignment we use goto to redispatch the method call to the subroutine we just defined. In effect, it’s as if the subroutine definition was always there and the next time I call that method Perl doesn’t have to look for it:

sub AUTOLOAD {
        my @elements = qw(color age weight height);

        our $AUTOLOAD;

        if ($AUTOLOAD =~ /::(\w+)$/ and grep $1 eq $_, @elements) {
                my $field = ucfirst $1;
                {
                no strict 'refs';
                *{$AUTOLOAD} = sub { $_[0]->{$field} };
                }
                goto &{$AUTOLOAD};
                }

        if ($AUTOLOAD =~ /::set_(\w+)$/ and grep $1 eq $_, @elements) {
                my $field = ucfirst $1;
                {
                no strict 'refs';
                *{$AUTOLOAD} = sub { $_[0]->{$field} = $_[1] };
                }
                goto &{$AUTOLOAD};
                }

        die "$_[0] does not understand $method\n";
        }

Hashes As Objects

One of my favorite uses of AUTOLOAD comes from the Hash::AsObject module by Paul Hoffman. He does some fancy magic in his AUTOLOAD routine so I access a hash’s values with its keys, as I normally would, or as an object with methods named for the keys:

use Hash::AsObject;

my $hash = Hash::AsObject->new;

$hash->{foo} = 42;   # normal access to a hash reference

print $hash->foo, "\n"; # as an object;

$hash->bar( 137 ),      # set a value;

It can even handle multilevel hashes:

$hash->{baz}{quux} = 149;

$hash->baz->quux;

The trick is that $hash is really just a normal hash reference that’s blessed into a package. When I call a method on that blessed reference, it doesn’t exist so Perl ends up in Hash::AsObject::AUTOLOAD. Since it’s a pretty involved bit of code to handle lots of special cases, I won’t show it here, but it does basically the same thing I did in the previous section by defining subroutines on the fly.

AutoSplit

Autosplitting is another variation on the AUTOLOAD technique, but I haven’t seen it used as much as it used to be. Instead of defining subroutines dynamically, AutoSplit takes a module and parses its subroutine definitions and stores each subroutine in its own file. It loads a subroutine’s file only when I call that subroutine. In a complicated API with hundreds of subroutines I don’t have to make Perl compile every subroutine when I might just want to use a couple of them. Once I load the subroutine, Perl does not have to compile it again in the same program. Basically, I defer compilation until I need it.

To use AutoSplit, I place my subroutine definitions after the __END__ token so Perl does not parse or compile them. I tell AutoSplit to take those definitions and separate them into files:

$ perl -e 'use AutoSplit; autosplit( "MyModule.pm", "auto_dir", 0, 1, 1 );

I usually don’t need to split a file myself, though, since ExtUtils::MakeMaker takes care out that for me in the build process. After the module is split, I’ll find the results in one of the auto directories in the Perl library path. Each of the .al files holds a single subroutine definition:

ls ./site_perl/5.8.4/auto/Text/CSV
_bite.al        combine.al      fields.al       parse.al        string.al
autosplit.ix    error_input.al  new.al          status.al       version.al

To load the method definitions when I need them, I use the AUTOLOAD method provided by AutoLoader and typically use it as a typeglob assignment. It knows how to find the right file, load it, parse and compile it, and then define the subroutine:

use AutoLoader;
*AUTOLOAD = \&AutoLoader::AUTOLOAD;

You may have already run into AutoSplit at work. If you’ve ever seen an error message like this, you’ve witnessed AutoLoader looking for the missing method in a file. It doesn’t find the file, so it reports that it can’t locate the file. The Text::CSV module uses AutoLoader, so when I load the module and call an undefined method on the object, I get the error:

$ perl -MText::CSV -e '$q = Text::CSV->new; $q->foobar'
Can't locate auto/Text/CSV/foobar.al in @INC ( ... ).

This sort of error almost always means that I’m using a method name that isn’t part of the interface.

Summary

I can use subroutine references to represent behavior as data, and I can use the references like any other scalar.

Further Reading

The documentation for prototypes is in the perlsub documentation.

Mark Jason Dominus also used the function names imap and igrep to do the same thing I did, although his discussion of iterators in Higher-Order Perl is much more extensive. See http://hop.perl.plover.com/. I talk about my version in “The Iterator Design Pattern” in The Perl Review 0.5 (September 2002), which you can get for free online: http://www.theperlreview.com/Issues/The_Perl_Review_0_5.pdf. Mark Jason’s book covers functional programming in Perl by composing new functions out of existing ones, so it’s entirely devoted to fancy subroutine magic.

Randy Ray writes about autosplitting modules in The Perl Journal number 6. For the longest time it seemed that this was my favorite article on Perl and the one that I’ve read the most times.

Nathan Torkington’s “CryptoContext” appears in The Perl Journal number 9 and the compilation The Best of The Perl Journal: Computer Science & Perl Programming.



[36] Normally, exiting a subroutine by using next, last, or redo is a not a good thing. That doesn’t mean it’s a bad thing, but it’s odd enough to have its own warning in perldiag.

[37] I think Mark Jason Dominus used these names before I did, but I don’t think I was reading his Higher-Order Perl mailing list when I came up with the names. In a footnote to my “Iterator Design Pattern” article in The Perl Review 0.5, I seem to think it was a coincidence. We were both thinking about iterators at that point, although I was thinking about how cool design patterns are and he was thinking how stupid they are. We were probably both right.

[38] Nathan Torkington talks about this in “CryptoContext” in The Perl Journal number 9.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required