Although there are over 10,000 distributions in CPAN, sometimes it doesn’t have exactly what I need. Sometimes a module has a bug or needs a new feature. I have several options for fixing things, whether or not the module’s author accepts my changes. The trick is to leave the module source the same but still fix the problem.
I can do several things to fix a module, and no solution is the right answer for every situation. I like to go with the solutions that mean the least amount of work for me and the most benefit for the Perl community, although those aren’t always compatible. For the rest of this section, I won’t give you a straight answer. All I can do is point out some of the issues involved so you can figure out what’s best for your situation.
The least amount of work in most cases is to fix anything I need and send a patch to the author so that he can incorporate them in the next release of the module. There’s even a bug tracker for every CPAN module[39]and the module author automatically gets an email notifying him about the issue.
When I’ve made my fix I get the diffs,
which is just the parts of the file that have changed. The diff
command creates
the patch:
$ diff -u original_file updated_file > original_file.diff
The patch shows which changes someone needs to make to the original version to get my new version:
% diff -u -d ISBN.pm.dist ISBN.pm --- ISBN.pm.dist 2007-02-05 00:26:27.000000000 -0500 +++ ISBN.pm 2007-02-05 00:27:57.000000000 -0500 @@ -59,8 +59,8 @@ $self->{'isbn'} = $common_data; if($isbn13) { - $self->{'positions'} = [12]; - ${$self->{'positions'}}[3] = 3; + $self->{'positions'} = [12]; + $self->{'positions'}[3] = 3; } else { $self->{'positions'} = [9]; }
The author can take the diff and apply it to his source using the
patch
[40]program, which can read the diff to figure out the file and
what it needs to do to update it:
$ patch < original_file.diff
Sometimes the author is available, has time to work on the module, and releases a new distribution. In that case, I’m done. On the other hand, CPAN is mostly the result of a lot of volunteer work, so the author may not have enough free time to commit to something that won’t pay his rent or put food in his mouth. Even the most conscientious module maintainer gets busy sometimes.
To be fair, even the seemingly simplest fixes aren’t trivial matters to all module maintainers. Patches hardly ever come with corresponding updates to the tests or documentation, and the patches might have consequences to other parts of the modules or to portability. Furthermore, patch submitters tend to change the interface in ways that work for them but somehow make the rest of the interface inconsistent. Things that seem like five minutes to the submitter might seem like a couple of hours to the maintainer, so make it onto the “To-Do” list rather than the “Done” list.
If I can’t get the attention of the module maintainer, I might just make changes to the sources myself. Doing it this way usually seems like it works for a while, but when I update modules from CPAN, my changes might disappear as a new version of the module overwrites my changes. I can partially solve that by making the module version very high, hoping an authentic version isn’t greater than the one I choose:
our $VERSION = 99999;
This has the disadvantage of making my job tough if I want to
install an official version of the distribution that the maintainer has
fixed. That version will most likely have a smaller number so tools such
as CPAN.pm
and CPANPLUS
will think my patched version is
up-to-date and won’t install the seemingly older, but actually newer,
version over it.
Other people who want to use my software might have the same problems, but they won’t realize what’s going on when things break after they update seemingly unrelated modules. Some software vendors get around this by creating a module directory about which only their application knows and putting all the approved versions of modules, including their patched versions, in that directory. That’s more work than I want, personally, but it does work.
If the module is important to you (or your business) and the author has disappeared, you might consider officially taking over its maintenance. Although every module on CPAN has an owner, the admins of the Perl Authors Upload Server (PAUSE)[41]can make you a comaintainer or even transfer complete ownership of the module to you.
The process is simple, although not automated. First, send a message to modules@perl.org inquiring about the module status. Often, an administrator can reach the author when you cannot because the author recognizes the name. Second, the admins will tell you to publicly announce your intent to take over the module, which really means to announce it where most of the community will see it. Next, just wait. This sort of thing doesn’t happen quickly because the administrators give the author plenty of time to respond. They don’t want to transfer a module while an author’s on holiday!
Once you take over the module, though, you’ve taken over the module. You’ll probably find that the grass isn’t greener on the other side and at least empathize with the plight of the maintainers of free software, starting the cycle once again.
The last resort is forking, or creating a parallel distribution next to the official one. This is a danger of any popular open source projects, but it’s been only on very rare occasions that this has happened with a Perl module. PAUSE will allow me to upload a module with a name registered to another author. The module will show up on CPAN but PAUSE will not index it. Since it’s not in the index, the tools that work with CPAN won’t see it even though CPAN stores it.
I don’t have to use the same module name as the original. If I choose a different name, I can upload my fixed module, PAUSE will index it under its new name, and the CPAN tools can install it automatically. Nobody knows about my module because everybody uses the original version with the name they already know about and the interface they already use. It might help if my new interface is compatible with the original module or at least provides some sort of compatibility layer.
I might just decide to not use a third-party module at all. If I write the module myself I can always find the maintainer. Of course, now that I’m the creator and the maintainer, I’ll probably be the person about whom everyone else complains. Doing it myself means I have to do it myself. That doesn’t quite fit my goal of doing the least amount of work. Only in very rare cases do these replacement modules catch on, and I should consider that before I do a lot of work.
I had to debug a problem with a program that used Email::Stuff
to send email through
Gmail. Just like other mail servers, the program was supposed to connect
to the mail server and send its mail, but it was hanging on the local
side. It’s a long chain of calls, starting at
Email::Stuff
and then going through Email::Simple
,
Email::Send::SMTP
, Net::SMTP::SSL
,
Net::SMTP
, and ending up in IO::Socket::INET
. Somewhere
in there something wasn’t happening right. This problem, by the way,
prompted my Carp
modifications in Chapter 4, so I could see a full dump of the arguments at each
level.
I finally tracked it down to something going on in
Net::SMTP
. For some reason, the local port and address,
which should have been selected automatically, weren’t. Here’s an extract
of the real new
method from Net::SMTP
:
package Net::SMTP; sub new { my $self = shift; my $type = ref($self) || $self; ... my $h; foreach $h (@{ref($hosts) ? $hosts : [ $hosts ]}) { $obj = $type->SUPER::new(PeerAddr => ($host = $h), PeerPort => $arg{Port} || 'smtp(25)', LocalAddr => $arg{LocalAddr}, LocalPort => $arg{LocalPort}, Proto => 'tcp', Timeout => defined $arg{Timeout} ? $arg{Timeout} : 120 ) and last; } ... $obj; }
The typical call to new
passes
the remote hostname as the first argument and then a series of pairs after
that. Since I don’t want the standard SMTP port for Google’s service I
specify it myself:
my $mailer = Net::SMTP->new( 'smtp.gmail.com', Port => 465, ... );
The problem comes in when I don’t specify a LocalAddr
or LocalPort
argument. I shouldn’t have to do that,
and the lower levels should find an available port for the default local
address. For some reason, these lines were causing problems when they
didn’t get a number. They don’t work if they are undef
, which should convert to 0
when used as a number, and should tell the
lower levels to choose appropriate values on their own:
LocalAddr => $arg{LocalAddr}, LocalPort => $arg{LocalPort},
To investigate the problem, I want to change
Net::SMTP
, but I don’t want to edit Net/SMTP.pm directly. I get nervous when
editing standard modules. Instead of editing it, I’ll surgically replace
part of the module. I want to handle the case of the implicit LocalAddr
and LocalPort
values but also retain the ability to
explicitly choose them. I’ve excerpted the full solution to show the
relevant parts:
BEGIN { use Net::SMTP; no warnings 'redefine'; *Net::SMTP::new = sub { print "In my Net::SMTP::new...\n"; package Net::SMTP; # ... snip my $hosts = defined $host ? $host : $NetConfig{smtp_hosts}; my $obj; my $h; foreach $h (@{ref($hosts) ? $hosts : [ $hosts ]}) { $obj = $type->SUPER::new(PeerAddr => ($host = $h), PeerPort => $arg{Port} || 'smtp(25)', $arg{LocalAddr} ? ( LocalAddr => $arg{LocalAddr} ) : (), $arg{LocalPort} ? ( LocalPort => $arg{LocalPort} ) : (), Proto => 'tcp', Timeout => defined $arg{Timeout} ? $arg{Timeout} : 120 ); last if $obj; } # ... snip $obj; }
To make everything work out, I have to do a few things. First I wrap
the entire thing in a BEGIN
block so this code runs before
anyone really has a chance to use anything from
Net::SMTP
. Inside the BEGIN
, I immediately load
Net::SMTP
so anything it defines is already in place; I
wouldn’t want Perl to replace all of my hard work by loading the original
code on top of it.[42]Immediately after I load Net::SMTP
, I tell
Perl not to warn me about what I’m going to do next. That’s a little clue
that I shouldn’t do this lightly, but not enough to stop me.
Once I have everything in place, I redefine Net::SMTP::new()
by assigning to the typeglob
for that name. The big change is inside the foreach
loop. If the argument list didn’t have
true values for LocalAddr
and LocalPort
, I don’t include them in the argument
list to the SUPER
class:
$arg{LocalAddr} ? ( LocalAddr => $arg{LocalAddr} ) : (), $arg{LocalPort} ? ( LocalPort => $arg{LocalPort} ) : (),
That’s a nifty trick. If $arg{LocalAddr}
has a true value, it selects the
first option in the ternary operator, so I include LocalAddr => $arg{LocalAddr}
in the argument
list. If $arg{LocalAddr}
doesn’t have a
true value, I get the second option of the ternary operator, which is just
the empty list. In that case, the lower levels choose appropriate values
on their own.
Now I have my fix to my Net::SMTP
problem, but I haven’t changed the original file. Even if I don’t want to
use my trick in production, it’s extremely effective for figuring out
what’s going on. I can change the offending module and instantly discard
my changes to get back to the original. It also serves as an example I can
send to the module author when I report my problem.
The best solution, if possible, is a subclass that inherits from the module I need to alter. My changes live in their own source files, and I don’t have to touch the source of the original module. We mostly covered this in our barnyard example in Intermediate Perl, so I won’t go over it again here.[43]
Before I do too much work, I create an empty subclass. I’m not going
to do a lot of work if I can’t even get it working when I haven’t changed
anything yet. For this example, I want to subclass the
Foo
module so I can add a new feature. I can use the
Local
namespace, which should never conflict with a
real module name. My Local::Foo
module
inherits from the module I want to fix, Foo
, using the base
pragma:
package Local::Foo use base qw(Foo); 1;
If I’m going to be able to subclass this module, I should be able to simply change the class name I use and everything should still work. In my program, I use the same methods from the original class, and since I didn’t actually override anything, I should get the exact same behavior as the original module. This is sometimes called the “empty” or “null subclass test”:
#!/usr/bin/perl # use Foo use Local::Foo; #my $object = Foo->new(); my $object = Local::Foo->new( ... );
The next part depends on what I want to do. Am I going to completely replace a feature or method, or do I just want to add a little bit to it? I add a method to my subclass. I probably want to call the super method first to let the original method do its work:
package Local::Foo use base qw(Foo); sub new { my( $class, @args ) = @_; ... munge arguments here my $self = $class->SUPER::new( @_ ); ... do my new stuff here. } 1;
Sometimes this won’t work, though, because the original module can’t
be subclassed, either by design or accident. For instance, the
unsuspecting module author might have used the one-argument form of
bless
. Without the second argument,
bless
uses the current package for the
object type. No matter what I do in the subclass, the one-argument
bless
will return an object that ignores the
subclass:
sub new { my( $class, @args ) = @_; my $self = { ... }; bless $self; }
To make this subclassable, I need to use the first argument to
new
, which I stored in $class
, as the second argument to bless
:
sub new { my( $class, @args ) = @_; my $self = { ... }; bless $self, $class; }
The value in $class
is the
original class name that I used, not the current package. Unless I have a
good reason to ignore the original class name, I should always use it with
bless
.
In testing this, there are two things I want to check. First, I need
to ensure that inheritance works. That means that somewhere in the
inheritance tree I find the parent class, Foo
, as well as the class I used to create the
object, Local::Foo
:
# some file in t/ use Test::More; my $object = Local::Foo->new(); foreach my $isa_class ( qw( Foo Local::Foo ) ) { isa_ok( $object, $isa_class, "Inherits from $isa_class" ); }
Normally, that should be enough. If I need the object to belong in a
particular class rather than merely inherit from it, I can check the exact
class using ref
:
is( ref $object, 'Local::Foo', 'Object is type Local::Foo' );
The ref
built-in isn’t as good as
the blessed
function from the
Scalar::Util
module that comes with Perl since 5.8. It
does the same thing but returns undef
if its argument isn’t blessed. That avoids the case of ref
returning true for an unblessed
reference:
use Scalar::Util qw(blessed); is( blessed $object, 'Local::Foo', 'Object is type Local::Foo' );
Once I’m satisfied that I can make the subclass, I start to override methods in the subclass to get my desired behavior.
Sometimes module authors know that their module won’t meet everyone’s needs and they provide a way to get around the default behavior.
ExtUtils::MakeMaker
works for most module installers but if it doesn’t do
something that I need I can easily change it through subclassing. To do
this ExtUtils::MakeMaker
uses the special subclass
name My
. Before it calls its hardcoded methods, it
looks for the same method names in the package My
and will use those preferentially.
As MakeMaker
performs its
magic, it writes to the file Makefile according
to what its methods tell it to do. What it decides to write comes from
ExtUtils::MM_Any
, the base class for the magic and
then perhaps a subclass, such as ExtUtils::MM_Unix
or
ExtUtils::MM_Win32
, that might override methods for
platform-specific issues.
In my Test::Manifest
module I want to change
how testing works. I want the make
test
step to execute the test files in the order I specify
rather than the order in which glob
returns the filenames from the t
directory. The function test_via_harness
writes out a section of the
Makefile. I know this because I
look in the Makefile to find which
bits do the part I want to change and then look for that text in the
module to find the right function:
package ExtUtils::MakeMaker; sub test_via_harness { my($self, $perl, $tests) = @_; return qq{\t$perl "-MExtUtils::Command::MM" }. qq{"-e" "test_harness(\$(TEST_VERBOSE), '\$(INST_LIB)', '\$(INST_ARCHLIB)')" $tests\n}; }
After interpolations and replacements the output in the Makefile shows up as something like this (although results may differ by platform):
test_dynamic :: pure_all PERL_DL_NONLAZY=1 $(FULLPERLRUN) "-MExtUtils::Command::MM" "-e" "test_harness($(TEST_VERBOSE), '$(INST_LIB)', '$(INST_ARCHLIB)')" $(TEST_FILES)
After boiling everything down, a make
test
essentially runs a command that globs all of the files in
the t directory and executes them in that order.
This leads module authors to name their modules odd things like
00.load.t or 99.pod.t to make the order come out how they
like:
perl -MExtUtils::Command::MM -e 'test_harness( ... )' t/*.t
It doesn’t matter much what test_harness
actually does as long as my
replacement does the same thing. In this case, I don’t want the test
files to come from @ARGV
because I
want to control their order.
To change how that works, I need to get my function in the place
of test_harness
. By defining my own
test_via_harness
subroutine in the
package MY
, I can put any text I like
in place of the normal test_via_harness
. I want to use my function
from Test::Manifest
. I use the full package
specification as the subroutine name to put it into the right
namespace:
package Test::Manifest; sub MY::test_via_harness { my($self, $perl, $tests) = @_; return qq|\t$perl "-MTest::Manifest" | . qq|"-e" "run_t_manifest(\$(TEST_VERBOSE), '\$(INST_LIB)', | . qq|'\$(INST_ARCHLIB)', \$(TEST_LEVEL) )"\n|; };
Instead of taking the list of files as arguments, in my run_t_manifest
subroutine I call get_t_files()
, which looks in the file
t/test_manifest. Once run_t_manifest()
has the list of files it
passes it to Test::Harness::runtests()
, the same thing that
the original test_harness()
ultimately calls:
use File::Spec::Functions; my $Manifest = catfile( "t", "test_manifest" ); sub run_t_manifest { ...; my @files = get_t_files( $level ); ...; Test::Harness::runtests( @files ); } sub get_t_files { return unless open my( $fh ), $Manifest; my @tests = (); while( <$fh> ) { ...; push @tests, catfile( "t", $test ) if -e catfile( "t", $test ); } close $fh; return wantarray ? @tests : join " ", @tests; }
In t/test_manifest I list the test files to run, optionally commenting lines I want to skip. I list them in any order I like and that’s the order I’ll run them:
load.t pod.t pod_coverage.t #prereq.t new.t feature.t other_feature.t
By subclassing the module, I don’t have to fool with
ExtUtils::MakeMaker
, which is certainly something I
don’t want to do. I get the feature I want and I don’t break the module
for anyone else. I still have the same
ExtUtils::MakeMaker
source that everyone else has. I
go through the same process if I need to change any other behavior in
ExtUtils::MakeMaker
.
For another example of subclassing, see Chapter 15,
where I subclass Pod::Simple
. Sean Burke wrote the
module specifically for others to subclass. Most of this book started as
pseudopod, a special O’Reilly Media variant of plain ol’ documentation,
and I created my own Pod::PseudoPod
subclasses to
convert the source to HTML pages for the web site[44]and for the final sources for the production
team.
Instead of replacing a subroutine or method, I might just want to wrap it in another subroutine. That way I can inspect and validate the input before I run the subroutine and I can intercept and clean up the return value before I pass it back to the original caller. The basic idea looks like this:
sub wrapped_foo { my @args = @_; ...; # prepare @args for next step; my $result = foo( @args ); ...; # clean up $result return $result; }
To do this right, however, I need to handle the different contexts.
If I call wrapped_foo
in list context,
I need to call foo
in list context,
too. It’s not unusual for Perl subroutines to have contextual behavior and
for Perl programmers to expect it. My basic template changes to handle
scalar, list, and void contexts:
sub wrapped_foo { my @args = @_; ...; # prepare @args for next step; if( wantarray ) # list context { my @result = foo( @args ); return @result; } elsif( defined wantarray ) # scalar context { my $result = foo( @args ); ...; # clean up $result return $result; } else # void context { foo( @args ); } }
It gets a bit more complicated than this, but Damian
Conway makes it easy with Hook::LexWrap
.
He lets me add pre- and posthandlers that run before and
after the wrapped subroutine, and he takes care of all of the details in
the middle. His interface is simple; I use the wrap
subroutine and provide the handlers as
anonymous subroutines. The wrapped version is sub_to_watch()
and I call it as a normal
subroutine:
#!/usr/bin/perl use Hook::LexWrap; wrap 'sub_to_watch', pre => sub { print "The arguments are [@_]\n" }, post => sub { print "Result was [$_[-1]]\n" }; sub_to_watch( @args );
Hook::LexWrap
adds another element to @_
to hold the return value, so in my
posthandler I look in $_[-1]
to see the
result.
I can use this to rewrite my divide
example from Chapter 4. In
that example, I had a subroutine to return the quotient of two numbers. In
my made-up situation, I was passing it the wrong arguments, hence getting
the wrong answer. Here’s my subroutine again:
sub divide { my( $n, $m ) = @_; my $quotient = $n / $m; }
Now I want to inspect the arguments before they go in and see the return value before it comes back. If the actual arguments going in and the quotient match, then the subroutine is doing the right thing, but someone is using the wrong arguments. If the arguments are right but the quotient is wrong, then the subroutine is wrong:
#!/usr/bin/perl use Hook::LexWrap; sub divide { my( $n, $m ) = @_; my $quotient = $n / $m; } wrap 'divide', pre => sub { print "The arguments are [@_]\n" }, post => sub { print "Result was [$_[-1]]\n" }; my $result = divide( 4, 4 );
After I wrap
the subroutine, I
call divide
as I normally would. More
importantly, though, is that I’m not changing my program for calls to
divide because Hook::LexWrap
does some magic behind the
scenes to replace the subroutine definition so my entire program sees the
wrapped version. I’ve changed the subroutine without editing the original
source. Without (apparently) changing the subroutine, whenever I call it I
get a chance to see extra output:
The arguments are [4 4 ] Result was [1]
When I remove the wrap
, I leave
everything just as I found it and don’t have to worry about
reverting my changes.
I don’t have to change module code to change how a module works. For an object-oriented module, I can create a subclass to change the parts I don’t like. If I can’t subclass it for some reason, I can replace parts of it just like I can for any other module. No matter what I do, however, I usually want to leave the original code alone (unless it’s my module and I need to fix it) so I don’t make the problem worse.
The perlboot documentation has an extended subclassing example. It’s also in Intermediate Perl.
I talk about Hook::Lex::Wrap
in “Wrapping
Subroutines to Trace Code Execution,” The Perl
Journal, July 2005: http://www.ddj.com/dept/lightlang/184416218.
The documentation of diff
and
patch
discusses their use. The patch
manpage is particularly instructive
because it contains a section near the end that talks about the pragmatic
considerations of using the tools and dealing with other programmers.
[39] Best Practical provides its RT service for no charge to the Perl community (http://rt.cpan.org).
[40] Larry Wall, the creator of Perl, is also the original author of
patch
. It’s now maintained by the
Free Software Foundation. Most Unix-like systems should already have
patch
, and Windows users can get
it from several sources, including GNU utilities for Win32 (http://unxutils.sourceforge.net/) and the Perl Power
Tools (http://ppt.perl.org).
[41] See http://pause.perl.org. As I write this, I’m one of the many PAUSE administrators, so you’ll probably see me on modules@perl.org. Don’t be shy about asking for help on that list.
[42] I assume that nobody else in this program is performing any
black magic, such as unsetting values in %INC
and reloading modules.
[43] If you don’t have the Alpaca book handy that’s okay. Randal added it to the standard Perl distribution as the perlboot documentation.
[44] The Mastering Perl web site, with book text and source code, is at http://www.pair.com/comdog/mastering_perl.
Get Mastering Perl now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.