Chapter 7. Cleaning Up Perl

Part of mastering Perl is controlling the source code, no matter who gives it to you. People can usually read the code that they wrote, and usually complain about the code that other people wrote. In this chapter I’ll take that code and make it readable. This includes the output of so-called Perl obfuscators, which do much of their work by simply removing whitespace. You’re the programmer and it’s the source, so you need to show it who’s boss.

Good Style

I’m not going to give any advice about code style, where to put the braces, or how many spaces to put where. These things are the sparks for heated debates that really do nothing to help you get work done. The Perl interpreter doesn’t really care, nor does the computer. But, after all, we write code for people first and computers second.

Good code, in my mind, is something that a skilled practitioner can easily read. It’s important to note that good code is not something that just anyone could read. Code isn’t bad just because a novice Perl programmer can’t read it. The first assumption has to be that the audience for any code is people who know the language or, if they don’t, know how to look up the parts they need to learn. Along with that, a good programmer should be able to easily deal with source written in the handful of major coding styles.

After that, consistency is a major part of good code. Not only should I try to do the same thing in the same way each time (and that might mean everyone on the team doing it in the same way), but I should format it in the same way each time. Of course, there are edge cases and special situations, but for the most part, doing things the same way each time helps the new reader recognize what I’m trying to do.

Lastly, I like a lot of whitespace in my code, and I did even before my eyesight started to get bad. Spaces separate tokens and blank lines separate groups of lines that go together, just as if I were writing prose. This book would certainly be hard to read without paragraph breaks; code has the same problem.

I have my own particular style that I like, but I’m not opposed to using another style. If I edit code or create a patch for somebody else’s code, I try to mimic that person’s style. Remember, consistency is the major factor in good style. Adding my own style to existing code makes it inconsistent.

If you haven’t developed your own style or haven’t had one forced on you, the perlstyle documentation as well as Perl Best Practices by Damian Conway (O’Reilly) can help you set standards for you and your coding team.

perltidy

The perltidy program reformats Perl programs to make them easier to read. Given a mess of code with odd indentation styles (or no indentation at all), little or no whitespace between tokens, and all other manner of obfuscation, perltidy creates something readable.

Here’s a short piece of code that I’ve intentionally written with bad style.[31] I haven’t done anything to obfuscate the program other than remove all the whitespace I could without breaking things:

#!/usr/bin/perl
# yucky
use strict;use warnings;my %Words;while(<>){chomp;s{^\s+}{};s{\s+$}{};
my $line=lc;my @words=split/\s+/,$line;foreach my $word(@words){
$word=~s{\W}{}g;next unless length $word;$Words{$word}++;}}foreach
my $word(sort{$Words{$b}<=>$Words{$a}}keys %Words){last
if $Words{$word}<10;printf"%5d  %s\n",$Words{$word},$word;}

If somebody else handed me this program, could I tell what the program does? I might know what it does, but not how it does it. Certainly I could read it slowly and carefully keep track of things in my head, or I could start to add newlines between statements. That’s work, though, and too much work even for this little program.

I save this program in a file I name yucky and run it through perltidy using its default options. perltidy won’t overwrite my file, but instead creates yucky.tdy with the reformatted code:

$ perltidy yucky

Here’s the result of perltidy’s reformatting, which uses the suggestions from the perlstyle documentation:

#!/usr/bin/perl
# yucky
use strict;
use warnings;
my %Words;
while (<>) {
        chomp;
        s{^\s+}{};
        s{\s+$}{};
        my $line = lc;
        my @words = split /\s+/, $line;
        foreach my $word (@words) {
                $word =~ s{\W}{}g;
                next unless length $word;
                $Words{$word}++;
        }
}
foreach my $word ( sort { $Words{$b} <=> $Words{$a} } keys %Words ) {
        last
          if $Words{$word} < 10;
        printf "%5d  %s\n", $Words{$word}, $word;
}

Maybe I’m partial to the GNU coding style, though, so I want that format instead. I give perltidy the -gnu switch:

$ perltidy -gnu yucky

Now the braces and indentation are a bit different, but it’s still more readable than the original:

#!/usr/bin/perl
# yucky
use strict;
use warnings;
my %Words;
while (<>)
{
        chomp;
        s{^\s+}{};
        s{\s+$}{};
        my $line = lc;
        my @words = split /\s+/, $line;
        foreach my $word (@words)
        {
                $word =~ s{\W}{}g;
                next unless length $word;
                $Words{$word}++;
        }
}
foreach my $word (sort { $Words{$b} <=> $Words{$a} } keys %Words)
{
        last
          if $Words{$word} < 10;
        printf "%5d  %s\n", $Words{$word}, $word;
}

I can get a bit fancier by asking perltidy to format the program as HTML. The -html option doesn’t reformat the program but just adds HTML markup and applies a stylesheet to it. To get the fancy output on the reformatted program, I convert the yucky.tdy to HTML:

$ perltidy yucky
$ perltidy -html yucky.tdy

perltidy can do quite a bit more too. It has options to minutely control the formatting options for personal preference, and many options to send the output from one place to another, including an in-place editing feature.

De-Obfuscation

Some people have the odd notion that they should make their Perl code harder to read. Sometimes they do this because they want to hide secrets, such as code to handle license management, or they don’t want people to distribute the code without their permission. Whatever their reason, they end up doing work that gets them nothing. The people who don’t know how to get the source back aren’t worrisome, and those who do will just be more interested in the challenge.

De-Encoding Hidden Source

Perl code is very easy to reverse engineer since no matter what a code distributor does to the source, Perl still has to be able to run it. If Perl can get to the source, so can I with a little work. If you’re spending your time trying to hide your source from the people you’re giving it to, you’re wasting your time.

A favorite tactic of Perl obfuscators is also the favorite tactic of people who like to win the Obfuscated Perl Contest. That is, the Perl community does for sport what people try to sell you, so the Perl community has a lot of tricks to undo the damage.

I’ll show you the technique working forward first. Once you know the trick, it’s just monkey coding to undo it (annoying, but still tractable). I’ll start with a file japh-plaintext.pl:

#/usr/bin/perl
# japh-plaintext.pl

print "Just another Perl hacker,\n";

I want to take that file and translate all of the characters so they become some other character. I’ll use ROT-13, which moves all of the letters over 13 places and wraps around the end. A real obfuscator will be more robust and handle special cases such as delimiters, but I don’t need to worry about that. I’m interested in defeating ones that have already done that work. I just read a file named on the command line and output an encoded version:

#!/usr/bin/perl
# japh-encoder-rot13.pl

my $source = do {
        local $/; open my($fh),
        $ARGV[0] or die "$!"; <$fh>
        };

$source =~ tr/a-zA-Z/n-za-mN-ZA-M/;

print $source;

What I get out looks like what I imagine might be some extraterrestrial language:

$ perl japh-encoder-rot13.pl japh-p*
#/hfe/ova/crey
# wncu-cynvagrkg.cy

cevag "Whfg nabgure Crey unpxre,\a";
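ROT-13 has a convenient property: applying it a second time restores the original text, which is why a single tr is enough to undo the damage. Here is a minimal sketch of that round trip. The variable names are mine and only for illustration:

```perl
use strict;
use warnings;

# The one statement from japh-plaintext.pl, kept as plain text
my $plain = qq{print "Just another Perl hacker,\\n";\n};

# Encode with ROT-13, then run the encoded text through ROT-13 again
( my $encoded = $plain )   =~ tr/a-zA-Z/n-za-mN-ZA-M/;
( my $decoded = $encoded ) =~ tr/a-zA-Z/n-za-mN-ZA-M/;

print $encoded;
print $decoded eq $plain ? "round trip ok\n" : "round trip failed\n";
```

The first print shows the same encoded line as the output above, and the second confirms that the decoded text matches the original.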

I can’t run this program because it’s no longer Perl. I need to add some code at the end that will turn it back into Perl source. That code has to undo the transformation, and then use the string form of eval to execute the decoded string as (the original) code:

#!/usr/bin/perl
# japh-encoder-decoder-rot13.pl

my $source = do {
        local $/; open my($fh),
        $ARGV[0] or die "$!"; <$fh>
        };

$source =~ tr/a-zA-Z/n-za-mN-ZA-M/;

print <<"HERE";
my \$v = q($source);
\$v =~ tr/n-za-mN-ZA-M/a-zA-Z/;
eval \$v;
HERE

Now my encoded program comes with the code to undo the damage. A real obfuscator would also compress whitespace and remove other aids to reading, but my output will do fine for this demonstration:

$ perl japh-encoder-decoder-rot13.pl japh-plaintext.pl
my $v = q(#/hfe/ova/crey
# wncu-cynvagrkg.cy

cevag "Whfg nabgure Crey unpxre,\a";
);
$v =~ tr/n-za-mN-ZA-M/a-zA-Z/;
eval $v;

That’s the basic idea. The output still has to be Perl code, and it’s only a matter of the work involved to encode the source. That might be as trivial as my example or use some sort of secret such as a license key to decrypt it. Some things might even use several transformations. Here’s an encoder that works like ROT-13 except over the entire 8-bit range (I call it ROT-255, although the tr actually rotates each byte by 128 places):

#!/usr/bin/perl
# japh-encoder-decoder-rot255.pl

my $source = do {
        local $/; open my($fh),
        $ARGV[0] or die "$!"; <$fh>
        };

$source =~ tr/\000-\377/\200-\377\000-\177/;

print <<"HERE";
my \$v = q($source);
\$v =~ tr/\200-\377\000-\177/\000-\377/;
eval \$v;
HERE

I take the already encoded output from my ROT-13 program and encode it again. The output is mostly gobbledygook, and I can’t even see some of it on the screen because some 8-bit characters aren’t printable:

$ perl japh-encoder-decoder-rot13.pl japh-p* |
       perl japh-encoder-decoder-rot255.pl -
my $v = q(íù ¤ö ½ ñ¨£¯èæå¯ïöá¯ãòåùŠ£ ÷îãõ­ãùîöáçòëç®ãùŠŠãåöáç ¢×èæç).
                q(îáâçõòå Ãòåù õîðøòå¬ÜᢻŠ©»Š¤ö ½þ ôò¯î­úá­íέÚÁ­Í¯á­úÁ­Ú¯»).
                q(Šåöáì ¤ö»Š);
$v =~ tr/€-ÿ-/-ÿ/;
eval $v;

Now that I’ve shown you the trick, I’ll work backward. From the last output there, I see the string eval. I’ll just change that to a print:

my $v = q(íù ¤ö ½ ñ¨£¯èæå¯ïöá¯ãòåùŠ£ ÷îãõ­ãùîöáçòëç®ãùŠŠãåöáç ¢×èæç).
                q(îáâçõòå Ãòåù õîðøòå¬ÜᢻŠ©»Š¤ö ½þ ôò¯î­úá­íέÚÁ­Í¯á­úÁ­Ú¯»).
                q(Šåöáì ¤ö»Š);
$v =~ tr/€-ÿ-/-ÿ/;
print $v;

I run that program and get the next layer of encoding:

my $v = q(#/hfe/ova/crey
# wncu-cynvagrkg.cy

cevag "Whfg nabgure Crey unpxre,\a";
);
$v =~ tr/n-za-mN-ZA-M/a-zA-Z/;
eval $v;

I change that eval to a print, and I’m back to the original source:

#/usr/bin/perl
# japh-plaintext.pl

print "Just another Perl hacker,\n";

I’ve now defeated the encoding tactic, but that’s not the only trick out there. I’ll show some more in a moment.
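Swapping eval for print by hand gets tedious when there are several layers, so the step can be scripted. The following is only a sketch under my own assumptions: the helper name peel_layer is hypothetical, and it expects each layer to end with a statement of the form eval $v; as the examples above do. It still runs the decoding code, so I would only try it on code I’m willing to execute:

```perl
use strict;
use warnings;

# Hypothetical helper: run one layer with its final eval turned into a
# print, and return the text that the layer would have executed.
sub peel_layer {
    my( $source ) = @_;

    # Only rewrite an "eval $var;" statement that ends the source.
    ( my $defanged = $source ) =~ s/^eval(\s+\$\w+;\s*)\z/print$1/m
        or return;

    my $output = '';
    open my $capture, '>', \$output
        or die "Could not open in-memory filehandle: $!";
    my $previous = select $capture;   # bare print now writes to $output
    eval $defanged;                   # runs the decoder, prints the payload
    my $error = $@;
    select $previous;
    close $capture;
    die $error if $error;

    return $output;
}

# One ROT-13 layer, shaped like the encoder output shown earlier
my $layer = <<'OBFUSCATED';
my $v = q(cevag "Whfg nabgure Crey unpxre,\a";
);
$v =~ tr/n-za-mN-ZA-M/a-zA-Z/;
eval $v;
OBFUSCATED

# Keep peeling until there is no trailing eval left
while( defined( my $next = peel_layer( $layer ) ) ) {
    $layer = $next;
}

print $layer;   # the recovered print statement, as source text
```

The loop stops when peel_layer finds no trailing eval, at which point $layer holds the plain source.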

Unparsing Code with B::Deparse

Not all of these techniques are about looking at other people’s code. Sometimes I can’t figure out why Perl is doing something, so I compile it, and then decompile it to see what Perl is thinking. The B::Deparse module takes some code, compiles it into Perl’s internal compiled structure, and then works backward to get back to the source. The output won’t be the same as the original source since the compiled form doesn’t preserve formatting or comments.

Here’s a bit of code that demonstrates an obscure Perl feature. I know that I can use an alternative delimiter for the substitution operator, so I try to be a bit clever and use the dot as a delimiter. Why doesn’t this do what I expect? I want to get rid of the dot in the middle of the string:

$_ = "foo.bar";
s.\...;
print "$_\n";

I don’t get rid of the dot, however. The f disappears instead of the dot. I’ve escaped the dot, so what’s the problem? Using B::Deparse, I see that Perl sees something different:

$ perl -MO=Deparse test
$_ = 'foo.bar';
s/.//;
print "$_\n";
test syntax OK

The escape first takes care of protecting the character I used as a delimiter, instead of making it a literal character in the pattern.
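A quick way to confirm that explanation is to run the same substitution with two different delimiters. This is a small sketch with variable names of my own choosing:

```perl
use strict;
use warnings;

# With the dot as the delimiter, "\." only protects the delimiter,
# so the pattern Perl sees is a bare "." that matches any character.
my $with_dot_delimiter = "foo.bar";
$with_dot_delimiter =~ s.\...;

# With a slash as the delimiter, "\." reaches the regex engine
# and matches a literal dot.
my $with_slash_delimiter = "foo.bar";
$with_slash_delimiter =~ s/\.//;

print "$with_dot_delimiter\n";     # oo.bar
print "$with_slash_delimiter\n";   # foobar
```

Choosing a delimiter that doesn’t appear in the pattern avoids the problem entirely.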

Here’s an example from Stunnix’s Perl obfuscator program.[32] It takes Perl source and makes it harder to read by changing variable names, converting strings to hex escapes, and converting numbers to arithmetic. It can also use the encoding trick I showed in the previous section, although this example doesn’t:

#!/usr/bin/perl

=head1 SYNOPSYS

A small program that does trivial things.

=cut
 sub zc47cc8b9f5 { ( my ( $z9e1f91fa38 ) = @_ ) ; print ( ( (
"\x69\x74\x27\x73\x20" . ( $z9e1f91fa38 + time ) ) .
"\x20\x73\x65\x63\x6f\x6e\x64\x73\x20\x73\x69\x6e\x63\x65\x20\x65\x70\x6f\x63\x68\x0a"
 ) ) ; } zc47cc8b9f5 ( (0x1963+ 433-0x1b12) ) ;

It’s trivial to get around most of that with B::Deparse. Its output un-encodes the strings and numbers and outputs them as their readable equivalents:

$ perl -MO=Deparse stunnix-do-it-encoded.pl
sub zc47cc8b9f5 {
        my($z9e1f91fa38) = @_;
        print q[it's ] . ($z9e1f91fa38 + time) . " seconds since epoch\n";
}
zc47cc8b9f5 2;

The Stunnix program thinks it’s clever by choosing apparently random strings for identifier names, but Joshua ben Jore’s B::Deobfuscate extends B::Deparse to take care of that, too. I can’t get back the original variable names, but I can get something easy to read and match up. Joshua chose to take identifier names from a list of flowers’ names:

$ perl -MO=Deobfuscate stunnix-do-it-encoded.pl
sub SacramentoMountainsPricklyPoppy {
    my($Low) = @_;
    print q[it's ] . ($Low + time) . " seconds since epoch\n";
}
SacramentoMountainsPricklyPoppy 2;

B::Deparse doesn’t stop there, either. Can’t remember what those Perl one-liners do? Add the -MO=Deparse to the command and see what comes out:

$ perl -MO=Deparse -naF: -le 'print $F[2]'

The deparser adds the code that I specified with the command line switches. The -n adds the while loop, the -a adds the split, and the -F changes the split pattern to the colon. The -l is one of my favorites because it automatically adds a newline to the end of print, and that’s how I get the $\ = "\n":

BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
        chomp $_;
        our(@F) = split(/:/, $_, 0);
        print $F[2];
}
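The deparser is also available from inside a program through the coderef2text method of B::Deparse, which is handy when I only care about one subroutine. Here is a minimal sketch. The anonymous subroutine is my own stand-in, written with hex escapes and needless arithmetic in the spirit of the obfuscated example, and the exact layout of the output depends on the version of Perl:

```perl
use strict;
use warnings;
use B::Deparse;

# A stand-in subroutine with an escaped string and arithmetic
# that hides a small constant
my $code = sub {
    my( $offset ) = @_;
    print "\x69\x74\x27\x73\x20" . ( $offset + ( 0x1963 + 433 - 0x1b12 ) ) . "\x0a";
};

my $deparser = B::Deparse->new;
my $text     = $deparser->coderef2text( $code );   # body of the sub as source

print "sub $text\n";
```

The deparsed body should show the string in readable form rather than as hex escapes, because Perl decoded the escapes when it compiled the code.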

Perl::Critic

In Perl Best Practices, Damian Conway laid out 256 suggestions for writing readable and maintainable code. Jeffrey Thalhammer created Perl::Critic by combining Damian’s suggestions with Adam Kennedy’s PPI, a Perl parser, to create a way for people to find style violations in their code. This isn’t just a tool for cleaning up Perl; it can keep me honest as I develop new code. I don’t have to wait until I’m done to use this. I should check myself (and my coworkers) frequently.

Once I install the Perl::Critic module,[33] I can run the perlcritic command. In this example I run it with the defaults to test my Use.Perl journal reading program. The violation I get tells me what’s wrong, gives me a reference in Perl Best Practices, and tells me the severity of the violation. Higher numbers are more severe, with 5 being the most severe:

$ perlcritic ~/bin/journals
Two-argument "open" used at line 105, column 1.  See page 207 of PBP.  (Severity: 5)
Bareword file handle opened at line 105, column 1.  See pages 202,204 of PBP.  (Severity: 5)
Integer with leading zeros at line 111, column 29.  See page 58 of PBP.  (Severity: 5)

I might feel pretty good that perlcritic only warns me about three things, but I’ll talk about that more in a minute. Of the three issues that perlcritic reports, I can fix two right away. Since I wrote this program a long time ago, I didn’t use lexical filehandles or the three-argument form of open. Line 105 of my program is old-style Perl:

open OUT, "| $Pager";

I change that line for better and more modern practice. I have to make a few other edits to support the change in the filehandle name, but it’s not a big deal. This takes care of two of my warnings:

open my($out), "|-", $Pager;

What about that third warning, however? Line 111 uses dbmopen and provides an octal number for the file permissions. This isn’t an odd thing to do; it’s the documented third argument:

dbmopen my %hash, $Counter, 0640 or die $!;

Looking at page 58 of Perl Best Practices, I see that Damian’s suggestion is to change that line to use oct instead:

dbmopen my %hash, $Counter, oct(640) or die $!;

I’m not going to make that change. It’s just silly. That’s okay, though, because Damian’s intent in Perl Best Practices is to make programmers think about the things they do and to develop a consistent, coherent, and robust programming style that’s understandable to most Perl programmers. His best practices aren’t commands so much as suggestions for good ways to do things. Some of them, such as avoiding literal numbers written in octal, don’t address a severe problem in Perl. This one is actually a bit nitpicky. That’s okay, because perlcritic lets me modify how it reports violations.

Every Perl::Critic warning is implemented as a policy, which is a Perl module that checks for that particular coding practice. Before I can disable the warning I don’t like, I need to know which policy it is. I can pass perlcritic a format to use for its report by using the --verbose option. The format looks similar to those I use with printf, and the %p placeholder stands in for the policy name. Thus, I get the name of the troublesome policy, ValuesAndExpressions::ProhibitLeadingZeros:

$ perlcritic --verbose '%p\n' ~/bin/journals
ValuesAndExpressions::ProhibitLeadingZeros

If I want to see more about that particular violation, I can give the --verbose switch a number:

$ perlcritic --verbose 9  ~/bin/journals
Integer with leading zeros at line 111, column 29.
  ValuesAndExpressions::ProhibitLeadingZeros (Severity: 5)
        Perl interprets numbers with leading zeros as octal. If that's what you
        really want, its better to use `oct' and make it obvious.

          $var = 041;     #not ok, actually 33
          $var = oct(41); #ok

Now that I know the policy name, I can disable it in a .perlcriticrc file that I put in my home directory. I enclose the policy name in square brackets and prepend a - to the name to signal that I want to exclude it from the analysis:

# perlcriticrc
[-ValuesAndExpressions::ProhibitLeadingZeros]

When I run perlcritic again, I get the all clear:

$ perlcritic --verbose '%p\n' ~/bin/journals
/Users/brian/bin/journals source OK

That taken care of, I can start to look at less severe problems. I step down a level using the --severity switch. As with other debugging work, I take care of the most severe problems before moving on to the lesser problems. At the next level, the severe problems would be swamped in a couple hundred of the same violation, telling me I haven’t used Perl’s warnings in this program:

$ perlcritic --severity 4 ~/bin/journals
Code before warnings are enabled at line 79, column 1.  See page 431 of PBP.  (Severity: 4)
Code before warnings are enabled at line 79, column 6.  See page 431 of PBP.  (Severity: 4)
... snip a couple hundred more lines ...

I can also specify the severity levels according to their names. Table 7-1 shows the perlcritic levels. Severity level 4, which is one level below the most severe level, is -stern:

$ perlcritic -stern ~/bin/journals
Code before warnings are enabled at line 79, column 1.  See page 431 of PBP.  (Severity: 4)
Code before warnings are enabled at line 79, column 6.  See page 431 of PBP.  (Severity: 4)
... snip a couple hundred more lines ...

Table 7-1. perlcritic can take a severity number or a name

Number          Name
--severity 5    -gentle
--severity 4    -stern
--severity 3    -harsh
--severity 2    -cruel
--severity 1    -brutal

I find out that the policy responsible for this is TestingAndDebugging::RequireUseWarnings, but I’m neither testing nor debugging, so I have warnings turned off.[34] My .perlcriticrc is now a bit longer:

# perlcriticrc
[-ValuesAndExpressions::ProhibitLeadingZeros]
[-TestingAndDebugging::RequireUseWarnings]

I can continue the descent in severity to get pickier and pickier warnings. The lower I go, the more obstinate I get. For instance, perlcritic starts to complain about using die instead of croak, although in my program croak does nothing I need since I use die at the top-level of code rather than in subroutines. croak can adjust the report for the caller, but in this case there is no caller:

"die" used instead of "croak" at line 114, column 8.  See page 283 of PBP.  (Severity: 3)

If I want to keep using perlcritic, I need to adjust my configuration file for this program, but with these lower severity items, I probably don’t want to disable them across all of my perlcritic analyses. I copy my .perlcriticrc to journal-critic-profile and tell perlcritic where to find my new configuration using the --profile switch:

$ perlcritic --profile journal-critic-profile ~/bin/journals

Completely turning off a policy might not always be the best thing to do. There’s a policy that complains about the string form of eval, and that policy is generally a good idea. I do need the string eval for dynamic module loading, though. I need it to use a variable with require, which only takes a string or a bareword:

eval "require $module";

Normally, Perl::Critic complains about that because it doesn’t know that this particular use is the only way to do this. Ricardo Signes created Perl::Critic::Lax for just these situations. It adds a bunch of policies that complain about a construct unless it appears in a use that is a good idea, such as my eval-require. His policy Perl::Critic::Policy::Lax::ProhibitStringyEval::ExceptForRequire takes care of this one. String evals are still bad, just not in this case. As I’m finishing this book, he’s just released this module, and I’m sure it’s going to get much more useful. By the time you get this book there will be even more Perl::Critic policies, so keep checking CPAN.
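Perl::Critic also recognizes "## no critic" comments in the source, which excuse a stretch of code from named policies without disabling them everywhere. Here is a minimal sketch that applies the annotation to the eval-require idiom. The module name File::Spec is only a placeholder I picked because it ships with Perl:

```perl
use strict;
use warnings;

# Placeholder for a module name that is only known at runtime
my $module = 'File::Spec';

## no critic (BuiltinFunctions::ProhibitStringyEval)
eval "require $module" or die "Could not load $module: $@";
## use critic

print "Loaded $module\n";
```

The "## use critic" comment turns the policy back on, so any other string eval later in the file still gets reported.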

Creating My Own Perl::Critic Policy

That’s just the beginning of Perl::Critic. I’ve already seen how to change its behavior by disabling some policies, but I can also add policies of my own. Every policy is simply a Perl module. The policy modules live under the Perl::Critic::Policy::* namespace and inherit from the Perl::Critic::Policy module.[35] Here’s a small policy that complains about a return followed by a literal number:

package Perl::Critic::Policy::Subroutines::ProhibitMagicReturnValues;

use strict;
use warnings;
use Perl::Critic::Utils;
use base 'Perl::Critic::Policy';

our $VERSION = 0.01;

my $desc = q{returning magic values};


sub default_severity  { return $SEVERITY_HIGHEST  }
sub default_themes    { return qw(pbp danger)     }
sub applies_to        { return 'PPI::Token::Word' }


sub violates
        {
        my( $self, $elem ) = @_;
        return unless $elem eq 'return';
        return if is_hash_key( $elem );

        my $sib = $elem->snext_sibling();

        return unless $sib;
        return unless $sib->isa('PPI::Token::Number');
        return unless $sib =~ m/^\d+\z/;

        return $self->violation( $desc, [ 'n/a' ], $elem );
        }

1;
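To make the target of that policy concrete, here is a sketch of the kind of code it is written to catch, next to a version it should leave alone. The subroutine names are hypothetical, and I’m inferring the behavior from the checks in violates rather than from a run of perlcritic:

```perl
use strict;
use warnings;

# Signals failure with a literal number. In "return 0;" the word
# "return" is followed directly by a number made only of digits,
# which is what the checks in violates() look for.
sub position_magic {
    my( $wanted, @items ) = @_;
    for my $i ( 0 .. $#items ) {
        return $i + 1 if $items[$i] eq $wanted;
    }
    return 0;
}

# Signals failure with a bare return, so no number follows "return".
sub position_plain {
    my( $wanted, @items ) = @_;
    for my $i ( 0 .. $#items ) {
        return $i + 1 if $items[$i] eq $wanted;
    }
    return;
}

my $magic = position_magic( 'c', qw(a b) );
my $plain = position_plain( 'c', qw(a b) );

print "magic: $magic\n";                                       # magic: 0
print "plain: ", ( defined $plain ? $plain : 'undef' ), "\n";  # plain: undef
```

The "return $i + 1" lines pass either way because the token after return is a variable, not a number.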

There’s much more that I can do with Perl::Critic. With the Test::Perl::Critic module, I can add its analysis to my automated testing. Every time I run make test I find out if I’ve violated the local style. The criticism pragma adds a warnings-like feature to my programs so I get Perl::Critic warnings (if there are any) when I run the program.

Although I might disagree with certain policies, that does not diminish the usefulness of Perl::Critic. It’s configurable and extendable so I can make it fit the local situation. Check the references at the end of this chapter for more information.

Summary

Code might come to me in all sorts of formats, encodings, and other tricks that make it hard to read, but I have many tools to clean it up and figure out what it’s doing. With a little work I can be reading nicely formatted code instead of suffering from the revenge of the programmers who came before me.

Further Reading

See the perltidy site for more details and examples: http://perltidy.sourceforge.net/. You can install perltidy by installing the Perl::Tidy module. It also has plug-ins for Vim and Emacs, as well as other editors.

The perlstyle documentation is a collection of Larry Wall’s style points. You don’t have to follow his style, but most Perl programmers seem to. Damian Conway gives his own style advice in Perl Best Practices.

Josh McAdams wrote “Perl Critic” for The Perl Review 2.3 (Summer 2006): http://www.theperlreview.com.

Perl::Critic has its own web site where you can upload code for it to analyze: http://perlcritic.com/. It also has a project page hosted at Tigris: http://perlcritic.tigris.org/.



[31] Actually, I wrote it normally then removed all of the good formatting.

[33] If you don’t want to install it, try http://www.perlcritic.com. It lets you upload a file for remote analysis.

[34] In general, I recommend turning off warnings once a program is in production. Turn on warnings when you need to test or debug the program, but after that, you don’t need them. The warnings will just fill up logfiles.

[35] The Perl::Critic::DEVELOPER documentation goes into this in detail.
