Part of mastering Perl is controlling the source code, no matter who gives it to you. People can usually read the code that they wrote, and usually complain about the code that other people wrote. In this chapter I’ll take that code and make it readable. This includes the output of so-called Perl obfuscators, which do much of their work by simply removing whitespace. You’re the programmer and it’s the source, so you need to show it who’s boss.
I’m not going to give any advice about code style, where to put the braces, or how many spaces to put where. These things are the sparks for heated debates that really do nothing to help you get work done. The Perl interpreter doesn’t really care, nor does the computer. But, after all, we write code for people first and computers second.
Good code, in my mind, is something that a skilled practitioner can easily read. It’s important to note that good code is not something that just anyone could read. Code isn’t bad just because a novice Perl programmer can’t read it. The first assumption has to be that the audience for any code is people who know the language or, if they don’t, know how to look up the parts they need to learn. Along with that, a good programmer should be able to easily deal with source written in the handful of major coding styles.
After that, consistency is a major part of good code. Not only should I try to do the same thing in the same way each time (and that might mean everyone on the team doing it in the same way), but I should format it in the same way each time. Of course, there are edge cases and special situations, but for the most part, doing things the same way each time helps the new reader recognize what I’m trying to do.
Lastly, I like a lot of whitespace in my code, and I did even before my eyesight started to get bad. Spaces separate tokens and blank lines separate groups of lines that go together, just as if I were writing prose. This book would certainly be hard to read without paragraph breaks; code has the same problem.
I have my own particular style that I like, but I’m not opposed to using another style. If I edit code or create a patch for somebody else’s code, I try to mimic his style. Remember, consistency is the major factor in good style. Adding my own style to existing code makes it inconsistent.
If you haven’t developed your own style or haven’t had one forced on you, the perlstyle documentation as well as Perl Best Practices by Damian Conway (O’Reilly) can help you set standards for you and your coding team.
The perltidy program reformats Perl programs to make them easier to read. Given a mess of code with odd indentation styles (or no indentation at all), little or no whitespace between tokens, and all other manner of obfuscation, perltidy creates something readable.
Here’s a short piece of code that I’ve intentionally written with bad style.[31] I haven’t done anything to obfuscate the program other than remove all the whitespace I could without breaking things:
    #!/usr/bin/perl
    # yucky
    use strict;use warnings;my %Words;while(<>){chomp;s{^\s+}{};s{\s+$}{};
    my $line=lc;my @words=split/\s+/,$line;foreach my $word(@words){
    $word=~s{\W}{}g;next unless length $word;$Words{$word}++;}}foreach
    my $word(sort{$Words{$b}<=>$Words{$a}}keys %Words){last
    if $Words{$word}<10;printf"%5d %s\n",$Words{$word},$word;}
If somebody else handed me this program, could I tell what the program does? I might know what it does, but not how it does it. Certainly I could read it slowly and carefully keep track of things in my head, or I could start to add newlines between statements. That’s work, though, and too much work even for this little program.
I save this program in a file I name yucky and run it through perltidy using its default options. perltidy won’t overwrite my file, but instead creates yucky.tdy with the reformatted code:
$ perltidy yucky
Here’s the result of perltidy’s reformatting, which uses the suggestions from the perlstyle documentation:
    #!/usr/bin/perl
    # yucky
    use strict;
    use warnings;
    my %Words;
    while (<>) {
        chomp;
        s{^\s+}{};
        s{\s+$}{};
        my $line = lc;
        my @words = split /\s+/, $line;
        foreach my $word (@words) {
            $word =~ s{\W}{}g;
            next unless length $word;
            $Words{$word}++;
        }
    }
    foreach my $word ( sort { $Words{$b} <=> $Words{$a} } keys %Words ) {
        last if $Words{$word} < 10;
        printf "%5d %s\n", $Words{$word}, $word;
    }
Maybe I’m partial to the GNU coding style, though, so I want that format instead. I give perltidy the -gnu switch:
$ perltidy -gnu yucky
Now the braces and indentation are a bit different, but it’s still more readable than the original:
    #!/usr/bin/perl
    # yucky
    use strict;
    use warnings;
    my %Words;
    while (<>)
      {
        chomp;
        s{^\s+}{};
        s{\s+$}{};
        my $line = lc;
        my @words = split /\s+/, $line;
        foreach my $word (@words)
          {
            $word =~ s{\W}{}g;
            next unless length $word;
            $Words{$word}++;
          }
      }
    foreach my $word (sort { $Words{$b} <=> $Words{$a} } keys %Words)
      {
        last if $Words{$word} < 10;
        printf "%5d %s\n", $Words{$word}, $word;
      }
I can get a bit fancier by asking perltidy to format the program as HTML. The -html option doesn’t reformat the program but just adds HTML markup and applies a stylesheet to it. To get the fancy output on the reformatted program, I first tidy the program and then convert yucky.tdy to HTML:
    $ perltidy yucky
    $ perltidy -html yucky.tdy
perltidy can do quite a bit more, too. It has options to control the formatting in minute detail to suit personal preference, and many options to control where its output goes, including an in-place editing feature.
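The perltidy command is a front end to the Perl::Tidy module, so the same reformatting is available from inside a program. The following is a minimal sketch, not a definitive recipe: the subroutine name tidy_source and the sample string are made up for illustration, and the code quietly gives up if Perl::Tidy isn’t installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# tidy_source is a hypothetical helper name, not part of Perl::Tidy.
# It returns the reformatted source, or undef when the module is
# missing or perltidy reports an error.
sub tidy_source {
    my( $source, @options ) = @_;

    return unless eval { require Perl::Tidy; 1 };

    my $tidied;
    my $error = Perl::Tidy::perltidy(
        source      => \$source,
        destination => \$tidied,
        argv        => [ '-npro', @options ],  # -npro: ignore any .perltidyrc
    );

    return if $error;
    return $tidied;
}

my $yucky = 'my %Words;while(<>){chomp;$Words{$_}++;}';
my $clean = tidy_source( $yucky, '-gnu' );
print defined $clean ? $clean : "Perl::Tidy is not available here\n";
```

Passing the options as a list keeps the call close to what I would type on the command line, so switching from -gnu to another style is a one-word change.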
Some people have the odd notion that they should make their Perl code harder to read. Sometimes they do this because they want to hide secrets, such as code to handle license management, or they don’t want people to distribute the code without their permission. Whatever their reason, they end up doing work that gets them nothing. The people who don’t know how to get the source back aren’t worrisome, and those who do will just be more interested in the challenge.
Perl code is very easy to reverse engineer since no matter what a code distributor does to the source, Perl still has to be able to run it. If Perl can get to the source, so can I with a little work. If you’re spending your time trying to hide your source from the people you’re giving it to, you’re wasting your time.
A favorite tactic of Perl obfuscators is also the favorite tactic of people who like to win the Obfuscated Perl Contest. That is, the Perl community does for sport what people try to sell you, so the Perl community has a lot of tricks to undo the damage.
I’ll show you the technique working forward first. Once you know the trick, it’s just monkey coding to undo it (annoying, but still tractable). I’ll start with a file japh-plaintext.pl:
    #/usr/bin/perl
    # japh-plaintext.pl
    print "Just another Perl hacker,\n";
I want to take that file and turn each of its characters into some other character. I’ll use ROT-13, which moves each letter 13 places along the alphabet and wraps around at the end. A real obfuscator will be more robust and handle special cases such as delimiters, but I don’t need to worry about that. I’m interested in defeating ones that have already done that work. I just read a file named on the command line and output an encoded version:
    #!/usr/bin/perl
    # japh-encoder-rot13.pl

    my $source = do {
        local $/; open my($fh), $ARGV[0] or die "$!"; <$fh>
        };

    $source =~ tr/a-zA-Z/n-za-mN-ZA-M/;

    print $source;
What I get out looks like what I imagine might be some extraterrestrial language:
    $ perl japh-encoder-rot13.pl japh-p*
    #/hfe/ova/crey
    # wncu-cynvagrkg.cy
    cevag "Whfg nabgure Crey unpxre,\a";
I can’t run this program because it’s no longer Perl. I need to add some code at the end that will turn it back into Perl source. That code has to undo the transformation, and then use the string form of eval to execute the decoded string as (the original) code:
    #!/usr/bin/perl
    # japh-encoder-decoder-rot13.pl

    my $source = do {
        local $/; open my($fh), $ARGV[0] or die "$!"; <$fh>
        };

    $source =~ tr/a-zA-Z/n-za-mN-ZA-M/;

    print <<"HERE";
    my \$v = q($source);
    \$v =~ tr/n-za-mN-ZA-M/a-zA-Z/;
    eval \$v;
    HERE
Now my encoded program comes with the code to undo the damage. A real obfuscator would also compress whitespace and remove other aids to reading, but my output will do fine for this demonstration:
    $ perl japh-encoder-decoder-rot13.pl japh-plaintext.pl
    my $v = q(#/hfe/ova/crey
    # wncu-cynvagrkg.cy
    cevag "Whfg nabgure Crey unpxre,\a";
    );
    $v =~ tr/n-za-mN-ZA-M/a-zA-Z/;
    eval $v;
That’s the basic idea. The output still has to be Perl code, and it’s only a matter of the work involved to encode the source. That might be as trivial as my example or use some sort of secret such as a license key to decrypt it. Some things might even use several transformations. Here’s an encoder that works like ROT-13 except over the entire 8-bit range. I call it ROT-255 after the highest byte value, although the tr/// in it shifts each byte by 128, half of the range:
    #!/usr/bin/perl
    # japh-encoder-decoder-rot255.pl

    my $source = do {
        local $/; open my($fh), $ARGV[0] or die "$!"; <$fh>
        };

    $source =~ tr/\000-\377/\200-\377\000-\177/;

    print <<"HERE";
    my \$v = q($source);
    \$v =~ tr/\200-\377\000-\177/\000-\377/;
    eval \$v;
    HERE
I take the already encoded output from my ROT-13 program and encode it again. The output is mostly gobbledygook, and I can’t even see some of it on the screen because some 8-bit characters aren’t printable:
    $ perl japh-encoder-decoder-rot13.pl japh-p* | perl japh-encoder-decoder-rot255.pl -
    my $v = q(íù ¤ö ½ ñ¨£¯èæå¯ïöá¯ãòåù£ ÷îãõãùîöáçòëç®ãùãåöáç ¢×èæç).
    q(îáâçõòå Ãòåù õîðøòå¬Üᢻ©»¤ö ½þ ôò¯îúáíÎÚÁͯáúÁÚ¯»).
    q(åöáì ¤ö»);
    $v =~ tr/-ÿ-/-ÿ/;
    eval $v;
Now that I’ve shown you the trick, I’ll work backward. From the last output there, I see the string eval. I’ll just change that to a print:
    my $v = q(íù ¤ö ½ ñ¨£¯èæå¯ïöá¯ãòåù£ ÷îãõãùîöáçòëç®ãùãåöáç ¢×èæç).
    q(îáâçõòå Ãòåù õîðøòå¬Üᢻ©»¤ö ½þ ôò¯îúáíÎÚÁͯáúÁÚ¯»).
    q(åöáì ¤ö»);
    $v =~ tr/-ÿ-/-ÿ/;
    print $v;
I run that program and get the next layer of encoding:
    my $v = q(#/hfe/ova/crey
    # wncu-cynvagrkg.cy
    cevag "Whfg nabgure Crey unpxre,\a";
    );
    $v =~ tr/n-za-mN-ZA-M/a-zA-Z/;
    eval $v;
I change that eval to a print, and I’m back to the original source:
    #/usr/bin/perl
    # japh-plaintext.pl
    print "Just another Perl hacker,\n";
I’ve now defeated the encoding tactic, but that’s not the only trick out there. I’ll show some more in a moment.
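Changing each eval to a print by hand gets tedious when there are many layers, so the same steps can be scripted. This is only a sketch under some assumptions: the decoder stub looks exactly like the one my encoder prints (a trailing eval $v;), and the subroutine names wrap_rot13 and peel_layer are made up for this example. It still runs the decoder stub itself, so I would read that stub before trusting it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Wrap source in the same kind of ROT-13 decoder stub shown earlier.
sub wrap_rot13 {
    my( $source ) = @_;
    ( my $encoded = $source ) =~ tr/a-zA-Z/n-za-mN-ZA-M/;
    return "my \$v = q($encoded);\n"
         . "\$v =~ tr/n-za-mN-ZA-M/a-zA-Z/;\n"
         . "eval \$v;\n";
}

# Remove one layer: swap the trailing "eval $v;" for a bare "$v;" so the
# string eval returns the decoded text instead of running it.
# Returns undef when there is no trailing eval left to replace.
sub peel_layer {
    my( $wrapped ) = @_;
    ( my $peek = $wrapped ) =~ s/\beval\s+\$v;\s*\z/\$v;/ or return;
    return eval $peek;    # this still runs the decoder stub itself
}

my $plain = qq{print "Just another Perl hacker,\\n";\n};
my $code  = wrap_rot13( wrap_rot13( $plain ) );    # two layers of encoding

while( defined( my $inner = peel_layer( $code ) ) ) {
    $code = $inner;
}

print $code;
```

The loop stops as soon as the text no longer ends in the decoder’s eval, which is the point where I would otherwise have stopped editing by hand.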
Not all of these techniques are about looking at other people’s code. Sometimes I can’t figure out why Perl is doing something, so I compile it, and then decompile it to see what Perl is thinking. The B::Deparse module takes some code, compiles it into Perl’s internal compiled structure, and then works backward to get back to the source. The output won’t be the same as the original source since the compiled form doesn’t preserve details such as comments and formatting.
Here’s a bit of code that demonstrates an obscure Perl feature. I know that I can use an alternative delimiter for the substitution operator, so I try to be a bit clever and use the dot as a delimiter. Why doesn’t this do what I expect? I want to get rid of the dot in the middle of the string:
    $_ = "foo.bar";
    s.\...;
    print "$_\n";
I don’t get rid of the dot, however. The f disappears instead of the dot. I’ve escaped the dot, so what’s the problem? Using B::Deparse, I see that Perl sees something different:
    $ perl -MO=Deparse test
    $_ = 'foo.bar';
    s/.//;
    print "$_\n";
    test syntax OK
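The backslash gets used up protecting the dot from being taken as the delimiter, so it never makes the dot a literal character in the pattern. What Perl compiles is a pattern of a single unescaped dot, which matches any character, and the first character in the string is the f.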
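B::Deparse also works from inside a program through its coderef2text method, which is handy when the puzzling code is already in a subroutine. Here is a small sketch of the same experiment; the variable name $strip is only for illustration, and the exact layout of the deparsed text can differ between Perl versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use B::Deparse;

# The same substitution as above, wrapped in a subroutine so I can both
# run it and deparse it from inside one program.
my $strip = sub {
    local $_ = shift;
    s.\...;      # dot as the delimiter, with an "escaped" dot as the pattern
    return $_;
};

my $result = $strip->( 'foo.bar' );
print "Result:   $result\n";

# coderef2text returns the body of the sub as Perl sees it after compiling.
my $text = B::Deparse->new->coderef2text( $strip );
print "Deparsed: $text\n";
```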
Here’s an example from Stunnix’s Perl obfuscator program.[32] It takes Perl source and makes it harder to read by changing variable names, converting strings to hex escapes, and converting numbers to arithmetic. It can also use the encoding trick I showed in the previous section, although this example doesn’t:
    #!/usr/bin/perl

    =head1 SYNOPSYS

    A small program that does trivial things.

    =cut

    sub zc47cc8b9f5 { ( my ( $z9e1f91fa38 ) = @_ ) ; print ( ( (
    "\x69\x74\x27\x73\x20" . ( $z9e1f91fa38 + time ) ) .
    "\x20\x73\x65\x63\x6f\x6e\x64\x73\x20\x73\x69\x6e\x63\x65\x20\x65\x70\x6f\x63\x68\x0a"
    ) ) ; } zc47cc8b9f5 ( (0x1963+ 433-0x1b12) ) ;
It’s trivial to get around most of that with B::Deparse. Its output un-encodes the strings and numbers and outputs them as their readable equivalents:
    $ perl -MO=Deparse stunnix-do-it-encoded.pl
    sub zc47cc8b9f5 {
        my($z9e1f91fa38) = @_;
        print q[it's ] . ($z9e1f91fa38 + time) . " seconds since epoch\n";
    }
    zc47cc8b9f5 2;
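The reason this part of the obfuscation is so weak is that hex escapes and constant arithmetic are resolved by Perl itself. A short check, using the literals from the listing above (the variable names are my own), shows that nothing special is needed to read them:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The hex escapes are ordinary double-quoted string escapes, so Perl has
# already turned them into characters by the time the string exists.
my $prefix = "\x69\x74\x27\x73\x20";
my $suffix = "\x20\x73\x65\x63\x6f\x6e\x64\x73\x20\x73\x69\x6e\x63\x65\x20\x65\x70\x6f\x63\x68\x0a";

# The arithmetic uses only constants, so it always comes out the same.
my $number = 0x1963 + 433 - 0x1b12;

# Prints: it's 2 seconds since epoch
print $prefix, $number, $suffix;
```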
The Stunnix program thinks it’s clever by choosing apparently random strings for identifier names, but Joshua ben Jore’s B::Deobfuscate extends B::Deparse to take care of that, too. I can’t get back the original variable names, but I can get something easy to read and match up. Joshua chose to take identifier names from a list of flowers’ names:
    $ perl -MO=Deobfuscate stunnix-do-it-encoded.pl
    sub SacramentoMountainsPricklyPoppy {
        my($Low) = @_;
        print q[it's ] . ($Low + time) . " seconds since epoch\n";
    }
    SacramentoMountainsPricklyPoppy 2;
B::Deparse doesn’t stop there, either. Can’t remember what those Perl one-liners do? Add the -MO=Deparse to the command and see what comes out:
$ perl -MO=Deparse -naF: -le 'print $F[2]'
The deparser adds the code that I specified with the command-line switches. The -n adds the while loop, the -a adds the split, and the -F changes the split pattern to the colon. The -l is one of my favorites because it automatically adds a newline to the end of print, and that’s how I get the $\ = "\n":
    BEGIN { $/ = "\n"; $\ = "\n"; }
    LINE: while (defined($_ = <ARGV>)) {
        chomp $_;
        our(@F) = split(/:/, $_, 0);
        print $F[2];
    }
In Perl Best Practices, Damian Conway laid out 256 suggestions for writing readable and maintainable code. Jeffrey Thalhammer created Perl::Critic by combining Damian’s suggestions with Adam Kennedy’s PPI, a Perl parser, to create a way for people to find style violations in their code. This isn’t just a tool for cleaning up Perl; it can keep me honest as I develop new code. I don’t have to wait until I’m done to use this. I should check myself (and my coworkers) frequently.
Once I install the Perl::Critic module,[33] I can run the perlcritic command. In this example I run it with the defaults to test my Use.Perl journal reading program. Each violation I get tells me what’s wrong, gives me a reference in Perl Best Practices, and tells me the severity of the violation. Higher numbers are more severe, with 5 being the most severe and 1 the least:
    $ perlcritic ~/bin/journals
    Two-argument "open" used at line 105, column 1. See page 207 of PBP. (Severity: 5)
    Bareword file handle opened at line 105, column 1. See pages 202,204 of PBP. (Severity: 5)
    Integer with leading zeros at line 111, column 29. See page 58 of PBP. (Severity: 5)
I might feel pretty good that perlcritic only warns me about three things, but I’ll talk about that more in a minute. Of the three issues that perlcritic reports, I can fix two right away. Since I wrote this program a long time ago, I didn’t use lexical filehandles or the three-argument form of open. Line 105 of my program is old-style Perl:
open OUT, "| $Pager";
I change that line for better and more modern practice. I have to make a few other edits to support the change in the filehandle name, but it’s not a big deal. This takes care of two of my warnings:
open my($out), "|-", $Pager;
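A slightly fuller sketch of the same idea adds the error checking that a piped open deserves. The subroutine name open_pipe_to is hypothetical, and I use the perl that is running the script ($^X) as a stand-in for the pager so the example doesn’t depend on any particular program being installed. The list form of a piped open shown here assumes a Unix-like system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Open a pipe to a command with a lexical filehandle and the
# three-argument (here, list) form of open, and check for failure.
sub open_pipe_to {
    my( @command ) = @_;
    open my( $out ), '|-', @command
        or die "Could not start [@command]: $!";
    return $out;
}

# $^X is the perl running this script; it stands in for a pager here.
my $out = open_pipe_to( $^X, '-ne', 'print uc' );
print {$out} "just another perl hacker,\n";
close $out or die "Pipe did not close cleanly: $! (status $?)";
```

Checking the return value of close matters for pipes because that is where a failing child process reports its exit status.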
What about that third warning, however? Line 111 uses dbmopen and provides an octal number for the file permissions. This isn’t an odd thing to do; it’s the documented third argument:
dbmopen my %hash, $Counter, 0640 or die $!;
Looking at page 58 of Perl Best Practices, I see that Damian’s suggestion is to change that line to use oct instead:
dbmopen my %hash, $Counter, oct(640) or die $!;
I’m not going to make that change. It’s just silly. That’s okay, though, because Damian’s intent in Perl Best Practices is to make programmers think about the things they do and to develop a consistent, coherent, and robust programming style that’s understandable to most Perl programmers. His best practices aren’t commands so much as suggestions for good ways to do things. Some of them, such as avoiding literal numbers in octal, don’t address a severe problem in Perl. This one is actually a bit nitpicky. That’s okay, because perlcritic lets me modify how it reports violations.
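For reference, the two spellings of the permissions produce the same number, and the mistake the policy guards against is dropping the leading zero without adding oct. A quick check, with variable names of my own choosing:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $literal  = 0640;        # leading zero: an octal literal
my $with_oct = oct( 640 );  # oct() reads the digits "640" as octal
my $no_zero  = 640;         # plain decimal, and a different number

printf "0640     is %d decimal, %o octal\n", $literal,  $literal;
printf "oct(640) is %d decimal, %o octal\n", $with_oct, $with_oct;
printf "640      is %d decimal, %o octal\n", $no_zero,  $no_zero;
```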
Every Perl::Critic warning is implemented as a policy, which is a Perl module that checks for that particular coding practice. Before I can disable the warning I don’t like, I need to know which policy it is. I can pass perlcritic a format to use for its report by using the --verbose option. The format looks similar to those I use with printf, and the %p placeholder stands in for the policy name. Thus, I get the name of the troublesome policy, ValuesAndExpressions::ProhibitLeadingZeros:
    $ perlcritic --verbose '%p\n' ~/bin/journals
    ValuesAndExpressions::ProhibitLeadingZeros
If I want to see more about that particular violation, I can give the --verbose switch a number:
    $ perlcritic --verbose 9 ~/bin/journals
    Integer with leading zeros at line 111, column 29.
      ValuesAndExpressions::ProhibitLeadingZeros (Severity: 5)
        Perl interprets numbers with leading zeros as octal. If that's what
        you really want, its better to use `oct' and make it obvious.

        $var = 041;     #not ok, actually 33
        $var = oct(41); #ok
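The same information is available from the Perl::Critic module directly, which can be handy inside my own tools. This is a sketch rather than the way perlcritic itself is implemented: the subroutine name critique_source is made up, the empty -profile asks for the default policy configuration instead of my .perlcriticrc, and the code returns nothing at all if Perl::Critic isn’t installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Critique a string of source at a given severity and return a list of
# [ policy name, line number, description ] entries. Returns an empty
# list if Perl::Critic is not installed.
sub critique_source {
    my( $source, $severity ) = @_;

    return unless eval { require Perl::Critic; 1 };

    my $critic = Perl::Critic->new(
        -severity => $severity,
        -profile  => '',        # empty string: default configuration
    );

    return map { [ $_->policy, $_->line_number, $_->description ] }
        $critic->critique( \$source );
}

my $old_style = qq{open OUT, "| \$Pager";\n};

foreach my $violation ( critique_source( $old_style, 5 ) ) {
    printf "%s at line %d: %s\n", @$violation;
}
```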
Now that I know the policy name, I can disable it in a .perlcriticrc file that I put in my home directory. I enclose the policy name in square brackets and prepend a - to the name to signal that I want to exclude it from the analysis:
    # perlcriticrc
    [-ValuesAndExpressions::ProhibitLeadingZeros]
When I run perlcritic again, I get the all clear:
    $ perlcritic --verbose '%p\n' ~/bin/journals
    /Users/brian/bin/journals source OK
That taken care of, I can start to look at less severe problems. I step down a level using the --severity switch. As with other debugging work, I take care of the most severe problems before moving on to the lesser problems. At the next level, the severe problems would be swamped in a couple hundred of the same violation, telling me I haven’t used Perl’s warnings in this program:
    $ perlcritic --severity 4 ~/bin/journals
    Code before warnings are enabled at line 79, column 1. See page 431 of PBP. (Severity: 4)
    Code before warnings are enabled at line 79, column 6. See page 431 of PBP. (Severity: 4)
    ... snip a couple hundred more lines ...
I can also specify the severity levels according to their names. Table 7-1 shows the perlcritic levels. Severity level 4, which is one level below the most severe level, is -stern:
    $ perlcritic -stern ~/bin/journals
    Code before warnings are enabled at line 79, column 1. See page 431 of PBP. (Severity: 4)
    Code before warnings are enabled at line 79, column 6. See page 431 of PBP. (Severity: 4)
    ... snip a couple hundred more lines ...
Table 7-1. perlcritic can take a severity number or a name
Number | Name |
---|---|
--severity 5 | -gentle |
--severity 4 | -stern |
--severity 3 | -harsh |
--severity 2 | -cruel |
--severity 1 | -brutal |
I find out that the policy responsible for this is TestingAndDebugging::RequireUseWarnings, but I’m neither testing nor debugging, so I have warnings turned off.[34] My .perlcriticrc is now a bit longer:
    # perlcriticrc
    [-ValuesAndExpressions::ProhibitLeadingZeros]
    [-TestingAndDebugging::RequireUseWarnings]
I can continue the descent in severity to get pickier and pickier warnings. The lower I go, the more obstinate I get. For instance, perlcritic starts to complain about using die instead of croak, although in my program croak does nothing I need since I use die at the top level of code rather than in subroutines. croak can adjust the report for the caller, but in this case there is no caller:
"die" used instead of "croak" at line 114, column 8. See page 283 of PBP. (Severity: 3)
If I want to keep using perlcritic, I need to adjust my configuration file for this program, but with these lower severity items, I probably don’t want to disable them across all of my perlcritic analyses. I copy my .perlcriticrc to journal-critic-profile and tell perlcritic where to find my new configuration using the --profile switch:
$ perlcritic --profile journal-critic-profile ~/bin/journals
Completely turning off a policy might not always be the best thing to do. There’s a policy that complains about the string form of eval, and that’s generally a good thing to complain about. I do need the string eval for dynamic module loading, though. I need it to use a variable with require, which only takes a string or a bareword:
eval "require $module";
Normally, Perl::Critic complains about that because it doesn’t know that this particular use is the only way to do this. Ricardo Signes created Perl::Critic::Lax for just these situations. It adds a bunch of policies that complain about a construct unless it’s a use, such as my eval-require, that is a good idea. His policy Perl::Critic::Policy::Lax::ProhibitStringyEval::ExceptForRequire takes care of this one. String evals are still bad, but just not in this case. As I’m finishing this book, he’s just released this module, and I’m sure it’s going to get much more useful. By the time you get this book there will be even more Perl::Critic policies, so keep checking CPAN.
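If I am going to keep a string eval around for loading modules, I can at least limit what it is able to run. The following is one possible sketch, not a standard recipe: load_module is a name I made up, and the pattern it uses to recognize a module name is a deliberately strict assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load a module whose name is only known at runtime. The name check keeps
# the string eval from running anything other than a require.
sub load_module {
    my( $module ) = @_;

    die "Not a module name: [$module]\n"
        unless $module =~ m/\A[A-Za-z_]\w*(?:::\w+)*\z/;

    eval "require $module; 1"
        or die "Could not load $module: $@";

    return $module;
}

load_module( 'List::Util' );
print List::Util::sum( 1 .. 4 ), "\n";
```

Ending the eval string with a true value means a failed require shows up as a false return, so I don’t have to rely on inspecting $@ alone.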
That’s just the beginning of Perl::Critic. I’ve already seen how I can change how it works by disabling some policies, but I can also add policies of my own. Every policy is simply a Perl module. The policy modules live under the Perl::Critic::Policy::* namespace and inherit from the Perl::Critic::Policy module.[35]
    package Perl::Critic::Policy::Subroutines::ProhibitMagicReturnValues;

    use strict;
    use warnings;
    use Perl::Critic::Utils;
    use base 'Perl::Critic::Policy';

    our $VERSION = 0.01;

    my $desc = q{returning magic values};

    sub default_severity  { return $SEVERITY_HIGHEST  }
    sub default_themes    { return qw(pbp danger)     }
    sub applies_to        { return 'PPI::Token::Word' }

    sub violates {
        my( $self, $elem ) = @_;
        return unless $elem eq 'return';
        return if is_hash_key( $elem );

        my $sib = $elem->snext_sibling();

        return unless $sib;
        return unless $sib->isa('PPI::Token::Number');
        return unless $sib =~ m/^\d+\z/;

        return $self->violation( $desc, [ 'n/a' ], $elem );
    }

    1;
There’s much more that I can do with Perl::Critic. With the Test::Perl::Critic module, I can add its analysis to my automated testing. Every time I run make test I find out if I’ve violated the local style. The criticism pragma adds a warnings-like feature to my programs so I get Perl::Critic warnings (if there are any) when I run the program.
Although I might disagree with certain policies, that does not diminish the usefulness of Perl::Critic. It’s configurable and extendable so I can make it fit the local situation. Check the references at the end of this chapter for more information.
Code might come to me in all sorts of formats, encodings, and other tricks that make it hard to read, but I have many tools to clean it up and figure out what it’s doing. With a little work I can be reading nicely formatted code instead of suffering from the revenge of the programmers who came before me.
See the perltidy site for more details and examples: http://perltidy.sourceforge.net/. You can install perltidy by installing the Perl::Tidy module. It also has plug-ins for Vim and Emacs, as well as other editors.
The perlstyle documentation is a collection of Larry Wall’s style points. You don’t have to follow his style, but most Perl programmers seem to. Damian Conway gives his own style advice in Perl Best Practices.
Josh McAdams wrote “Perl Critic” for The Perl Review 2.3 (Summer 2006): http://www.theperlreview.com.
Perl::Critic has its own web site where you can upload code for it to analyze: http://perlcritic.com/. It also has a project page hosted at Tigris: http://perlcritic.tigris.org/.
[31] Actually, I wrote it normally then removed all of the good formatting.
[32] Stunnix Perl-obfus (http://www.stunnix.com/prod/po/overview.shtml).
[33] If you don’t want to install it, try http://www.perlcritic.com. It lets you upload a file for remote analysis.
[34] In general, I recommend turning off warnings once a program is in production. Turn on warnings when you need to test or debug the program, but after that, you don’t need them. The warnings will just fill up logfiles.
[35] The Perl::Critic::DEVELOPER documentation goes into this in detail.