Perl Best Practices

By Damian Conway


Table of Contents

Chapter 1: Best Practices
We do not all have to write like Faulkner, or program
like Dijkstra. I will gladly tell people what my
programming style is, and I will even tell them where I
think their own style is unclear or makes me jump
through mental hoops.
But I do this as a fellow programmer, not as the Perl
god ... stylistic limits should be self-imposed, or at most
policed by consensus among your buddies.
—Larry Wall
Natural Language Principles in Perl
Code matters. Analysis, design, decomposition, algorithms, data structures, and control flow mean nothing until they are made real, given form and power in the statements of some programming language. It is code that allows abstractions and ideas to control the physical world, that enables mathematical procedures to govern real-world processes, that converts data into information and information into knowledge.
Code matters. So the way in which you code matters too. Every programmer has a unique approach to writing software; a unique coding style. Programmers' styles are based on their earliest experiences in programming—the linguistic idiosyncrasies of their first languages, the way in which code was presented in their initial textbooks, and the stylistic prejudices of their early instructors. That style will develop and change as the programmer's experience and skills increase. Indeed, most programmers' style is really just a collection of coding habits that have evolved in response to the opportunities and pressures they have experienced throughout their careers.
Just as in natural evolution, those opportunities and pressures may lead to a coding style that is fit, strong, and well-adapted to the programmer's needs. Or it may lead to a coding style that is nasty, brutish, and underthought. But what it most often leads to is something even worse: Intuitive Programmer Syndrome.
Many programmers code by instinct. They aren't conscious of the hundreds of choices they make every time they code: how they format their source, the names they use for variables, the kinds of loops they use (
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Three Goals
A good coding style is one that reduces the costs of your software project. There are three main ways in which a coding style can do that: by producing applications that are more robust, by supporting implementations that are more efficient, and by creating source code that is easier to maintain.
When deciding how you will write code, choose a style that is likely to reduce the number of bugs in your programs. There are several ways that your coding style can do that:
  • A coding style can minimize the chance of introducing errors in the first place. For example, appending _ref to the name of every variable that stores a reference (see Chapter 3) makes it harder to accidentally write $array_ref[$n] instead of $array_ref->[$n], because anything except an arrow after _ref will soon come to look wrong.
  • A coding style can make it easy to detect incorrect edge cases, where bugs often hide. For example, constructing a regular expression from a table (see Chapter 12) can prevent that regex from ever matching a value that the table doesn't cover, or from failing to match a value that it does.
  • A coding style can help you avoid constructs that don't scale well. For example, avoiding a cascaded if-elsif-elsif-elsif-... in favour of table look-ups (see Chapter 6) can ensure that the cost of any selection statement stays nearly constant, rather than growing linearly with the number of alternatives.
  • A coding style can improve how code handles failure. For example, mandating a standard interface for I/O prompting (see Chapter 10) can encourage developers to habitually verify terminal input, rather than simply assuming it will always be correct.
  • A coding style can improve how code reports failure. For example, a rule that every failure must throw an exception, rather than returning an
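The table look-up idea mentioned in the list above can be sketched as follows. This is only a minimal illustration: the status codes, the labels, and both subroutine names are hypothetical, and are not taken from the book's own examples.

```perl
use strict;
use warnings;

# Hypothetical look-up table (codes and labels are illustrative only)...
my %LABEL_FOR = (
    200 => 'OK',
    404 => 'Not found',
    500 => 'Server error',
);

# Cascaded version: every extra alternative adds another comparison...
sub label_cascaded {
    my ($code) = @_;

    if ($code == 200) {
        return 'OK';
    }
    elsif ($code == 404) {
        return 'Not found';
    }
    elsif ($code == 500) {
        return 'Server error';
    }

    return 'Unknown';
}

# Table version: a single hash look-up, however many alternatives exist...
sub label_from_table {
    my ($code) = @_;

    return exists $LABEL_FOR{$code} ? $LABEL_FOR{$code} : 'Unknown';
}
```

Adding a new alternative to the table version means adding one line of data, rather than another branch of logic.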
This Book
To help you develop that consistent and coherent approach, the following 18 chapters explore a coordinated set of coding practices that have been specifically designed to enhance the robustness, efficiency, and maintainability of Perl code.
Each piece of advice is framed as a single imperative sentence—a "Thou shalt..." or a "Thou shalt not...", presented like this:
Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
Each such admonition is followed by a detailed explanation of the rule, explaining how and when it applies. Every recommendation also includes a summary of the reasoning behind the prescription or proscription, usually in terms of how it can improve the reliability, performance, or comprehensibility of your code.
Almost every guideline also includes at least one example of code that conforms to the rule (set in constant-width bold) as well as counterexamples that break it (set in constant-width regular). These code fragments aim to demonstrate the advantages of following the suggested practice, and the problems that can occur if you don't. All of these examples are also available for you to download and reuse from http://www.oreilly.com/catalog/perlbp.
The guidelines are organized by topic, not by significance. For example, some readers will wonder why use strict and use warnings aren't mentioned on page 1. But if you've already seen the light on those two, they don't need to be on page 1. And if you haven't seen the light yet, Chapter 18 is soon enough. By then you'll have discovered several hundred ways in which code can go horribly wrong, and will be better able to appreciate these two ways in which Perl can help your code go right.
Other readers may object to "trivial" code layout recommendations appearing so early in the book. But if you've ever had to write code as part of a group, you'll know that layout is where most of the arguments start. Code layout is the medium in which all other coding practices are practised, so the sooner everyone can admit that code layout
Rehabiting
People cling to their current coding habits even when those habits are manifestly making their code buggy, slow, and incomprehensible to others. They cling to those habits because it's easier to live with their deficiencies than it is to fix them. Not thinking about how you code requires no effort. That's the whole point of a habit. It's a skill that has been compiled down from a cerebral process and then burnt into muscle memory; a microcoded reflex that your fingers can perform without your conscious control.
For example, if you're an aficionado of the BSD style of bracketing (see Chapter 2), then it's likely that your fingers can type Closingparen-Return-Openingcurly-Return-Tab without your ever needing to think about it—which makes it especially hard if your development team decides to adopt K&R bracketing instead, because now you have to type Closingparen-Return-Openingcurly-Return-dammit!-Backspace-Backspace-Backspace-Space-Openingcurly-Return-Tab for a couple of months until your fingers learn the new sequence.
Likewise, if you're used to writing Perl like this:
     @tcmd= grep /^.*;$/ => @cmd;
then abiding by the guidelines in this book and writing this instead:
            
    @terminated_commands
        = grep { m/ \A [^\n]* ; \n? \z /xms } @raw_commands;
         
will be deeply onerous. At least, it will be at first, until you break your existing habits and develop new ones.
But that's the great thing about programming habits: they're incredibly easy to change. All you have to do is consciously practise things the new way for long enough, and eventually your coding habits will automatically re-formulate themselves around that new behaviour.
So, if you decide to adopt the recommendations in the following chapters, try to adopt them zealously. See how often you can catch yourself (or others in your team) breaking the new rules. Stop letting your fingers do the programming. Recorrect each old habit the instant you notice yourself backsliding. Be strict with your hands. Rather than letting them type what feels good, force them to type what works well.
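It may help to see that the two grep statements shown earlier in this section select the same elements, so the longer form costs only keystrokes. The sketch below checks that on a few hypothetical command strings; the sample data is an assumption chosen to exercise the edge cases, not something from the original text.

```perl
use strict;
use warnings;

# Hypothetical sample data...
my @raw_commands = (
    "print;",      # Terminated, no trailing newline
    "print;\n",    # Terminated, with trailing newline
    "print",       # Not terminated
    "a;\nb",       # Semicolon is not at the end of the string
    ";",           # Empty command, but still terminated
);

# The habitual version...
my @tcmd = grep /^.*;$/ => @raw_commands;

# The version that follows the guidelines...
my @terminated_commands
    = grep { m/ \A [^\n]* ; \n? \z /xms } @raw_commands;

print scalar(@tcmd), ' vs ', scalar(@terminated_commands), " matches\n";
```

The second version spells out explicitly what the first leaves implicit: that the match is anchored to the whole string, that the dot is not meant to cross a newline, and that a single trailing newline is tolerated.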
Chapter 2: Code Layout
Most people's [...] programs should be indented six feet downward and covered with dirt.
—Blair P. Houghton
Formatting. Indentation. Style. Code layout. Whatever you choose to call it, it's one of the most contentious aspects of programming discipline. More and bloodier wars have been fought over code layout than over just about any other aspect of coding.
So what is the best practice here? Should you use classic Kernighan & Ritchie (K&R) style? Or go with BSD code formatting? Or adopt the layout scheme specified by the GNU project? Or conform to the Slashcode coding guidelines?
Of course not! Everyone knows that [insert your personal coding style here] is the One True Layout Style, the only sane choice, as ordained by [insert your favorite Programming Deity here] since Time Immemorial! Any other choice is manifestly absurd, willfully heretical, and self-evidently a Work of Darkness!!!
And that's precisely the problem. When deciding on a layout style, it's hard to decide where rational choices end and rationalized habits begin.
Adopting a coherently designed approach to code layout, and then applying that approach consistently across all your coding, is fundamental to best practice programming. Good layout can improve the readability of a program, help detect errors within it, and make the structure of your code much easier to comprehend. Layout matters.
But most coding styles—including the four mentioned earlier—confer those benefits almost equally well. So while it's true that having a consistent code layout scheme matters very much indeed, the particular code layout scheme you ultimately decide upon... does not matter at all!
All that matters is that you adopt a single, coherent style; one that works for your entire programming team. And, having agreed upon that style, that you then apply it consistently across all your development.
The layout guidelines suggested in this chapter have been carefully and consciously selected from many alternatives, in a deliberate attempt to construct a coding style that is self-consistent and concise, that improves the readability of the resulting code, that makes it easy to detect coding mistakes, and that works well for a wide range of programmers in a wide range of development environments.
Bracketing
Brace and parenthesize in K&R style.
When setting out a code block, use the K&R style of bracketing. That is, place the opening brace at the end of the construct that controls the block. Then start the contents of the block on the next line, and indent those contents by one indentation level. Finally, place the closing brace on a separate line, at the same indentation level as the controlling construct.
Likewise, when setting out a parenthesized list over multiple lines, put the opening parenthesis at the end of the controlling expression; arrange the list elements on the subsequent lines, indented by one level; and place the closing parenthesis on its own line, outdenting it back to the level of the controlling expression. For example:
            
    my @names = (
        'Damian',    # Primary key
        'Matthew',   # Disambiguator
        'Conway',    # General class or category
    );

    for my $name (@names) {
        for my $word ( anagrams_of(lc $name) ) {
            print "$word\n";
        }
    }
         
Don't place the opening brace or parenthesis on a separate line, as is common under the BSD and GNU styles of bracketing:
    # Don't use BSD style...
    my @names =
    (
        'Damian',    # Primary key
        'Matthew',   # Disambiguator
        'Conway',    # General class or category
    );

    for my $name (@names)
    {
        for my $word (anagrams_of(lc $name))
        {
            print "$word\n";
        }
    }

    # And don't use GNU style either...

    for my $name (@names)
      {
        for my $word (anagrams_of(lc $name))
          {
            print "$word\n";
          }
      }
The K&R style has one obvious advantage over the other two styles: it requires one fewer line per block, which means one more line of actual code will be visible at any time on your screen. If you're looking at a series of blocks, that might add up to three or four extra code lines per screen.
Keywords
Separate your control keywords from the following opening bracket.
Control structures regulate the dynamic behaviour of a program, so the keywords of control structures are amongst the most critical components of a program. That's why it's important that those keywords stand out clearly in the source code.
In Perl, most control structure keywords are immediately followed by an opening parenthesis, which can make it easy to confuse them with subroutine calls. It's important to distinguish the two. To do this, use a single space between a keyword and the following brace or parenthesis:
            
    for my $result (@results) {
        print_sep();
        print $result;
    }

    while ($min < $max) {
        my $try = ($max - $min) / 2;
        if ($value[$try] < $target) {
            $max = $try;
        }
        else {
            $min = $try;
        }
    }
         
Without the intervening space, it's harder to pick out the keyword, and easier to mistake it for the start of a subroutine call:
    for(@results) {
        print_sep();
        print;
    }

    while($min < $max) {
        my $try = ($max - $min) / 2;
        if($value[$try] < $target) {
            $max = $try;
        }
        else{
            $min = $try;
        }
    }
Subroutines and Variables
Don't separate subroutine or variable names from the following opening bracket.
In order for the previous rule to work properly, it's important that subroutines and variables not have a space between their names and any following brackets. Otherwise, it's too easy to mistake a subroutine call for a control structure, or misread the initial part of an array element as an independent scalar variable.
So cuddle subroutine calls and variable names against their trailing parentheses or braces:
            
    my @candidates = get_candidates($marker);

    CANDIDATE:
    for my $i (0..$#candidates) {
        next CANDIDATE if open_region($i);

        $candidates[$i]
            = $incumbent{ $candidates[$i]{region} };
    }
         
Spacing them out only makes them harder to recognize:
    my @candidates = get_candidates ($marker);

    CANDIDATE:
    for my $i (0..$#candidates) {
        next CANDIDATE if open_region ($i);

        $candidates [$i]
            = $incumbent {$candidates [$i] {region}};
    }
Builtins
Don't use unnecessary parentheses for builtins and "honorary" builtins.
Perl's many built-in functions are effectively keywords of the language, so they can legitimately be called without parentheses, except where it's necessary to enforce precedence.
Calling builtins without parentheses reduces clutter in your code, and thereby enhances readability. The lack of parentheses also helps to visually distinguish between subroutine calls and calls to builtins:
            
    while (my $record = <$results_file>) {
        chomp $record;
        my ($name, $votes) = split "\t", $record;
        print 'Votes for ',
              substr($name, 0, 10),       # Parens needed for precedence
              ": $votes (verified)\n";
    }
         
Certain imported subroutines, usually from modules in the core distribution, also qualify as "honorary" builtins, and may be called without parentheses. Typically these will be subroutines that provide functionality that ought to be in the language itself but isn't. Examples include carp and croak (from the standard Carp module—see Chapter 13), first and max (from the standard List::Util module—see Chapter 8), and prompt (from the IO::Prompt CPAN module—see Chapter 10).
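As a minimal sketch of how such "honorary" builtins read without parentheses, consider the following. The vote counts and variable names are hypothetical; max and first come from List::Util, and croak from Carp, as described above.

```perl
use strict;
use warnings;
use Carp;
use List::Util qw( first max );

# Hypothetical vote counts, for illustration only...
my @votes = (12, 47, 3, 51, 29);

# Fail early if there is nothing to count...
croak 'No votes recorded' if !@votes;

# Imported subroutines, called like builtins...
my $highest       = max @votes;
my $first_over_40 = first { $_ > 40 } @votes;

print "Highest: $highest; first over 40: $first_over_40\n";
```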
Note, however, that wherever you do need to use parentheses with a builtin, they should follow the rules for subroutines, not those for control keywords. That is, treat them as subroutines, with no space between the builtin name and the opening parenthesis:
            
    while (my $record = <$results_file>) {
        chomp( $record );
        my ($name, $votes) = split("\t", $record);
        print(
            'Votes for ',
            substr($name, 0, 10),
            ": $votes (verified)\n"
        );
    }
Keys and Indices
Separate complex keys or indices from their surrounding brackets.
When accessing elements of nested data structures (hashes of hashes of arrays of whatever), it's easy to produce a long, complex, and visually dense expression, such as:
    $candidates[$i] = $incumbent{$candidates[$i]{get_region()}};
That's especially true when one or more of the indices are themselves indexed variables. Squashing everything together without any spacing doesn't help the readability of such expressions. In particular, it can be difficult to detect whether a given pair of brackets is part of the inner or outer index.
Unless an index is a simple constant or scalar variable, it's much clearer to put spaces between the indexing expression and its surrounding brackets:
            
    $candidates[$i] = $incumbent{ $candidates[$i]{ get_region() } };
         
Note that the determining factors here are both the complexity and the overall length of the index. Occasionally, "spacing-out" an index makes sense even if that index is just a single constant or scalar. For example, if that simple index is unusually long, it's better written as:
            
    print $incumbent{ $largest_gerrymandered_constituency };
         
rather than:
    print $incumbent{$largest_gerrymandered_constituency};
Operators
Use whitespace to help binary operators stand out from their operands.
Long expressions can be hard enough to comprehend without adding to their complexity by jamming their various components together:
    my $displacement=$initial_velocity*$time+0.5*$acceleration*$time**2;

    my $price=$coupon_paid*$exp_rate+(($face_val+$coupon_val)*$exp_rate**2);
Give your binary operators room to breathe, even if it requires an extra line to do so:
            
    my $displacement
        = $initial_velocity * $time  +  0.5 * $acceleration * $time**2;

    my $price
        = $coupon_paid * $exp_rate  +  ($face_val + $coupon_val) * $exp_rate**2;
         
Choose the amount of whitespace according to the precedence of the operators, to help the reader's eyes pick out the natural groupings within the expression. For example, you might put additional spaces on either side of the lower-precedence + to visually reinforce the higher precedence of the two multiplicative subexpressions surrounding it. On the other hand, it's quite appropriate to sandwich the ** operator tightly between its operands, given its very high precedence and its longer, more easily identified symbol.
A single space is always sufficient whenever you're also using parentheses to emphasize (or to vary) precedence:
            
    my $velocity
        = $initial_velocity + ($acceleration * ($time + $delta_time));

    my $future_price
        = $current_price * exp($rate - $dividend_rate_on_index) * ($delivery - $now);
         
Symbolic unary operators should always be kept with their operands:
            
    my $spring_force = !$hyperextended ? -$spring_constant * $extension : 0;

    my $payoff = max(0, -$asset_price_at_maturity + $strike_price);
         
Named unary operators should be treated like builtins, and spaced from their operands appropriately:
            
    my $tan_theta = (sin $theta) / (cos $theta);

    my $forward_differential_1_year = $delivery_price * exp -$interest_rate;
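One caution when leaving the parentheses off a named unary operator: such operators bind less tightly than the arithmetic operators, so an unparenthesized operand extends further than it may appear to. The sketch below illustrates this with sin and cos; the variable names and the sample angle are assumptions made for the illustration.

```perl
use strict;
use warnings;

my $theta = 1;    # Radians (an arbitrary sample value)

# Without parentheses, the operand of sin is the entire division,
# so this computes sin( $theta / cos($theta) )...
my $unparenthesized = sin $theta / cos $theta;

# With parentheses, each operand is delimited explicitly...
my $tangent = sin($theta) / cos($theta);

printf "%.4f vs %.4f\n", $unparenthesized, $tangent;
```

So when a named unary operator appears inside a larger arithmetic expression, it is usually safer to parenthesize either its operand or the whole call.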
Semicolons
Place a semicolon after every statement.
In Perl, semicolons are statement separators, not statement terminators, so a semicolon isn't required after the very last statement in a block. Put one in anyway, even if there's only one statement in the block:
            
    while (my $line = <>) {
        chomp $line;

        if ( $line =~ s{\A (\s*) -- (.*)}{$1#$2}xms ) {
            push @comments, $2;
        }

        print $line;
    }
         
The extra effort to do this is negligible, and that final semicolon confers two very important advantages. It signals to the reader that the preceding statement is finished, and (perhaps more importantly) it signals to the compiler that the statement is finished. Telling the compiler is more important than telling the reader, because the reader can often work out what you really meant, whereas the compiler reads only what you actually wrote.
Leaving out the final semicolon usually works fine when the code is first written (i.e., when you're still paying proper attention to the entire piece of code):
    while (my $line = <>) {
        chomp $line;

        if ( $line =~ s{\A (\s*) -- (.*)}{$1#$2}xms ) {
            push @comments, $2
        }

        print $line
    }
But, without the semicolons, there's nothing to prevent later additions to the code from causing subtle problems:
    while (my $line = <>) {
        chomp $line;

        if ( $line =~ s{\A (\s*) -- (.*)}{$1#$2}xms ) {
            push @comments, $2
            /shift/mix
        }

        print $line
        $src_len += length;
    }
The problem is that those two additions don't actually add new statements; they just absorb the existing ones. So the previous code actually means:
    while (my $line = <>) {
        chomp $line;

        if ( $line =~ s{\A (\s*) -- (.*)}{$1#$2}xms ) {
            push @comments, $2 / shift() / mix()
        }

        print $line ($src_len += length);
    }
Commas
Place a comma after every value in a multiline list.
Just as semicolons act as separators in a block of statements, commas act as separators in a list of values. That means that exactly the same arguments apply in favour of treating them as terminators instead.
Adding an extra trailing comma (which is perfectly legal in any Perl list) also makes it much easier to reorder the elements of the list. For example, it's much easier to convert:
            
    my @dwarves = (
        'Happy',
        'Sleepy',
        'Dopey',
        'Sneezy',
        'Grumpy',
        'Bashful',
        'Doc',
    );
         
to:
            
    my @dwarves = (
        'Bashful',
        'Doc',
        'Dopey',
        'Grumpy',
        'Happy',
        'Sleepy',
        'Sneezy',
    );
         
You can manually cut and paste lines or even feed the list contents through sort.
Without that trailing comma after 'Doc', reordering the list would introduce a bug:
    my @dwarves = (
        'Bashful',
        'Doc'
        'Dopey',
        'Grumpy',
        'Happy',
        'Sleepy',
        'Sneezy',
    );
Of course, that's a trivial mistake to find and fix, but why not adopt a coding style that eliminates the very possibility of such problems?
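The following sketch shows both halves of that argument: the trailing comma adds no extra element to the list, and because every element line has the same shape, the lines can be reordered mechanically. The @element_lines variable is a hypothetical name introduced only for this illustration.

```perl
use strict;
use warnings;

# The trailing comma after 'Doc' does not create an extra element...
my @dwarves = (
    'Happy',
    'Sleepy',
    'Dopey',
    'Sneezy',
    'Grumpy',
    'Bashful',
    'Doc',
);

# Regenerate the element lines in sorted order; every line still ends
# in a comma, so the result remains a valid list whatever the order...
my @element_lines;
for my $name (sort @dwarves) {
    push @element_lines, "    '$name',";
}

print "my \@dwarves = (\n", join("\n", @element_lines), "\n);\n";
```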
Line Lengths
Use 78-column lines.
In these modern days of high-resolution 30-inch screens, anti-aliased fonts, and laser eyesight correction, it's entirely possible to program in a terminal window that's 300 columns wide.
Please don't.
Given the limitations of printed documents, legacy VGA display devices, presentation software, and unreconstructed managerial optics, it isn't reasonable to format code to a width greater than 80 columns. And even an 80-column line width is not always safe, given the text-wrapping characteristics of some terminals, editors, and mail systems.
Setting your right margin at 78 columns maximizes the usable width of each code line whilst ensuring that those lines appear consistently on the vast majority of display devices.
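If you want to check existing code against that limit, something like the following sketch may be enough. The subroutine name and the $MAX_COLUMNS variable are hypothetical, and the check assumes the text contains no tabs or wide characters.

```perl
use strict;
use warnings;

my $MAX_COLUMNS = 78;

# Return the line numbers of any lines wider than $MAX_COLUMNS...
sub overlong_lines {
    my (@lines) = @_;

    my @offenders;
    for my $line_num (1..@lines) {
        my $line = $lines[$line_num - 1];
        chomp $line;

        if (length($line) > $MAX_COLUMNS) {
            push @offenders, $line_num;
        }
    }

    return @offenders;
}
```

For example, passing it the lines of a source file returns a list of the line numbers that need to be rewrapped.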
In vi, you can set your right margin appropriately by adding:
            
    set textwidth=78
         
to your configuration file. For Emacs, use:
            
    (setq fill-column 78)
    (setq auto-fill-mode t)
         
Another advantage of this particular line width is that it ensures that any code fragment sent via email can be quoted at least once without wrapping:
            
    From: boss@headquarters
    To: you@saltmines
    Subject: Please explain

    I came across this chunk of code in your latest module.
    Is this your idea of a joke???

    > $;=$/;seek+DATA,undef$/,!$s;$_=<DATA>;$s&&print||(*{q;::\;
    > ;}=sub{$d=$d-1?$d:$0;s;';\t#$d#;,$_})&&$g&&do{$y=($x||=20)*($y||8);sub
    > i{sleep&f}sub'p{print$;x$=,join$;,$b=~/.{$x}/g,$;}sub'f{pop||1}sub'n{substr($b
    > ,&f%$y,3)=~tr,O,O,}sub'g{@_[@_]=@_;--($f=&f);$m=substr($b,&f,1);($w,$w,$m,O)
    > [n($f-$x)+n($x+$f)-(${m}eq+O=>)+n$f]||$w}$w="\40";$b=join'',@ARGV?<>:$_,$w
    > x$y;$b=~s).)$&=~/\w/?O:$w)gse;substr($b,$y)=q++;$g='$i=0;$i?$b:$c=$b;
    > substr+$c,$i,1,g$i;$g=~s?\d+?($&+1)%$y?e;$i-$y+1?eval$g:do{$b=$c;p;i}';
    > sub'e{eval$g;&e};e}||eval||die+No.$;
Indentation
Use four-column indentation levels.
Indentation depth is far more controversial than line width. Ask four programmers the right number of columns per indentation level and you'll get four different answers: two-, three-, four-, or eight-column indents. You'll usually also get a heated argument.
The ancient coding masters, who first cut code on teletypes or hardware terminals with fixed tabstops, will assert that eight columns per level of indentation is the only acceptable ratio, and support that argument by pointing out that most printers and software terminals still default to eight-column tabs. Eight columns per indentation level ensures that your code looks the same everywhere:
    while (my $line = <>) {
            chomp $line;
            if ( $line =~ s{\A (\s*) -- ([^\n]*) }{$1#$2}xms ) {
                    push @comments, $2;
            }
            print $line;
    }
Yes (agree many younger hackers), eight-column indents ensure that your code looks equally ugly and unreadable everywhere! Instead, they insist on no more than two or three columns per indentation level. Smaller indents maximize the number of levels of nesting available across a fixed-width display: about a dozen levels under a two- or three-column indent, versus only four or five levels with eight-column indents. Shallower indentation also reduces the horizontal distance the eye has to track, thereby keeping indented code in the same vertical sight-line and making the context of any line of code easier to ascertain:
    while (my $line = <>) {
      chomp $line;
      if ( $line =~ s{\A (\s*) -- ([^\n]*) }{$1#$2}xms ) {
        push @comments, $2;
      }
      print $line;
    }
The problem with this approach (cry the ancient masters) is that it can make indentations impossible to detect for anyone whose eyes are older than 30, or whose vision is worse than 20/20. And that's the crux of the problem. Deep indentation enhances structural readability at the expense of contextual readability; shallow indentation, vice versa. Neither is ideal.
Tabs
Indent with spaces, not tabs.
Tabs are a bad choice for indenting code, even if you set your editor's tabspacing to four columns. Tabs do not appear the same when printed on different output devices, or pasted into a word-processor document, or even just viewed in someone else's differently tabspaced editor. So don't use tabs alone or (worse still) intermix tabs with spaces:
    sub addarray_internal {
    »   my ($var_name, $need_quotemeta) = @_;

    »   $raw .= $var_name;

    »   my $quotemeta = $need_quotemeta ? q{ map {quotemeta $_} }
    »   »   »   »   » :                   $EMPTY_STR
    »   ··············;

    ····my $perl5pat
    ····»   = qq{(??{join q{|}, $quotemeta \@{$var_name}})};

    »   push @perl5pats, $perl5pat;

    »   return;
    }
The only reliable, repeatable, transportable way to ensure that indentation remains consistent across viewing environments is to indent your code using only spaces. And, in keeping with the previous rule on indentation depth, that means using four space characters per indentation level:
            
    sub addarray_internal {
    ····my ($var_name, $need_quotemeta) = @_;

    ····$raw .= $var_name;

    ····my $quotemeta = $need_quotemeta ? q{ map {quotemeta $_} }
    ··················:···················$EMPTY_STR
    ··················;

    
                  ····
               my $perl5pat
    
                  ········
               = qq{(??{join q{|}, $quotemeta \@{$var_name}})};

    
                  ····
               push @perl5pats, $perl5pat;

    
                  ····
               return;
    }
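Existing code that already contains tabs can be converted mechanically. The sketch below is one possible approach, using the expand() subroutine from the core Text::Tabs module; the detab() wrapper is a hypothetical name, and the four-column tab stop is an assumption matching the indentation rule above.

```perl
use strict;
use warnings;
use Text::Tabs;    # Core module; provides expand()

# Replace every tab with spaces, assuming four-column tab stops...
sub detab {
    my (@lines) = @_;

    # Text::Tabs reads its tab width from this package variable...
    local $Text::Tabs::tabstop = 4;

    return expand(@lines);
}
```

Note that this expands every tab, including any inside string literals, so it is worth reviewing the result before committing it.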
         
Note that this rule doesn't mean you can't use the Tab key to indent your code; only that the result of pressing that key can't actually be a tab. That's usually very easy to ensure under modern editors, most of which can easily be configured to convert tabs to spaces. For example, if you use vim, you can include the following directives in your
Blocks
Never place two statements on the same line.
If two or more statements share one line, each of them becomes harder to comprehend:
    RECORD:
    while (my $record = <$inventory_file>) {
        chomp $record; next RECORD if $record eq $EMPTY_STR;
        my @fields = split $FIELD_SEPARATOR, $record; update_sales(\@fields);$count++;
    }
You're already saving vertical space by using K&R bracketing; use that space to improve the code's readability, by giving each statement its own line:
            
    RECORD:
    while (my $record = <$inventory_file>) {
        chomp $record;
        next RECORD if $record eq $EMPTY_STR;
        my @fields = split $FIELD_SEPARATOR, $record;
        update_sales(\@fields);
        $count++;
    }
         
Note that this guideline applies even to map and grep blocks that contain more than one statement. You should write:
            
    my @clean_words
        = map {
              my $word = $_;
              $word =~ s/$EXPLETIVE/[DELETED]/gxms;
              $word;
          } @raw_words;
         
not:
    my @clean_words
        = map { my $word = $_; $word =~ s/$EXPLETIVE/[DELETED]/gxms; $word } @raw_words;
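
To see the multi-line map block in action, here is a minimal runnable sketch. The value of $EXPLETIVE and the contents of @raw_words are assumptions, chosen only to make the example self-contained. Note that the first statement of the block copies $_ into $word; because $_ is an alias for each element of the list, substituting on the copy leaves @raw_words untouched:

```perl
use strict;
use warnings;

# Hypothetical pattern and data, chosen only to make the example runnable...
my $EXPLETIVE = qr/darn|heck/xms;
my @raw_words = qw( well darn that heck worked );

# One statement per line, even inside the map block...
my @clean_words
    = map {
          my $word = $_;
          $word =~ s/$EXPLETIVE/[DELETED]/gxms;
          $word;
      } @raw_words;

print "@clean_words\n";
print "@raw_words\n";
```

With each statement on its own line, the copy, the substitution, and the returned value are easy to pick out at a glance.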
Chunking
Code in paragraphs.
A paragraph is a collection of statements that accomplish a single task: in literature, it's a series of sentences conveying a single idea; in programming, it's a series of instructions implementing a single step of an algorithm.
Break each piece of code into sequences that achieve a single task, placing a single empty line between each sequence. To further improve the maintainability of the code, place a one-line comment at the start of each such paragraph, describing what the sequence of statements does. Like so:
            
               
    # Process an array that has been recognized...
    sub addarray_internal {
        my ($var_name, $needs_quotemeta) = @_;

        # Cache the original...
        $raw .= $var_name;

        # Build meta-quoting code, if requested...
        my $quotemeta = $needs_quotemeta ? q{map {quotemeta $_} } : $EMPTY_STR;

        # Expand elements of variable, conjoin with ORs...
        my $perl5pat = qq{(??{join q{|}, $quotemeta \@{$var_name}})};

        # Insert debugging code if requested...
        my $type = $quotemeta ? 'literal' : 'pattern';
        debug_now("Adding $var_name (as $type)");
        add_debug_mesg("Trying $var_name (as $type)");

        return $perl5pat;
    }
         
Paragraphs are useful because humans can focus on only a few pieces of information at once. Paragraphs are one way of aggregating small amounts of related information, so that the resulting "chunk" can fit into a single slot of the reader's limited short-term memory. Paragraphs enable the physical structure of a piece of writing to reflect and emphasize its logical structure. Adding comments at the start of each paragraph further enhances the chunking by explicitly summarizing the purpose of each chunk.
Elses
Don't cuddle an else.
A "cuddled" else looks like this:
    } else {
An uncuddled else looks like this:
            
    }
    else {
         
Cuddling saves an additional line per alternative, but ultimately it works against the readability of code in other ways, especially when that code is formatted using K&R bracketing. A cuddled else keyword is no longer in vertical alignment with its controlling if, nor with its own closing bracket. This misalignment makes it harder to visually match up the various components of an if-else construct.
More importantly, the whole point of an else is to distinguish an alternate course of action. But cuddling the else makes that distinction less distinct. For a start, it removes the near-empty line provided by the closing brace of the preceding if, which reduces the visual gap between the if and else blocks. Squashing the two blocks together in that way undermines the paragraphing inside the two blocks (see the previous guideline, "Chunking"), especially if the contents of the blocks are themselves properly paragraphed with empty lines between chunks.
Cuddling also moves the else from the leftmost position on its line, which means that the keyword is harder to locate when you are scanning down the code. On the other hand, an uncuddled else improves both the vertical separation of your code and the identifiability of the keyword:
            
    if ($sigil eq '$') {
        if ($subsigil eq '?') {
            $sym_table{ substr($var_name,2) } = delete $sym_table{$var_name};

            $internal_count++;
            $has_internal{$var_name}++;
        }
        else {
            ${$var_ref} = q{$sym_table{$var_name}};

            $external_count++;
            $has_external{$var_name}++;
        }
    }
    elsif ($sigil eq '@' && $subsigil eq '?') {
        @{ $sym_table{$var_name} }
            = grep {defined $_} @{$sym_table{$var_name}};
    }
    elsif ($sigil eq '%' && $subsigil eq '?') {
        delete $sym_table{$var_name}{$EMPTY_STR};
    }
    else {
        ${$var_ref} = q{$sym_table{$var_name}};
    }
Vertical Alignment
Align corresponding items vertically.
Tables are another familiar means of chunking related information, and of using physical layout to indicate logical relationships. When setting out code, it's often useful to align data in a table-like series of columns. Consistent indentation can suggest equivalences in structure, usage, or purpose.
For example, initializers for non-scalar variables are often much more readable when laid out neatly using extra whitespace. The following array and hash initializations are very readable in tabular layout:
            
    my @months = qw(
        January   February   March
        April     May        June
        July      August     September
        October   November   December
    );

    my %expansion_of = (
        q{it's}    => q{it is},
        q{we're}   => q{we are},
        q{didn't}  => q{did not},
        q{must've} => q{must have},
        q{I'll}    => q{I will},
    );
         
Compressing them into lists saves lines, but also significantly reduces their readability:
    my @months = qw(
        January February March April May June July August September
        October November December
    );

    my %expansion_of = (
        q{it's} => q{it is}, q{we're} => q{we are}, q{didn't} => q{did not},
        q{must've} => q{must have}, q{I'll} => q{I will},
    );
Take a similar tabular approach with sequences of assignments to related variables, by aligning the assignment operators:
            
    $name   = standardize_name($name);
    $age    = time - $birth_date;
    $status = 'active';
         
rather than:
    $name = standardize_name($name);
    $age = time - $birth_date;
    $status = 'active';
Alignment is even more important when assigning to a hash entry or an array element. In such cases, the keys (or indices) should be aligned in a column, with the surrounding braces (or square brackets) also aligned. That is:
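
One way such a layout might look is sketched below. This is an illustration under stated assumptions, not a reproduction of any particular example: the %ident hash, the sample values, and the body of standardize_name are all hypothetical, added only so that the code runs:

```perl
use strict;
use warnings;

# Hypothetical data and helper, introduced only so the sketch runs...
my $name       = '  damian   CONWAY ';
my $birth_date = time - 1000;

sub standardize_name {
    my ($raw_name) = @_;

    # Trim, collapse whitespace, and capitalize each part...
    return join q{ }, map { ucfirst lc } split q{ }, $raw_name;
}

# Keys in one column, braces in another, assignment operators in a third...
my %ident;
$ident{ name   } = standardize_name($name);
$ident{ age    } = time - $birth_date;
$ident{ status } = 'active';

print "$ident{name} is $ident{status}\n";
```

Padding the keys inside the braces keeps the closing braces and the assignment operators in their own columns, so the three entries read as rows of a table.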
Breaking Long Lines
Break long expressions before an operator.
When an expression at the end of a statement gets too long, it's common practice to break that expression after an operator and then continue the expression on the following line, indenting it one level. Like so:
    push @steps, $steps[-1] +
        $radial_velocity * $elapsed_time +
        $orbital_velocity * ($phase + $phase_shift) -
        $DRAG_COEFF * $altitude;
The rationale is that the operator that remains at the end of the line acts like a continuation marker, indicating that the expression continues on the following line.
Using the operator as a continuation marker seems like an excellent idea, but there's a serious problem with it: people rarely look at the right edge of code. Most of the semantic hints in a program—such as keywords—appear on the left side of that code. More importantly, the structural cues for understanding code—for example, indenting—are predominantly on the left as well (see the upcoming "Keep Left" sidebar). This means that indenting the continued lines of the expression actually gives a false impression of the underlying structure, a misperception that the eye must travel all the way to the right margin to correct.
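
By contrast, breaking before each operator puts the continuation marker on the left, where the eye is already looking. The following runnable sketch uses the variables from the example above; the values assigned to them are assumptions, chosen only so that the arithmetic is easy to follow:

```perl
use strict;
use warnings;

# Hypothetical values, chosen only so the sketch runs...
my @steps            = (10);
my $radial_velocity  = 2;
my $elapsed_time     = 3;
my $orbital_velocity = 4;
my $phase            = 1;
my $phase_shift      = 1;
my $DRAG_COEFF       = 2;
my $altitude         = 3;

# Each continuation line starts with its operator, so the left edge
# alone shows that the expression is still in progress...
push @steps, $steps[-1]
             + $radial_velocity * $elapsed_time
             + $orbital_velocity * ($phase + $phase_shift)
             - $DRAG_COEFF * $altitude
             ;

print "Next step: $steps[-1]\n";
```

The leading + and - make it obvious that lines two to four belong to the expression, and the terminating semicolon on its own line marks where it ends.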
Non-Terminal Expressions
Factor out long expressions in the middle of statements.
The previous guideline applies only if the long expression to be broken is the last value in a statement. If the expression appears in the middle of a statement, it is better to factor that expression out into a separate variable assignment. For example:
            
    my $next_step = $steps[-1]
                    + $radial_velocity * $elapsed_time
                    + $orbital_velocity * ($phase + $phase_shift)
                    - $DRAG_COEFF * $altitude
                    ;
    add_step( \@steps, $next_step, $elapsed_time);
         
rather than:
    add_step( \@steps, $steps[-1]
                       + $radial_velocity * $elapsed_time
                       + $orbital_velocity * ($phase + $phase_shift)
                       - $DRAG_COEFF * $altitude
                       , $elapsed_time);
Breaking by Precedence
Always break a long expression at the operator of the lowest possible precedence.
As the examples in the previous two guidelines show, when breaking an expression across several lines, each line should be broken before a low-precedence operator. Breaking at operators of higher precedence encourages the unwary reader to misunderstand the computation that the expression performs. For example, the following layout might surreptitiously suggest that the additions and subtractions happen before the multiplications:
    push @steps, $steps[-1] + $radial_velocity
                 * $elapsed_time + $orbital_velocity
                 * ($phase + $phase_shift) - $DRAG_COEFF
                 * $altitude
                 ;
If you're forced to break on an operator of less-than-minimal precedence, indent the broken line one additional level relative to the start of the expression, like so:
            
    push @steps, $steps[-1]
                 + $radial_velocity * $elapsed_time
                 + $orbital_velocity
                     * ($phase + $phase_shift)
                 - $DRAG_COEFF * $altitude
                 ;
         
This strategy has the effect of keeping the subexpressions of the higher precedence operation visually "together".
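
A small experiment shows why the layout matters. In the sketch below, the variable names and values are hypothetical, picked only to make the difference visible. The first layout breaks at the multiplication and so hints at a computation that Perl never performs:

```perl
use strict;
use warnings;

# Hypothetical values, chosen only to make the difference visible...
my ($base, $rate, $time, $drift) = (1, 2, 3, 4);

# Broken at the high-precedence operator: the layout hints at
# ($base + $rate) * ($time + $drift)...
my $misleading_layout = $base + $rate
                        * $time + $drift;

# ...but Perl still multiplies first, exactly as this layout shows...
my $honest_layout = $base
                    + $rate * $time
                    + $drift
                    ;

# What the misleading layout appears to say...
my $what_the_eye_sees = ($base + $rate) * ($time + $drift);

print "misleading layout: $misleading_layout\n";
print "honest layout:     $honest_layout\n";
print "apparent meaning:  $what_the_eye_sees\n";
```

The two layouts compute the same value, because whitespace never changes precedence; only the second one looks like what it does.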
Assignments
Break long assignments before the assignment operator.
Often, the long statement that needs to be broken will be an assignment. The preceding rule does work in such cases, but leads to code that's unaesthetic and hard to read:
    $predicted_val = $average
                     + $predicted_change * $fudge_factor
                     ;
A better approach when breaking assignment statements is to break before the assignment operator itself, leaving only the variable being assigned to on the first line. Then indent one level, and place the assignment operator at the start of the next line—once again indicating a continued statement:
            
    $predicted_val
        = $average + $predicted_change * $fudge_factor;
         
Note that this approach often allows the entire righthand side of an assignment to be laid out on a single line, as in the preceding example. However, if the righthand expression is still too long, break it again at a low-precedence operator, as suggested in the previous guideline:
            
    $predicted_val
        = ($minimum + $maximum) / 2
          + $predicted_change * max($fudge_factor, $local_epsilon);
         
A commonly used alternative layout for broken assignments is to break after the assignment operator, like so:
    $predicted_val =
        $average + $predicted_change * $fudge_factor;
This approach suffers from the same difficulty described earlier: it's impossible to detect the line continuation without scanning all the way to the right of the code, and the "unmarked" indentation of the second line can mislead the casual reader. This problem of readability is most noticeable when the variable being assigned to is itself quite long:
    $predicted_val{$current_data_set}[$next_iteration] =
        $average + $predicted_change * $fudge_factor;
which, of course, is precisely when such an assignment would most likely need to be broken. Breaking before the assignment operator makes long assignments much easier to identify, by keeping the assignment operator visually close to the start of the variable being assigned to:
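
Applied to the long assignment above, that layout might look like the following sketch. The hash declaration and the sample values are assumptions, added only so that the statement can be run:

```perl
use strict;
use warnings;

# Hypothetical data, introduced only so the sketch runs...
my %predicted_val;
my $current_data_set = 'trial_a';
my $next_iteration   = 2;
my $average          = 10;
my $predicted_change = 4;
my $fudge_factor     = 3;

# The assignment operator leads the second line, directly below
# the start of the variable being assigned to...
$predicted_val{$current_data_set}[$next_iteration]
    = $average + $predicted_change * $fudge_factor;

print "Predicted: $predicted_val{$current_data_set}[$next_iteration]\n";
```

However long the variable on the first line grows, the = stays one indentation level in from the left margin.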
Ternaries
Format cascaded ternary operators in columns.
One operator that is particularly prone to creating long expressions is the ternary operator. Because the ? and : of a ternary have very low precedence, a straightforward interpretation of the expression-breaking rule doesn't work well in this particular case, since it produces something like:
    my $salute = $name eq $EMPTY_STR ? 'Customer'
                 : $name =~ m/\A((?:Sir|Dame) \s+ \S+)/xms ? $1
                 : $name =~ m/(.*), \s+ Ph[.]?D \z/xms ? "Dr $1" : $name;
which is almost unreadable.
The best way to lay out a series of ternary selections is in two columns, like so:
            
               
               # When their name is...                    Address them as...
    my $salute = $name eq $EMPTY_STR                      ? 'Customer'
               : $name =~ m/\A((?:Sir|Dame) \s+ \S+) /xms ? $1
               : $name =~ m/(.*), \s+ Ph[.]?D \z     /xms ? "Dr $1"
               :                                            $name
               ;
         
In other words, break a series of ternary operators before every colon, aligning the colons with the operator preceding the first conditional. Doing so will cause the conditional tests to form a column. Then align the question marks of the ternaries so that the various possible results of the ternary also form a column. Finally, indent the last result (which has no preceding question mark) so that it too lines up in the results column.
This special layout converts the typical impenetrably obscure ternary sequence into a simple look-up table: for a given condition in column one, use the corresponding result from column two.
You can use the tabular layout even if you have only a single ternary:
            
    my $name = defined $customer{name} ? $customer{name}
             :                           'Sir or Madam'
             ;
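
To try the look-up table on a few inputs, you could wrap it in a subroutine. The sketch below does that; the wrapper name salute_for and the sample names are hypothetical, and $EMPTY_STR is assumed to be an empty string:

```perl
use strict;
use warnings;

my $EMPTY_STR = q{};   # assumed definition

# Hypothetical wrapper, added so the table can be tried on several names...
sub salute_for {
    my ($name) = @_;

    #            When their name is...                    Address them as...
    my $salute = $name eq $EMPTY_STR                      ? 'Customer'
               : $name =~ m/\A((?:Sir|Dame) \s+ \S+) /xms ? $1
               : $name =~ m/(.*), \s+ Ph[.]?D \z     /xms ? "Dr $1"
               :                                            $name
               ;

    return $salute;
}

for my $name ($EMPTY_STR, 'Sir Ian McKellen', 'Jane Doe, Ph.D', 'John Smith') {
    print salute_for($name), "\n";
}
```

Each row is tried in order, and the first condition that succeeds selects the result on the same line; the final row, which has no condition, acts as the default.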
Lists
Parenthesize long lists.
The comma operator is really an operator only in scalar contexts. In lists, the comma is an item separator. Consequently, commas in multiline lists are best treated as item terminators. Moreover, multiline lists are particularly easy to confuse with a series of statements, as there is very little visual difference between a , and a ;.
Given the potential for confusion, it's important to clearly mark a multiline list as being a list. So, if you need to break a list across multiple lines, place the entire list in parentheses. The presence of an opening parenthesis highlights the fact that the subsequent expressions form a list, and the closing parenthesis makes it immediately apparent that the list is complete.
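
As a minimal sketch of that advice, the following code pushes a parenthesized multiline list onto an array. The variable names and the sample data are assumptions, introduced only for illustration:

```perl
use strict;
use warnings;

# Hypothetical data, chosen only so the sketch runs...
my @requested_items = qw( modem router );
my @items;

for my $item (@requested_items) {
    # The parentheses mark where the list starts and where it ends...
    push @items, (
        "A brand new $item",
        "A fully refurbished $item",
        "A ratty old $item",
    );
}

print "$_\n" for @items;
```

The trailing comma after the last element is legal in a Perl list, and it lets every line of the list end in the same way.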
When laying out a statement containing