Computer Science & Perl Programming: Best of The Perl Journal

Edited by Jon Orwant
Price: $39.95 USD
£28.50 GBP



Table of Contents

Chapter 1: Introduction
Jon Orwant
"Perl is a language for getting your job done," begins Programming Perl. As programming languages go, Perl is something of a grab bag, and so is this book.
In this introduction I'll tell you how the book came to be, first by talking about the history of TPJ, and then about why computer science and Perl programming are a natural combination.
History of TPJ
In 1995, I was angry. Perl had broken away from being stereotyped as a system administration language or text processing language, and had managed to claw itself up to merely being stereotyped as a web programming language. I had seen Perl used for AI, astronomy, biology, graphics, natural language processing, and other areas, but Perl's generality wasn't being communicated to the programming world. Perl wasn't getting the reputation it deserved.
So when Tom Christiansen floated the notion of a Perl newsletter on the perl5-porters mailing list, it seemed like a natural idea. I'd just seen my first Perl book printed with my ampersands translated into eights, my vertical bars translated into ones, and my bullet marks depicted as planets complete with rings. (As you might guess, the publisher wasn't O'Reilly.) I wanted to do Perl publishing right, and at the same time show the world that Perl wasn't just for system administration any more. And so I set to work with my NeXT workstation and a copy of FrameMaker. I found a Boston-area printer via the Yellow Pages, and hit up the Perl gurus for articles. I announced the magazine on Usenet, and that was the extent of my marketing.
The reception was mostly enthusiastic, although there was some initial skepticism: people said I was crazy to attempt print rather than web publication. But print has a portability and resolution unrivalled by computer displays, and professional printing provides a sense of permanence that web sites can't match. Paper affords a control over the graphical layout that is hard to achieve in a browser (even with Cascading Style Sheets). For instance, in my TPJ article on Data Hiding, I hid a message in the spacing between letters, and screened in a faint watermark on the page. And I had a hidden message perpetrated on me in the cover of TPJ #3, where photographer Alan Blount hid "perl sux" in his cover photograph.
I also knew that it would be too easy to let quality slip with a web magazine. The high cost of printing gives each issue a stamp of finality; in contrast, a mistake on a web page could always be fixed later. Masochistic as it sounds, I wanted the deadlines that ink-on-dead-trees printing imposed. And print has more prestige: I wanted Perl to get the respect it deserved, and that meant people finding the magazine in their local bookstore.
Computer Science and Perl Programming
When you pursue a computer science degree, you learn about not just computers but computability; not just how to program, but strategies for solving problems and expressing those solutions as algorithms. But what you don't often learn is "computer science in the wild": how the lofty abstractions, generalizations, and precepts are implemented in the real world.
Perl is very much a real world language. It's been taught in middle schools all the way up through graduate programs, but it's not the best first language for computer science students, partly because it does so much for you, and partly because it's so expressive that it allows you to program badly. This is exactly what you want if you need to dash off a one-liner to generate a report from the company database in the next minute, but it's not desirable in a computer science curriculum where purity is valued over expedience.
If you were taking a class on compilers, you'd learn about how programs are turned from source code into binaries. Typically, this is expressed in several phases: lexical analysis, syntax analysis, semantic analysis, code generation, and optimization. And in that class, you'd write a simple compiler for a toy language, perhaps taking a couple of weeks to implement each of these phases. Very clean.
Now consider how Perl parses programs, as described in the article Lexical Analysis. Perl's semantic analysis affects its lexical analysis, so they occur at the same time. Unclean.
The programming component of my undergraduate computer science education primarily used Scheme, a dialect of LISP. It's as clean as a language can be, with a mathematical simplicity and elegance. I believe that every freshman should study LISP, and I recommend my undergraduate text: Structure and Interpretation of Computer Programs (MIT Press). Scheme is the perfect instructional language because its syntax is minimal.
Perl, it might be said, has maximal syntax. A few keystrokes can do a lot. One of the notions of Huffman coding (discussed in Compression) is that frequently occurring things should be represented more concisely than infrequently occurring things; that's why an E in Morse code is a single dot while a Z is dash dash dot dot, and that's why the function to search and replace strings in Perl is an
Additional content appearing in this section has been removed.
Chapter 2: All About Arrays
Nathan Torkington
Arrays are one of Perl's three primary data types (the other two are scalars and hashes). This article will help you understand everything that can be done with them.
Basics
All array variables begin with an @ sign. They hold a list of scalar values (such as a string or number) whose positions are numbered beginning from 0. So in this code, blue is in position number 2 of the @colors array, and 42 is in position 3 of the @data array:
	@colors = ("red", "green", "blue");
	@data = ("Perl", 2_000_000, "Wall", 42);
At this early point it's good to start distinguishing lists from arrays. Perl gurus try to be precise about this distinction when they talk about their code: both are sequences of scalars, but while arrays are true stored variables, lists are merely temporary sequences of values. Subroutines accept lists, and can return them; as you pass an array into a subroutine, it becomes a list of values. Likewise, when a subroutine returns a list, you can store it in an array.
You store a list inside an array variable if you want to access the list's values later. Subroutines and functions don't, strictly speaking, accept arrays, except for a few special functions that we'll see later. Where Perl expects a bunch of values to work on, those values can come from a list, whether it's hardcoded in the program, returned by a function, or extracted from an array.
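As a small sketch of this distinction (the subroutine names total and first_two are made up for illustration), the code below passes an array to a subroutine, where it arrives as a flat list in @_, and then stores a returned list in an array:

```perl
use strict;
use warnings;

# Hypothetical helper: receives a flat list of values in @_.
sub total {
    my $sum = 0;
    $sum += $_ foreach @_;
    return $sum;
}

# Hypothetical helper: returns a list made of its first two arguments.
sub first_two {
    return ($_[0], $_[1]);
}

my @data = (3, 4, 5);
my $sum  = total(@data, 10);    # the array and the extra value arrive as one list
my @pair = first_two(@data);    # the returned list is stored in an array

print "$sum\n";                 # 22
print "@pair\n";                # 3 4
```

Inside total there is no way to tell that the first three values came from an array and the fourth was written out by hand; the subroutine sees only a list.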
Inside double-quoted strings, arrays interpolate (expand) into their values, separated by spaces:
	print "Primary colors are: @colors\n";
	red green blue
Spaces are the default separator, but you can change this with the $" variable:
	$" = ' and '; 
	print "Primary colors are: @colors\n";
	red and green and blue
Positions
To access a single value from an array, use square brackets:
	$colors[2]
The name of the array is "colors", the $ in front indicates a scalar value, and the position of that value, called a subscript, is in the square brackets. This notation works for both storing and fetching values.
	$colors[0] = "pink";
	print $colors[0];
Array subscripts also interpolate inside double-quoted strings:
	print "The 0th color is $colors[0]\n";
To make life easy for programmers, who often need to refer to both ends of the array conveniently, a negative subscript counts back from the end of the array:
	print $colors[-1];
	blue

	print $colors[-3];
	pink
An attempt to fetch a nonexistent negative position returns undef, but an attempt to store in such a position is a fatal error:
	print $colors[-4];
	Use of uninitialized value ...

	$colors[-4] = "ultraviolent";
	Modification of non-creatable array value attempted,
	  subscript -4 at ...
Perl has dynamic data structures that grow as needed. They only grow when assigned to, though, and never simply by reading. So if you attempt to access an element beyond the end of the array, you'll get undef, and the array's size won't change as a result.
To determine the size of an array, you can evaluate it in scalar context by assigning it to a scalar:
	$size_before = @colors;
	print $colors[5];
	$size_after = @colors;
	print "$size_before $size_after\n";
	Use of uninitialized value at ...
	3 3

	$size_before = @colors;
	$colors[5] = "burgundy";
	$size_after = @colors;
	print "$size_before $size_after\n";
	3 6
When you assigned to position five, Perl created values in positions three and four as well. Now you have six elements in the array, in positions zero through five.
Position Versus Count
Welcome to the torture of counting array positions. Because positions start at zero, the size and last position always differ by one. If the only value in the array is at position zero, then there is one element. If there are two elements, they must be in positions zero and one.
Each array has an accompanying scalar variable containing the last position of the array. That variable is $#, followed by the array name (no @ sign, since it's a scalar we're after):
	print $#colors;                     # last position
	5

	print scalar(@colors);             # number of elements
	6
This often confuses beginners when they use loops to count over the positions of an array. There are two right ways to do it:
	for ($i=0; $i <   @colors; $i++) { ... }             # A
	for ($i=0; $i <= $#colors; $i++) { ... }             # B
And two wrong ways:
	for ($i=0; $i <= @colors; $i++) { ... }              # C
	for ($i=0; $i <  $#colors; $i++) { ... }             # D
Option C executes the loop body for one too many positions (if there are six things in @colors, the loop executes when $i is six, even though that's not a valid position). Likewise, option D executes the body one too few times (if the last position is five, the loop stops after executing the loop with $i set to four). I prefer option A because it takes fewer keystrokes than option B.
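One way to check the four loop forms is simply to count how often each body runs. The sketch below assumes a six-element @colors array like the one built up earlier; the %visits hash is only there for bookkeeping:

```perl
use strict;
use warnings;

my @colors = ("pink", "green", "blue", "red", "mauve", "burgundy");

# Count how many times each loop form runs its body.
my %visits = (A => 0, B => 0, C => 0, D => 0);
my $i;
for ($i = 0; $i <   @colors; $i++) { $visits{A}++ }
for ($i = 0; $i <= $#colors; $i++) { $visits{B}++ }
for ($i = 0; $i <=  @colors; $i++) { $visits{C}++ }   # one too many
for ($i = 0; $i <  $#colors; $i++) { $visits{D}++ }   # one too few

print "$_: $visits{$_}\n" foreach sort keys %visits;
```

With six elements, forms A and B each run six times, C runs seven times, and D runs five.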
The $#array variable has another use: you can set it, which pre-extends the array. If you know your array will eventually have a thousand elements in it, you can tell Perl to allocate all the elements at once rather than making Perl allocate the thousand items incrementally as you grow the array.
	$#numbers = 999;
	for ($i=0; $i < 1000; $i++) {
	    $numbers[$i] = 5 * $i + 1;
	}
Foreach Loops
Many times, you won't need the position of the current element; you'll only need its value. Rather than use a C-style for loop as above, use a Perl-style foreach loop:
	@colors = ("red", "green", "blue");
	foreach $c (@colors) {
	   print "$c\n";
	}
	red
	green
	blue
You may choose any loop variable (the $c above) that you wish. If you follow tight programming discipline and use the strict pragma to prevent accidental use of global variables, you can declare the loop variable with my right in the foreach:
	#!/usr/bin/perl -w

	use strict;

	my @colors = ("red", "green", "blue");
	foreach my $c (@colors) {
	    print "$c\n"; 
	}
	red
	green
	blue
Inside foreach loops, the loop variable is actually an alias for the value in the list. So if you change the loop variable, you change the element in the list:
	@colors = ("red", "brown");
	foreach $c (@colors) {
	    $c = "hot $c";
	}
	print "@colors\n";
	hot red hot brown
If you omit the variable, Perl will use $_ as the default variable:
	foreach (@colors) {
	    print "Current item is $_\n";
	}
This is useful when you combine it with the string functions that use $_ as their default values:
	foreach (@colors) {
	    tr/A-Z/a-z/;
	    s/pink|burgundy/red/i;
	    print length, "\n";
	}
The Reverse and Sort Functions
What else can you do with arrays? You can reverse the order of the elements:
	@inverted = reverse @colors;
	print "@inverted\n";
	blue green red
You can sort the elements in ASCIIbetical order:
	@colors = ("pink", "purple", "mauve");
	@ordered = sort @colors;
	print "@ordered\n";
	mauve pink purple
What if you prefer reverse alphabetical order? You might write this:
	@ordered = sort @colors;
	@inverted = reverse @ordered;
	print "@inverted\n";
	purple pink mauve
This works, but you can be even more concise. Like many functions, reverse and sort take any list of values as arguments:
	@inverted = reverse sort @colors;
Can you see why the following won't work?
	@inverted = sort reverse @colors;    # WRONG
The answer is at the end of the chapter.
Even when you combine sort and reverse in the right order, it's rather inefficient. sort returns a temporary list of values, which is then reversed. It'd be more efficient to tell sort to sort in the order you want. You can do that!
sort accepts a code block before the list of values to sort. The code block tells sort how to order any two values. Those values are put into the global variables $a and $b before the code block is executed. (Most code blocks use Perl's <=> or cmp operators to compare things numerically or ASCIIbetically.)
The default comparison routine is:
	$a cmp $b
cmp compares values as strings, and by putting $a before $b, you get an ascending sort. If you wanted to sort from highest to lowest, it's as simple as flipping the order of $a and $b in the comparison: instead of telling sort that "green" should come after "blue", it'll now say that "green" should come before "blue":
	@colors = ("pink", "purple", "mauve");
	@inverted_ordered = sort { $b cmp $a } @colors;
	print "@inverted_ordered\n";
	purple pink mauve
There are many more complicated sorts, up to and including the Schwartzian Transform. But I digress. If you want more information on sorting, consult a good Perl book like The Perl Cookbook by Tom Christiansen and yours truly (O'Reilly & Associates), or Effective Perl Programming by Joseph Hall (Addison-Wesley).
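The <=> operator mentioned above matters as soon as the values are numbers. A brief sketch, using a made-up @numbers array, of how the default string comparison differs from a numeric code block:

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);

my @as_strings = sort @numbers;                  # default: string comparison
my @ascending  = sort { $a <=> $b } @numbers;    # numeric, lowest first
my @descending = sort { $b <=> $a } @numbers;    # numeric, highest first

print "@as_strings\n";    # 1 10 100 9
print "@ascending\n";     # 1 9 10 100
print "@descending\n";    # 100 10 9 1
```

The default sort puts 9 last because, compared as strings, "9" sorts after "100".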
Slices
You now know how to talk about the array as a whole, and how to talk about single values from the array, but what about subsets of the array? For that you need to know about array slices:
	@subset = @colors[0,2];
	print "@subset\n";
	pink mauve
The @ sign at the beginning indicates that you want multiple values back. Inside the square brackets is a list of values. In this case, it's just positions zero and two you want, but you can have any list you like:
	($x, $y, $z) = @big_array[5, 2, 100];
That's like saying this, except that your fingers don't get worn out:
	$x = $big_array[5];
	$y = $big_array[2];
	$z = $big_array[100];
When you want a range of values (e.g., from positions two through eight) you can use the range (..) operator:
	@subset = @big_array[2..8];
Which, again, is like typing this, but without fingerprint damage:
	@subset = @big_array[2, 3, 4, 5, 6, 7, 8];
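Slices can also be stored into, and the list inside the brackets can include negative positions or $#array. The following is a sketch with a made-up four-element @colors array, not an example from the original article:

```perl
use strict;
use warnings;

my @colors = ("pink", "purple", "mauve", "teal");

# A slice can be assigned to, just like a single element.
@colors[0, 1] = @colors[1, 0];          # swap the first two values
print "@colors\n";                      # purple pink mauve teal

# Slice positions can be any list, including negative positions
# and ranges that use $#colors for the last position.
my @ends = @colors[0, -1];
my @rest = @colors[1 .. $#colors];
print "@ends\n";                        # purple teal
print "@rest\n";                        # pink mauve teal
```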
Adding and Deleting Values
Perl has five functions for inserting and removing values from an array. Four of those functions are quite specialized, working with only the start or end of the array. The last, splice, is far more general. Let's cover the specialized functions first.
push and pop act on the end of the array. push adds values to the end of the array; pop removes the last value and returns it:
	@characters = ("Buffy", "Willow", "Xander");
	push(@characters, "Giles", "Anya");
	print "@characters\n";
	$ex_demon = pop @characters;
	print "popped $ex_demon\n";
	print "@characters\n";
	Buffy Willow Xander Giles Anya
	popped Anya
	Buffy Willow Xander Giles
The corresponding functions that work on the start of the array are shift and unshift:
	@baddies = ("Spike", "Mayor", "Adam");
	$in_wuv = shift @baddies;
	print "removed: $in_wuv\n";
	print "left: @baddies\n";
	unshift @baddies, "Dracula";
	print "@baddies\n";
	removed: Spike
	left: Mayor Adam
	Dracula Mayor Adam
If you shift or pop but don't give an array name, Perl assumes you mean the current arguments. If you're in a subroutine definition, the array that's operated on is @_, containing the subroutine arguments. If you're not in a subroutine definition, @ARGV is shifted or popped.
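Here is a minimal sketch of both defaults. The greet subroutine and its optional second argument are invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical subroutine: inside a sub, a bare shift takes from @_.
sub greet {
    my $name     = shift;                # same as shift @_
    my $greeting = shift || "Hello";     # second argument is optional
    return "$greeting, $name";
}

print greet("Willow"), "\n";                    # Hello, Willow
print greet("Giles", "Good evening"), "\n";     # Good evening, Giles

# Outside any sub, a bare shift takes from @ARGV (the command-line arguments).
my $first_arg = shift;
print "first argument: ", (defined $first_arg ? $first_arg : "(none)"), "\n";
```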
The uber function for arrays is splice, which lets you perform any combination of inserting, deleting, or replacing. You give it an array to work on, the position at which to begin deleting elements, the number of elements to delete, and any elements to insert in place of those deleted. splice returns the deleted elements, if any:
	@gals = ("Buffy", "Willow", "Anya", "Faith");
	@cut = splice @gals, 1, 2, "Tara";
	print "@gals\n";
	print "@cut\n";
	Buffy Tara Faith
	Willow Anya
The two things starting at position one were "Willow" and "Anya". In their place was put "Tara".
You can delete zero elements, and use splice only for its ability to insert:
	@gals = ("Buffy", "Willow", "Anya");
	splice @gals, 2, 0, "Tara";
	print "@gals\n";
	Buffy Willow Tara Anya
You can insert no elements, and only use splice for its ability to delete:
	@gals = ("Buffy", "Cordelia", "Faith", "Willow", "Anya"); 
	@cut = splice @gals, 1, 2; 
	print "@gals\n"; 
	print "@cut\n";
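Because splice is so general, each of the four specialized functions can be written in terms of it. The sketch below shows those equivalents on a small made-up array; in real code the specialized functions are usually clearer:

```perl
use strict;
use warnings;

my @gals = ("Buffy", "Willow");

splice @gals, scalar(@gals), 0, "Anya";   # like push @gals, "Anya"
splice @gals, 0, 0, "Tara";               # like unshift @gals, "Tara"
print "@gals\n";                          # Tara Buffy Willow Anya

my ($last)  = splice @gals, -1;           # like pop @gals
my ($first) = splice @gals, 0, 1;         # like shift @gals
print "$first $last\n";                   # Tara Anya
print "@gals\n";                          # Buffy Willow
```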
Lists to Strings and Back Again
How do you create a list? You can hardcode it in your program or accumulate it element by element with push or unshift. Often you just read the list from a file.
Imagine a list of words on one line:
	Buffy The Vampire Slayer
You would like an array with each element as a single word. You could do this with repeated matches:
	while ($string =~ m/(\S+)/g) {
	    push @words, $1;
	}
But the easiest way is to use the split function, which takes up to three arguments. The first is a regular expression matching the stuff between the values you want. Here, you'll need a regular expression matching spaces. The second argument to split is the string to be split up. The third and final argument is the number of fields you want back, but if you omit it you'll get all the fields.
	@words = split /\s+/, $string;
If you omit the second argument, split looks in $_ for the string. This makes it perfect for these kinds of loops:
	while (<SOMEFILE>) {
	    @words = split /\s+/;
	    #...
	}
In fact, if you have your string in $_ and you want it split on whitespace, you don't even need the regular expression, because the default regular expression is whitespace!
	while (<SOMEFILE>) {
	  @words = split;
	  # ...
	}
Of course, your strings don't always have fields separated by spaces. The Unix password file, for instance, separates fields with colons:
	while (<PASSWDFILE>) {
	    @fields = split /:/;
	    # ...
	}
split has some quirks: it ignores any trailing empty fields, so if your colon-separated record was big:deal:::, you'd get two fields back: big and deal. This is sometimes what you want, but not always.
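When the trailing empty fields do matter, the third argument helps: a negative limit tells split to keep them. A short sketch using the same record (the variable names are only illustrative):

```perl
use strict;
use warnings;

my $record = "big:deal:::";

my @default = split /:/, $record;        # trailing empty fields dropped
my @all     = split /:/, $record, -1;    # negative limit keeps them
my @two     = split /:/, $record, 2;     # at most two fields; the rest stays joined

print scalar(@default), "\n";            # 2
print scalar(@all), "\n";                # 5
print "$two[1]\n";                       # deal:::
```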
The opposite of split is join. split extracts fields that have been separated. join produces a string of separated fields. The first argument is the separator (an exact string, not a regular expression), and the rest of the arguments are values to join together with the separator in between each pair. For instance:
	@adjectives = ("hot", "damp", "sticky");
	$line = join(" and ", @adjectives);
	print $line;
	hot and damp and sticky
Putting It All Together
So here's how you reverse the order of words for each line in a file:
	while (<INFILE>) {
	    @fields = split;
	    @new = reverse @fields;
	    $line = join " ", @new;
	    print OUTFILE "$line\n";
	}
More concisely:
	while (<INFILE>) {
	    print OUTFILE join(" ", reverse split), "\n";
	}
Answer to the earlier question. The code said to reverse the list, then sort it. The call to reverse is useless, because sort sorts the list into ascending order.
In the next article, I'll take a step back and examine some of the common mistakes beginners make, and how you can avoid them.
Chapter 3: Perfect Programming
Nathan Torkington
Imagine a world ten years from now. Programmers know everything there is to know about their language, algorithms, and requirements. They apply this knowledge to produce flawless programs, which work correctly the first and every time. Users read the manuals, never provide false or misleading input, and always know what to do next. Clients never change their minds and maintenance is unnecessary.
You can wake up now. We both know this won't happen so long as boneheads like us keep programming, morons like our customers keep giving us incomplete and perpetually changing requirements, and the prerequisite for being a user is that you demonstrate zero ability to read, think, or act without tech support or a programmer holding your hand. Everyone in the programmer-client-user world is a weak link, and programmers must be prepared for mistakes. There are three major classes of mistakes: user mistakes, client mistakes, and programmer mistakes.
User mistakes
When users are to blame, it's typically because they do something like providing incorrect input to your program, or calling your program in an unexpected way. Paranoid programmers check everything provided by the users (and use the taint mechanism to help them). This has the side benefit of making their programs more secure against exploitation by The Bad Guys. The Bad Guys like to mess with a program's environment, input, and configuration files, in the hope they can trick it into displaying /etc/master.passwd, or changing the permissions of /bin/sh to 4755, making it setuid.
Client mistakes
Customers are fickle. Sometimes they request minor changes ("We want to sort the addresses by zipcode"); sometimes the changes are major ("The CEO just bought an Oracle database. Use it."). Changes run the risk of breaking software that worked previously. The programmer must write code in such a way that substantial changes in behavior can be implemented with minimum risk.
Programmer mistakes
Finally, as unwilling as we all are to accept it, programmers make mistakes. They're typically things like using variables that don't yet have a value, giving incorrect values to a function, and creating language misunderstandings like
Additional content appearing in this section has been removed.
Warnings with -w
This is the programmer's most useful debugging aid. As Larry says, Perl's biggest bug is that -w is optional. Some of the things that a hashbang line of #!/usr/bin/perl -w will catch are: use of undefined values (typically a sign that you're expecting a variable to have a value when it doesn't), nonnumeric arguments (a string was given instead of a number, which probably means it would be interpreted as 0 instead of being flagged as an error), = instead of ==, and much more.
Sometimes you want -w checks in some places but not others. If there's a chunk of code you just know will work even though -w complains about it, you can disable warnings as follows:
	{
	   local($^W) = 0;          # disable warnings...
	   # your code here
	}                           # warnings back on now
This traps only runtime warnings. Disabling compile-time warnings is also possible; see the perllexwarn documentation for details.
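For comparison, here is a sketch of the lexical approach that perllexwarn describes, assuming a Perl recent enough to have the warnings pragma (5.6 or later). The __WARN__ handler and the @seen array are only there so the sketch can count the warnings it triggers:

```perl
use strict;
use warnings;

my @seen;                                   # warnings captured for inspection
local $SIG{__WARN__} = sub { push @seen, $_[0] };

my $missing;                                # deliberately left undefined

{
    no warnings 'uninitialized';            # silence one category in this block only
    my $quiet = "value: " . $missing;
}

my $noisy = "value: " . $missing;           # outside the block, the warning is back

print scalar(@seen), " warning(s)\n";       # 1 warning(s)
print $seen[0];
```

Unlike local($^W) = 0, this turns off a single category of warning and leaves the others active inside the block.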
There has been a vigorous debate on the subject of -w in production programs. New versions of Perl have created new warnings, which show up as "errors" (broken web pages, strange cron mailings, STDERR sent to users' screens) in programs that worked previously. Tracking these down can be a nontrivial task. I like to keep my code -w clean for all versions, because it makes future changes easier to test with -w. Your mileage may vary.
The strict Pragma
If you're using references or trying to write maintainable or reusable code, you probably want to use strict. This is a shorthand for use strict 'refs', 'subs', 'vars', which catches the following things:
use strict 'refs'
Prevents suspicious dereferences. If a subroutine expects a hard reference to a value (the kind of reference you get with \), but you supply it the wrong arguments or the right arguments in the wrong order, you can cause a string or a number to be inadvertently dereferenced. Consider this code:
	sub setref {
	    my $string_ref = shift;
	    my $string = shift;
	    $$string_ref = $string;
	}

	setref("Googol", $plexref); # wrong argument order
Here, the setref subroutine is passed "Googol" where it expects a reference to a string. Without use strict 'refs', Perl assumes you meant $Googol. This is called a soft, or symbolic, reference. When you use that pragma, however, Perl whines and dies. Because soft references are almost never needed, use strict 'refs' catches a lot of errors that would otherwise silently cause bizarre behavior.
use strict 'vars'
Catches stray variables. It expects you to either qualify every variable completely ($Package::Var) or to declare them with my. In almost every case, you really want to use my to scope your variable so that code outside the file or block can't perturb its value. Using my to predeclare all variables (or using cumbersome fully-qualified variable names) will predispose you to document your variables for the hapless fool who must modify your program in a year's time. Don't laugh. It might be you.
	if ($core->active) {
	    my $rems;           # active radiation in rems
	    my $rod_volume;     # volume of carbon rod remaining
	    # your code here
	}
use strict 'subs'
Forbids stray barewords. When it's in effect, you can't use the bareword style of calling subroutines with no arguments (e.g., $result = mysub;) unless the subroutine was declared before its use, either with a prototype or with the subroutine definition itself. If you don't want to predeclare, you must preface the subroutine call with & or append () so that it looks like a subroutine call. This doesn't affect the use of barewords in hashes in curly braces (e.g.,
Additional content appearing in this section has been removed.
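As a sketch of the 'refs' check described earlier in this section, the setref example can be run with the arguments in both orders. The eval block is an addition here, used only to trap the fatal error so that it can be printed:

```perl
use strict;
use warnings;

sub setref {
    my $string_ref = shift;
    my $string     = shift;
    $$string_ref = $string;
}

my $plex;
setref(\$plex, "Googol");                       # correct order
print "$plex\n";                                # Googol

# Wrong order: "Googol" is used as a reference, so strict 'refs' dies.
# eval traps the fatal error so the sketch can report it.
my $ok = eval { setref("Googol", \$plex); 1 };
print "caught: $@" unless $ok;
```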
Tainting and Safe
When Perl encounters a variable with a value that hasn't been hardcoded into the program, it marks the variable as tainted if the program is running under the -T flag, or if the program's permissions are setuid (meaning that it assumes the identity of its owner rather than whoever is running the program). Use of a tainted value in exec or similar calls, or opening a filename for writing, causes a fatal error. To untaint data, you should extract the safe portion (for a filename, that might be /^([\w.\@-]+)$/) with a regular expression and use $1, $2, and similar variables to access the part of the tainted variable guaranteed to be safe. Full details can be found in the perlsec manual page.
Running with -T is almost always a good idea when you're programming defensively. It forces you to validate every piece of user-supplied data with a regular expression before you use it. Not only does this guard against potentially security-compromising errors, it also lets you catch situations where the user gives the wrong type of data (a string instead of a number, for instance).
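As a minimal sketch of the untainting step described above, the following wraps the filename pattern in a helper. The name untaint_filename and the sample inputs are assumptions made for illustration; the script is meant to be run as perl -T, although the validation logic behaves the same way without the flag.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the validated filename, or undef if the
# input contains anything outside the allowed set of characters.
sub untaint_filename {
    my ($name) = @_;
    return undef unless defined $name;
    # Under -T only the captured text in $1 counts as untainted;
    # the original $name stays tainted.
    return $name =~ /^([\w.\@-]+)$/ ? $1 : undef;
}

# Sample inputs (assumed): one plain filename and two hostile strings.
for my $input ('report.txt', '../etc/passwd', 'notes; rm -rf /') {
    my $clean = untaint_filename($input);
    print defined $clean ? "ok:       $clean\n" : "rejected: $input\n";
}
```

Returning undef for bad input means the caller has to check the result before using it, which is the point of the exercise.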
A different approach is to use the Safe module, which traps certain operations. You can run code that uses untrustworthy data inside a Safe "compartment," knowing that it can't unlink files, fork processes, or do other nasty things.
Checking Return Values
Not every fork will succeed, not every file can be opened, not every child process terminates without error. The return values from system calls contain valuable information on the success or failure of those calls: check them!
The most important things to check are return values of open, fork, exec, and the contents of $? (or $CHILD_ERROR if you use English).
The same wisdom applies to CPAN or library modules, and to your own modules. Your modules should perform sanity checks and return 0 or undef if something went wrong.
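A short sketch of what such checks can look like, under a few assumptions: the helper name exit_status_of and the file path are invented for the example, and the child process is simply the running perl ($^X) asked to exit with status 3.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command and return its exit status, or
# undef if it could not be started or was killed by a signal.
sub exit_status_of {
    my @command = @_;
    my $rc = system(@command);
    if ($rc == -1) {                 # the command never started
        warn "Couldn't run $command[0]: $!\n";
        return undef;
    }
    if ($? & 127) {                  # low bits hold the signal number
        warn sprintf("%s died from signal %d\n", $command[0], $? & 127);
        return undef;
    }
    return $? >> 8;                  # high byte holds the exit status
}

# Hypothetical path, chosen so that the open fails.
my $path = '/no/such/directory/data.txt';
if (open(my $fh, '<', $path)) {
    close($fh) or warn "Couldn't close $path: $!\n";
} else {
    warn "Couldn't open $path: $!\n";
}

# $^X is the path of the running perl, so this child exits with status 3.
my $status = exit_status_of($^X, '-e', 'exit 3');
print "child exit status: $status\n" if defined $status;
```

The helper follows the advice given for modules: it returns undef when something went wrong, so the caller can tell failure apart from a legitimate exit status of 0.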
Planning for Failure
Part of catching errors is deciding what to do when they occur. Even before I begin programming, I enumerate the various ways my code can fail, and then decide what to do for each possibility. With some errors it's okay to tell the user exactly what went wrong ("You gave me the name of a user who isn't in the database"), but others shouldn't be made so public ("The database doesn't exist," or "I couldn't fork"). User errors typically warrant a message that pats their hand and gives them a chance to try again. System errors should be logged to a file, the administrators notified, and the user told that "The system is down" and that they should try again later.
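One way to sketch that split between user errors and system errors is shown below. Everything named here (handle_failure, the two kinds 'user' and 'system', the sample messages) is an assumption for illustration, the log goes to an in-memory buffer rather than a real file, and notifying the administrators is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: decide what the user sees and what gets logged.
# $kind is 'user' or 'system'; $log is any writable filehandle.
sub handle_failure {
    my ($kind, $detail, $log) = @_;
    if ($kind eq 'user') {
        # Safe to be specific, and to invite another attempt.
        return "$detail Please try again.";
    }
    # System errors: record the details privately and keep the
    # public message vague.
    print {$log} scalar(localtime), " $detail\n";
    return "The system is down. Please try again later.";
}

# For the demonstration, log into an in-memory buffer instead of a file.
my $buffer = '';
open(my $log, '>', \$buffer) or die "Couldn't open in-memory log: $!";

print handle_failure('user', "There is no user named 'homer' in the database.", $log), "\n";
print handle_failure('system', "Couldn't fork: resource temporarily unavailable", $log), "\n";

close($log) or die "Couldn't close log: $!";
print "logged: $buffer";
```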
The Perl Debugger
There is only so much that stack traces and strategically placed print statements can do. When you've located the problem, it can still be difficult to infer the cause. The next step is to write a small program that exhibits the bug and then step through it with Perl's symbolic debugger (perl -d mysmallprogram). Of course, you can always invoke the debugger directly with perl -de 0 to initiate an interactive session.
Debugging will be most comfortable if you've installed the Term::ReadLine module, or if you use Ilya Zakharevich's nice Emacs interface, cperl-mode.el. Even without these whizzy utilities, the debugger is still useful. You can step through your code and set breakpoints: locations in your program at which execution stops, giving you a chance to inspect or change variables, thus letting you discover the particular states that trigger the bug you're trying to fix. Consult the perldebug documentation for more information.
The Perl Profiler
When your program works, but runs as slow as a dog, Dean Roehrich's Devel::DProf module (available on the CPAN) will help you determine why. perl -d:DProf myprogram runs your program and creates a file called tmon.out in your current working directory. You then run the dprofpp program to analyze that file and display the fifteen subroutines occupying the most time.
There are other features of the profiler (see the dprofpp documentation for more information) but this list of the most time-consuming subroutines is probably the most important. It pinpoints the parts of your program that use the most time, and hence are most suited for optimizing, rewriting, inlining, or avoiding.
Stack Traces
The terse little warnings and die messages that you're provided are often not sufficient when it comes to working out where things went wrong. For that you need the awesome power of Jack Shirazi's Devel::DumpStack. When I'm debugging a CGI script that refuses to play ball, I'll use this code, which traps warnings and fatal errors, displaying them in an HTML document instead of burying them in a web server error log:
	#!/usr/bin/perl -w

	use Devel::DumpStack qw(stack_as_string);
	use HTML::Entities;

	sub my_die {
	    select(STDOUT); $|=1;
	    printf(<<"EOF", $?, $!, stack_as_string( ));
	Content-Type: text/html

	<HTML><HEAD><TITLE>System Error</TITLE></HEAD>

	<BODY>
	<H1>System Error</H1>
	A seriously bad system error happened:<P>

	<B>Exit Status</B>: %d<BR>
	<B>Error String</B>: %s<P>

	<B>Stack Dump</B>:
	<PRE>
	%s
	</PRE>

	</BODY></HTML>
	EOF

	    exit;
	}

	BEGIN {
	  $SIG{__WARN__} = $SIG{__DIE__} = \&my_die;
	}

	$a = undef + 4;

	exit;
I wouldn't recommend leaving this code in your final product, however. The sight of a stack dump can mentally scar a user for life.
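If Devel::DumpStack isn't installed, a similar trace can be had from the Carp module that ships with Perl. This is a swapped-in technique rather than the approach used above: the sketch traps warnings only, collects the traces in an array instead of printing HTML, and the subroutines check_core and run_reactor are hypothetical, present only to give the trace some depth.

```perl
use strict;
use warnings;
use Carp qw(longmess);

my @traces;    # collected here rather than printed as HTML

# Trap warnings and keep a full stack trace for each one.
$SIG{__WARN__} = sub {
    my ($message) = @_;
    chomp $message;
    push @traces, longmess($message);
};

# Hypothetical subroutines, present only to give the trace some depth.
sub check_core  { warn "core temperature out of range\n" }
sub run_reactor { check_core() }

run_reactor();
print $traces[0];
```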
Chapter 4: Precedence
Mark Jason Dominus
What Is Precedence?
What's 2 + 3 x 4?
You probably learned about this in grade school; it was fourth-grade material in the New York City public school I attended. If you didn't, that's okay too; I'll explain everything.
It's well-known that 2 + 3 x 4 is 14, because we are supposed to do the multiplication before the addition. 3 x 4 is 12, and then we add the 2 and get 14. What we do not do is perform the operations in left-to-right order; if we did that we would add 2 and 3 to get 5, then multiply by 4 and get 20.
This is just a convention about what an expression like 2 + 3 x 4 means. It's not an important mathematical fact; it's just a rule about how to interpret certain ambiguous arithmetic expressions. It could have gone the other way, or we could have the rule that the operations are always done left-to-right. But we don't have those rules; we have the rule that says that you do the multiplication first and then the addition. We say that multiplication takes precedence over addition.
What if we really do want to say: "Add 2 and 3, and multiply the result by 4"? Then we use parentheses, like this: (2 + 3) x 4. The rule about parentheses is that expressions in parentheses must always be fully evaluated before anything else.
If we always used the parentheses, we wouldn't need rules about precedence. There wouldn't be any ambiguous expressions. We have precedence rules because we're lazy and we like to leave out the parentheses when we can. The fully-parenthesized form is always unambiguous. The precedence rule tells us how to interpret a version with fewer parentheses to decide what it would look like if we wrote the equivalent fully-parenthesized version. In the example above:
  • 2 + (3 x 4)
  • (2 + 3 ) x 4
Is 2 + 3 x 4 like the first or like the second? The precedence rule just tells us that it is like the first.
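The same convention can be checked directly in Perl. In this small sketch the multiplication sign is written *, because in Perl x is the repetition operator; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $implicit = 2 + 3 * 4;      # multiplication first: 2 + 12
my $first    = 2 + (3 * 4);    # the same thing, fully parenthesized
my $second   = (2 + 3) * 4;    # parentheses force the addition first: 5 * 4

print "2 + 3 * 4   = $implicit\n";    # 14
print "2 + (3 * 4) = $first\n";       # 14
print "(2 + 3) * 4 = $second\n";      # 20
```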
Rules and More Rules
In grade school we learned a few more rules:
4 x 5^2
Which of these interpretations is correct?
(4 x 5)^2 = 400
or
4 x (5^2) = 100
The rule is that exponentiation takes precedence over multiplication, so it's 100 and not 400.
What about 8 - 3 + 4? Is this like (8 - 3) + 4 = 9 or 8 - (3 + 4) = 1? Here the rule is a little different. Neither + nor - has precedence over the other. Instead, the - and + are just done left-to-right. This rule handles the case of 8 - 4 - 3 also. Is it (8 - 4) - 3 = 1 or is it 8 - (4 - 3) = 7? Subtractions are done left-to-right, so it's 1 and not 7. A similar left-to-right rule handles ties between x and /.
Our rules are getting complicated now:
  1. Exponentiation first.
  2. Next multiplication and division, left to right.
  3. Then addition and subtraction, left to right.
Can we leave out the "left-to-right" part and just say that all ties will be broken left-to-right? No, because for exponentiation that isn't true.
2^2^3
means
2^(2^3) = 256, not (2^2)^3 = 64.
So stacked exponents are resolved from upper-right to lower-left. Perl uses the token ** to represent exponentiation, writing x**y for x raised to the power y. In this case x**y**z means x**(y**z), not (x**y)**z, so ** is resolved right-to-left.
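A few lines of Perl are enough to confirm these rules. This is only a sketch with invented variable names; it uses * for multiplication and ** for exponentiation.

```perl
use strict;
use warnings;

my $power_first = 4 * 5 ** 2;      # exponentiation first: 4 * 25
my $left_sub    = 8 - 4 - 3;       # left to right: (8 - 4) - 3
my $mixed       = 8 - 3 + 4;       # left to right: (8 - 3) + 4
my $right_pow   = 2 ** 2 ** 3;     # right to left: 2 ** (2 ** 3)
my $forced      = (2 ** 2) ** 3;   # parentheses override: 4 ** 3

print "4 * 5 ** 2    = $power_first\n";   # 100
print "8 - 4 - 3     = $left_sub\n";      # 1
print "8 - 3 + 4     = $mixed\n";         # 9
print "2 ** 2 ** 3   = $right_pow\n";     # 256
print "(2 ** 2) ** 3 = $forced\n";        # 64
```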
Programming languages have the same notational problem, except it's even worse than in mathematics, partly because programming languages have so many different operator symbols. For example, Perl has at least 70 different operator symbols. This is a problem, because communication with the compiler and with other programmers must be unambiguous. We don't want to write something like 2 + 3 x 4 and have Perl compute 20 when we wanted 14, or vice versa.
Nobody knows a really good solution to this problem, and different languages solve it in different ways. For example, the language APL, which has a whole lot of unfamiliar operators like ρ, dispenses with precedence entirely and resolves them all from right-to-left. The advantage of this is that we don't have to remember any rules, and the disadvantage is that many expressions are confusing: if we write 2 x 3 + 4, we get 14, not 10. In LISP the issue never comes up, because in LISP the parentheses are required, and so there are no ambiguous expressions. (Now you know why LISP looks the way it does.)
An Explosion of Rules
Let's see some examples of the reasons for which the precedence levels are set the way they are. Suppose we wrote something like this:
	$v = $x + 3;
This is actually ambiguous. It might mean:
	($v = $x) + 3;
or it might mean:
	$v = ($x + 3);
The first of these is silly, because it stores the value of $x into $v, and then computes the value of $x + 3 and throws the result of the addition away. In this case, the addition was useless. The second one, however, makes sense, because it does the addition first and stores the result into $v. Since people write things like:
	$v = $x + 3;
all the time, and expect to get the second behavior and not the first, Perl's = operator has low precedence, lower than the precedence of +, so that Perl uses the second interpretation.
Here's another example:
	$result = $x =~ /foo/;
means this:
	$result = ($x =~ /foo/);
which looks to see if $x contains the string foo, and stores a true or false result into $result. It doesn't mean this:
	($result = $x) =~ /foo/;
which copies the value of $x into $result and then looks to see if $result contains foo. In this case it's likely that the programmer wanted the first meaning, not the second. But sometimes we do want it to go the other way. Consider this expression:
	$p = $q =~ s/w//g;
Again, this expression is interpreted this way:
	$p = ($q =~ s/w//g);
All the w's are removed from $q, and the number of successful substitutions is stored into $p. However, sometimes we really do want the other meaning:
	($p = $q) =~ s/w//g;
This copies the value of $q into $p, and then removes all the w's from $p, leaving $q alone. If we want this, we have to include the parentheses explicitly, because = has lower precedence than =~.
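The difference between the two forms is easy to see with a sample string. The string and the extra variable names $original and $copy are assumptions for this sketch.

```perl
use strict;
use warnings;

# Without parentheses: parsed as $p = ($q =~ s/w//g).
my $q = 'wow, awkward';
my $p = $q =~ s/w//g;
print "p = $p, q = $q\n";    # p = 4, q = o, akard

# With parentheses: copy first, then edit only the copy.
my $original = 'wow, awkward';
(my $copy = $original) =~ s/w//g;
print "copy = $copy, original = $original\n";
```

In the first form $p receives the number of substitutions and $q is modified; in the second, $copy receives the edited string and $original is untouched.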
Often, the rules do what we want them to. Consider this:
	$worked = 1 + $s =~ /pattern/;
There are five ways to interpret this:
  1. ($worked = 1) + ($s =~ /pattern/);
  2. (($worked = 1) + $s) =~ /pattern/;
  3. ($worked = (1 + $s)) =~ /pattern/;
  4. $worked = ((1 + $s) =~ /pattern/);
  5. $worked = (1 + ($s =~ /pattern/));
We already know that + has higher precedence than =, so it happens before =, and that rules out (1) and (2).
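One way to check which of the remaining readings Perl picks is to run the expression. The sketch below assumes a sample string that does contain the pattern, and relies on the fact that in Perl's operator table =~ binds more tightly than +; the variable names are invented for the example.

```perl
use strict;
use warnings;

my $s = 'a pattern in a string';    # hypothetical sample data

my $as_written = 1 + $s =~ /pattern/;        # no parentheses
my $reading_5  = 1 + ($s =~ /pattern/);      # match first, then add
my $reading_4  = do {
    no warnings 'numeric';                   # $s is not a number
    (1 + $s) =~ /pattern/;                   # add first, then match
};

print "as written: $as_written\n";
print "reading 5:  $reading_5\n";
print "reading 4:  ", ($reading_4 ? 'true' : 'false'), "\n";
```

The unparenthesized expression gives the same result as reading (5): the match succeeds and yields 1, and the addition makes it 2.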
Precedence Traps and Surprises
This very low precedence for commas causes some other problems, however. Consider the common idiom:
	open(F, "< $file") || die "Couldn't open $file: $!";
This tries to open a filehandle, and if it can't, it aborts the program with an error message. Now watch what happens if we leave the parentheses off the open call:
	open F, "< $file" || die "Couldn't open $file: $!";
The comma has very low precedence, so the || takes precedence here, and Perl interprets the expression as if we had written this:
	open F, ("< $file" || die "Couldn't open $file: $!");
This is totally bizarre, because the die will only be executed when the string "< $file" is false, which never happens. Since the die is controlled by the string and not by the open call, the program will not abort on errors the way we wanted. Here we wish that || had lower precedence, so that we could write:
	try to perform big long hairy complicated action    || die ;
and be sure that the || was not going to gobble up part of the action the way it did in our