Edited by Jon Orwant
Price: $39.95 USD
£28.50 GBP
perl5-porters mailing list, it seemed like a natural idea. I'd just seen my first Perl book printed with my ampersands translated into eights, my vertical bars translated into ones, and my bullet marks depicted as planets complete with rings. (As you might guess, the publisher wasn't O'Reilly.) I wanted to do Perl publishing right, and at the same time show the world that Perl wasn't just for system administration any more. And so I set to work with my NeXT workstation and a copy of FrameMaker. I found a Boston-area printer via the Yellow Pages, and hit up the Perl gurus for articles. I announced the magazine on Usenet, and that was the extent of my marketing.

blue is in position number 2 of the @colors array, and 42 is in position 3 of the @data array:

    @colors = ("red", "green", "blue");
    @data = ("Perl", 2_000_000, "Wall", 42);

    print "Primary colors are: @colors\n";

    Primary colors are: red green blue

    $" = ' and ';
    print "Primary colors are: @colors\n";

    Primary colors are: red and green and blue

    $colors[2]

    $colors[0] = "pink";
    print $colors[0];

    print "The 0th color is $colors[0]\n";

    print $colors[-1];

    blue

    print $colors[-3];

    pink
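The indexing rules shown above can be pulled together in one short sketch. It assumes a fresh three-element @colors (so position -3 still holds "red" rather than "pink"), and the names $third, $last, $first, and $joined are illustrative only.

```perl
use strict;
use warnings;

my @colors = ("red", "green", "blue");

# Positions count from zero, so "blue" sits at position 2.
my $third = $colors[2];

# Negative subscripts count back from the end of the array.
my $last  = $colors[-1];    # same element as $colors[2]
my $first = $colors[-3];    # same element as $colors[0]

# $" is the separator used when an array is interpolated into a string.
my $joined;
{
    local $" = ' and ';     # restored when the block ends
    $joined = "@colors";
}

print "third=$third last=$last first=$first\n";
print "Primary colors are: $joined\n";
```

Setting $" with local inside a bare block keeps the change from leaking into later interpolations.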
undef, but an attempt to store in such a position is a fatal error:

    print $colors[-4];

    Use of uninitialized value ...

    $colors[-4] = "ultraviolent";

    Modification of non-creatable array value attempted, subscript -4 at ...

undef, and the array's size won't change as a result.

    $size_before = @colors;
    print $colors[5];
    $size_after = @colors;
    print "$size_before $size_after\n";

    Use of uninitialized value at ...
    3 3

Storing past the end, by contrast, grows the array:

    $size_before = @colors;
    $colors[5] = "burgundy";
    $size_after = @colors;
    print "$size_before $size_after\n";

    3 6

    print $#colors;           # last position

    5

    print scalar(@colors);    # number of elements

    6
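As a hedged illustration of the difference between reading and storing past the end, the sketch below records the array size after each step. The variable names are made up for this example.

```perl
use strict;
use warnings;

my @colors = ("red", "green", "blue");

# Reading past the end yields undef and leaves the size alone.
my $missing         = $colors[5];
my $size_after_read = scalar(@colors);

# Storing past the end grows the array; the gap holds undef.
$colors[5] = "burgundy";
my $size_after_store = scalar(@colors);
my $last_position    = $#colors;

print "after read:  $size_after_read\n";
print "after store: $size_after_store, last position $last_position\n";
```

The last position is always one less than the number of elements, which is why $#colors reports 5 when scalar(@colors) reports 6.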
    for ($i=0; $i < @colors; $i++) { ... }      # A
    for ($i=0; $i <= $#colors; $i++) { ... }    # B

    for ($i=0; $i <= @colors; $i++) { ... }     # C
    for ($i=0; $i < $#colors; $i++) { ... }     # D
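One way to see how the four loop conditions differ is simply to count how often each body runs. This is a sketch under the assumption of a six-element array; the %runs hash is a hypothetical tally, not part of the original listing.

```perl
use strict;
use warnings;

my @colors = ("red", "green", "blue", "pink", "mauve", "burgundy");

# Count how many times each loop form runs its body.
my %runs = (A => 0, B => 0, C => 0, D => 0);
for (my $i = 0; $i <  @colors;  $i++) { $runs{A}++ }
for (my $i = 0; $i <= $#colors; $i++) { $runs{B}++ }
for (my $i = 0; $i <= @colors;  $i++) { $runs{C}++ }   # one too many
for (my $i = 0; $i <  $#colors; $i++) { $runs{D}++ }   # one too few

print "$_: $runs{$_}\n" for sort keys %runs;
```

With six elements, A and B each run six times, C runs seven times, and D runs five.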
@colors, the loop executes when $i is six, even though that's not a valid position). Likewise, option D executes the body one too few times (if the last position is five, the loop stops after executing the loop with $i set to four). I prefer option A because it takes fewer keystrokes than option B.

$#array variable has another use: you can set it, which pre-extends the array. If you know your array will eventually have a thousand elements in it, you can tell Perl to allocate all the elements at once rather than making Perl allocate the thousand items incrementally as you grow the array.

    $#numbers = 999;
    for ($i=0; $i < 1000; $i++) {
        $numbers[$i] = 5 * $i + 1;
    }

for loop as above, use a Perl-style foreach loop:

    @colors = ("red", "green", "blue");
    foreach $c (@colors) {
        print "$c\n";
    }
    red
    green
    blue

$c above) that you wish. If you follow tight programming discipline and use the strict pragma to prevent accidental use of global variables, you can declare the loop variable with my right in the foreach:

    #!/usr/bin/perl -w
    use strict;
    my @colors = ("red", "green", "blue");
    foreach my $c (@colors) {
        print "$c\n";
    }

    red
    green
    blue

foreach loops, the loop variable is actually an alias for the value in the list. So if you change the loop variable, you change the element in the list:

    @colors = ("red", "brown");
    foreach $c (@colors) {
        $c = "hot $c";
    }
    print "@colors\n";
    hot red hot brown

    foreach (@colors) {
        print "Current item is $_\n";
    }

    foreach (@colors) {
        tr/A-Z/a-z/;
        s/pink|burgundy/red/i;
        print length, "\n";
    }

    @inverted = reverse @colors;
    print "@inverted\n";

    blue green red

    @colors = ("pink", "purple", "mauve");
    @ordered = sort @colors;
    print "@ordered\n";

    mauve pink purple

    @ordered = sort @colors;
    @inverted = reverse @ordered;
    print "@inverted\n";

    purple pink mauve

reverse and sort take any list of values as arguments:

    @inverted = reverse sort @colors;

    @inverted = sort reverse @colors;    # WRONG

sort and reverse in the right order, it's rather inefficient. sort returns a temporary list of values, which is then reversed. It'd be more efficient to tell sort to sort in the order you want. You can do that!

sort accepts a code block before the list of values to sort. The code block tells sort how to order any two values. Those values are put into the global variables $a and $b before the code block is executed. (Most code blocks use Perl's <=> or cmp operators to compare things numerically or ASCIIbetically.)

    $a cmp $b

cmp compares values as strings, and by putting $a before $b, you get an ascending sort. If you want to sort from highest to lowest, it's as simple as flipping the order of $a and $b in the comparison: instead of telling sort that "green" should come after "blue", it'll now say that "green" should come before "blue":

    @colors = ("pink", "purple", "mauve");
    @inverted_ordered = sort { $b cmp $a } @colors;
    print "@inverted_ordered\n";
    purple pink mauve

    @subset = @colors[0,2];
    print "@subset\n";

    pink mauve

    ($x, $y, $z) = @big_array[5, 2, 100];

    $x = $big_array[5];
    $y = $big_array[2];
    $z = $big_array[100];

    @subset = @big_array[2..8];

    @subset = @big_array[2, 3, 4, 5, 6, 7, 8];

splice, is far more general. Let's cover the specialized functions first.

push and pop act on the end of the array. push adds values to the end of the array; pop removes the last value and returns it:

    @characters = ("Buffy", "Willow", "Xander");
    push(@characters, "Giles", "Anya");
    print "@characters\n";
    $ex_demon = pop @characters;
    print "popped $ex_demon\n";
    print "@characters\n";
    Buffy Willow Xander Giles Anya
    popped Anya
    Buffy Willow Xander Giles

shift and unshift:

    @baddies = ("Spike", "Mayor", "Adam");
    $in_wuv = shift @baddies;
    print "removed $in_wuv\n";
    print "left: @baddies\n";
    unshift @baddies, "Dracula";
    print "@baddies\n";

    removed Spike
    left: Mayor Adam
    Dracula Mayor Adam

shift or pop but don't give an array name, Perl assumes you mean the current arguments. If you're in a subroutine definition, the array that's operated on is @_, containing the subroutine arguments. If you're not in a subroutine definition, @ARGV is shifted or popped.

splice, which lets you perform any combination of inserting, deleting, or replacing. You give it an array to work on, the position at which to begin deleting elements, the number of elements to delete, and any elements to insert in place of those deleted. splice returns the deleted elements, if any:

    @gals = ("Buffy", "Willow", "Anya", "Faith");
    @cut = splice @gals, 1, 2, "Tara";
    print "@gals\n";
    print "@cut\n";

    Buffy Tara Faith
    Willow Anya

splice only for its ability to insert:

    @gals = ("Buffy", "Willow", "Anya");
    splice @gals, 2, 0, "Tara";
    print "@gals\n";

    Buffy Willow Tara Anya

splice for its ability to delete:

    @gals = ("Buffy", "Cordelia", "Faith", "Willow", "Anya");
    @cut = splice @gals, 1, 2;
    print "@gals\n";
    print "@cut\n";

push or unshift. Often you just read the list from a file.

Buffy The Vampire Slayer
    while ($string =~ m/(\S+)/g) {
        push @words, $1;
    }

split function, which takes up to three arguments. The first is a regular expression matching the stuff between the values you want. Here, you'll need a regular expression matching spaces. The second argument to split is the string to be split up. The third and final argument is the number of fields you want back, but if you omit it you'll get all the fields.

    @words = split /\s+/, $string;
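To make the three arguments concrete, here is a small sketch that splits the sample string with and without a field limit. The variable @two is a made-up name for this illustration.

```perl
use strict;
use warnings;

my $string = "Buffy The Vampire Slayer";

# Pattern first, then the string: split on runs of whitespace.
my @words = split /\s+/, $string;

# A third argument caps the number of fields; the last field
# keeps whatever is left of the string, separators included.
my @two = split /\s+/, $string, 2;

print scalar(@words), " words: @words\n";
print "first: $two[0]\n";
print "rest:  $two[1]\n";
```

With a limit of 2, the first field is "Buffy" and the second is the untouched remainder, "The Vampire Slayer".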
split looks in $_ for the string. This makes it perfect for these kinds of loops:

    while (<SOMEFILE>) {
        @words = split /\s+/;
        # ...
    }

    while (<SOMEFILE>) {
        @words = split;
        # ...
    }

    while (<PASSWDFILE>) {
        @fields = split /:/;
        # ...
    }

split has some quirks: it ignores any trailing empty fields, so if your colon-separated record was big:deal:::, you'd get two fields back: big and deal. This is sometimes what you want, but not always.

split is join. split extracts fields that have been separated. join produces a string of separated fields. The first argument is the separator (an exact string, not a regular expression), and the rest of the arguments are values to join together with the separator in between each pair. For instance:

    @adjectives = ("hot", "damp", "sticky");
    $line = join(" and ", @adjectives);
    print $line;
    hot and damp and sticky

    while (<INFILE>) {
        @fields = split;
        @new = reverse @fields;
        $line = join " ", @new;
        print OUTFILE "$line\n";
    }

    while (<INFILE>) {
        print OUTFILE join(" ", reverse split), "\n";
    }

reverse is useless, because sort sorts the list into ascending order.

/bin/sh to 4755, making it setuid.

-w is optional. Some of the things that a hashbang line of #!/usr/bin/perl -w will catch are: use of undefined values (typically a sign that you're expecting a variable to have a value when it doesn't), nonnumeric arguments (a string was given instead of a number, which probably means it would be interpreted as 0 instead of being flagged as an error), = instead of ==, and much more.

-w checks in some places but not others. If there's a chunk of code you just know will work even though -w complains about it, you can disable warnings as follows:

    {
        local($^W) = 0;    # disable warnings...
        # your code here
    }                      # warnings back on now

perllexwarn documentation for details.

-w in production programs. New versions of Perl have created new warnings, which show up as "errors" (broken web pages, strange cron mailings, STDERR sent to users' screens) in programs that worked previously. Tracking these down can be a nontrivial task. I like to keep my code -w clean for all versions, because it makes future changes easier to test with -w. Your mileage may vary.

use strict. This is a shorthand for use strict 'refs', 'subs', 'vars', which catches the following things:

use strict 'refs'

    sub setref {
        my $string_ref = shift;
        my $string = shift;
        $$string_ref = $string;
    }
    setref("Googol", $plexref);    # wrong argument order

setref subroutine is passed "Googol" where it expects a reference to a string. Without use strict 'refs', Perl assumes you meant $Googol. This is called a soft, or symbolic, reference. When you use that pragma, however, Perl whines and dies. Because soft references are almost never needed, use strict 'refs' catches a lot of errors that would otherwise silently cause bizarre behavior.

use strict 'vars'

$Package::Var) or to declare them with my. In almost every case, you really want to use my to scope your variable so that code outside the file or block can't perturb its value. Using my to predeclare all variables (or using cumbersome fully-qualified variable names) will predispose you to document your variables for the hapless fool who must modify your program in a year's time. Don't laugh. It might be you.

    if ($core->active) {
        my $rems;          # active radiation in rems
        my $rod_volume;    # volume of carbon rod remaining
        # your code here
    }

use strict 'subs'

$result = mysub;) unless the subroutine was declared before its use, either with a prototype or with the subroutine definition itself. If you don't want to predeclare, you must preface the subroutine call with & or append () so that it looks like a subroutine call. This doesn't affect the use of barewords in hashes in curly braces (e.g.,

-T flag, or if the program's permissions are setuid (meaning that it assumes the identity of its owner rather than whoever is running the program). Use of a tainted value in exec or similar calls, or opening a filename for writing, causes a fatal error. To untaint data, you should extract the safe portion (for a filename, that might be /^([\w.\@-]+)$/) with a regular expression and use $1, $2, and similar variables to access the part of the tainted variable guaranteed to be safe. Full details can be found in the perlsec manual page.

-T is almost always a good idea when you're programming defensively. It forces you to validate every piece of user-supplied data with regular expressions before you use it. Not only does this guard against potentially security-compromising errors, it also lets you catch situations where the user gives the wrong type of data (a string instead of a number, for instance).

unlink files, fork processes, or do other nasty things.

fork will succeed, not every file can be opened, not every child process terminates without error. The return values from system calls contain valuable information on the success or failure of those calls; check them!

open, fork, exec, and the contents of $? (or $CHILD_ERROR if you use English).

0 or undef if something went wrong.

print statements can do. When you've located the problem, it can still be difficult to infer the cause. The next step is to write a small program that exhibits the bug and then step through it with Perl's symbolic debugger (perl -d mysmallprogram).
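Returning to the earlier advice on untainting and on checking return values, the following is a minimal sketch of both habits. The helper untaint_filename and the path in $missing are hypothetical, the three-argument open with a lexical filehandle is my choice rather than the style used elsewhere in this text, and the script is not run under -T, so it only illustrates the validation pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: return the validated name, or undef if the
# input contains anything outside the allowed character set.
sub untaint_filename {
    my ($name) = @_;
    return $name =~ /^([\w.\@-]+)$/ ? $1 : undef;
}

my $ok  = untaint_filename("report-2.txt");
my $bad = untaint_filename("../etc/passwd; rm -rf /");

# Check the return value of open rather than assuming success.
my $missing = "/nonexistent-dir-for-demo/file";   # assumed not to exist
my $opened  = open(my $fh, "<", $missing);
my $error   = $opened ? "" : "Couldn't open $missing: $!";

print "ok:  ", (defined $ok  ? $ok  : "rejected"), "\n";
print "bad: ", (defined $bad ? $bad : "rejected"), "\n";
print "$error\n" unless $opened;
```

Using the captured $1 rather than the original string is what matters under -T, because only the captured text is considered untainted.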
Of course, you can always invoke the debugger directly with perl -de 0 to initiate an interactive session.

perldebug documentation for more information.

perl -d:DProf myprogram runs your program and creates a file called tmon.out in your current working directory. You then run the dprofpp program to analyze that file and display the fifteen subroutines occupying the most time.

dprofpp documentation for more information) but this list of the most time-consuming subroutines is probably the most important. It pinpoints the parts of your program that use the most time, and hence are most suited for optimizing, rewriting, inlining, or avoiding.

die messages that you're provided are often not sufficient when it comes to working out where things went wrong. For that you need the awesome power of Jack Shirazi's Devel::DumpStack. When I'm debugging a CGI script that refuses to play ball, I'll use this code, which traps warnings and fatal errors, displaying them in an HTML document instead of burying them in a web server error log:

    #!/usr/bin/perl -w
    use Devel::DumpStack qw(stack_as_string);
    use HTML::Entities;
    sub my_die {
        select(STDOUT); $|=1;    # unbuffered output to the browser
        printf(<<"EOF", $?, $!, stack_as_string());
    Content-Type: text/html

    <HTML><HEAD><TITLE>System Error</TITLE></HEAD>
    <BODY>
    <H1>System Error</H1>
    A seriously bad system error happened:<P>
    <B>Exit Status</B>: %d<BR>
    <B>Error String</B>: %s<P>
    <B>Stack Dump</B>:
    <PRE>
    %s
    </PRE>
    </BODY></HTML>
    EOF
        exit;
    }
    BEGIN {
        # route both warnings and fatal errors through my_die
        $SIG{__WARN__} = $SIG{__DIE__} = \&my_die;
    }
    $a = undef + 4;    # deliberately triggers a warning
    exit;

x**y instead of xʸ. In this case x**y**z means x**(y**z), not (x**y)**z, so ** is resolved right-to-left.
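A quick way to confirm the right-to-left rule for ** is to compare the unparenthesized form with both explicit groupings. The numbers here are arbitrary and the variable names are illustrative.

```perl
use strict;
use warnings;

my $implicit = 2 ** 3 ** 2;      # parsed as 2 ** (3 ** 2)
my $right    = 2 ** (3 ** 2);    # 2 ** 9
my $left     = (2 ** 3) ** 2;    # 8 ** 2

print "implicit=$implicit right=$right left=$left\n";
```

The unparenthesized expression agrees with the right-hand grouping (512), not the left-hand one (64).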
, dispenses with precedence entirely and resolves them all from right-to-left. The advantage of this is that we don't have to remember any rules, and the disadvantage is that many expressions are confusing: if we write 2 x 3 + 4, we get 14, not 10. In LISP the issue never comes up, because in LISP the parentheses are required, and so there are no ambiguous expressions. (Now you know why LISP looks the way it does.)

    $v = $x + 3;

    ($v = $x) + 3;
    $v = ($x + 3);

$x into $v, and then computes the value of $x + 3 and throws the result of the addition away. In this case, the addition was useless. The second one, however, makes sense, because it does the addition first and stores the result into $v. Since people write things like:

    $v = $x + 3;

    $result = $x =~ /foo/;

    $result = ($x =~ /foo/);

$x contains the string foo, and stores a true or false result into $result. It doesn't mean this:

    ($result = $x) =~ /foo/;

$x into $result and then looks to see if $result contains foo. In this case it's likely that the programmer wanted the first meaning, not the second. But sometimes we do want it to go the other way. Consider this expression:

    $p = $q =~ s/w//g;

    $p = ($q =~ s/w//g);

w's are removed from $q, and the number of successful substitutions is stored into $p. However, sometimes we really do want the other meaning:

    ($p = $q) =~ s/w//g;

$q into $p, and then removes all the w's from $p, leaving $q alone. If we want this, we have to include the parentheses explicitly, because = has lower precedence than =~.

    $worked = 1 + $s =~ /pattern/;

    ($worked = 1) + ($s =~ /pattern/);
    (($worked = 1) + $s) =~ /pattern/;
    ($worked = (1 + $s)) =~ /pattern/;
    $worked = ((1 + $s) =~ /pattern/);
    $worked = (1 + ($s =~ /pattern/));

    open(F, "< $file") || die "Couldn't open $file: $!";

open call:

    open F, "< $file" || die "Couldn't open $file: $!";

    open F, ("< $file" || die "Couldn't open $file: $!");

die will only be executed when the string "< $file" is false, which never happens. Since the die is controlled by the string and not by the open call, the program will not abort on errors the way we wanted. Here we wish that || had lower precedence, so that we could write:

    try to perform big long hairy complicated action || die ;
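The pitfall can be reproduced without touching the filesystem. In this sketch, try_action is a hypothetical stand-in for open that always fails, and eval is used only to observe whether die fired.

```perl
use strict;
use warnings;

# Hypothetical stand-in for open: takes one argument, always fails.
sub try_action { return 0 }

# Without parentheses, || binds to the argument, not to the call:
# this is try_action("some arg" || die ...), so die never runs.
my $unparenthesized = eval { try_action "some arg" || die "failed\n"; 1 };

# With parentheses around the arguments, || sees the call's result.
my $parenthesized = eval { try_action("some arg") || die "failed\n"; 1 };
my $message = $@;

print "without parentheses: ", ($unparenthesized ? "no die" : "died"), "\n";
print "with parentheses:    ", ($parenthesized ? "no die" : "died: $message");
```

Perl's low-precedence or operator binds more loosely than the list of arguments, so writing or in place of || gives the parenthesized behavior without the parentheses.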