By Tom Christiansen, Nathan Torkington
substr for that. Like all data types in Perl,
strings grow and shrink on demand. They get reclaimed by Perl's
garbage collection system when they're no longer used,
typically when the variables holding them go out of scope or when the
expression they were used in has been evaluated. In other words,
memory management is already taken care of for you, so you
don't have to worry about it.
The substr function lets you read from and write to bits of the string.
$value = substr($string, $offset, $count);
$value = substr($string, $offset);
substr($string, $offset, $count) = $newstring;
substr($string, $offset) = $newtail;
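Here is a quick sketch of reading and writing with substr, using the sample string from this section. The replacement words are arbitrary. Assigning to substr works because substr can be used as an lvalue. The four-argument form makes the replacement directly and returns the text it removed.

```perl
use strict;
use warnings;

my $string = "This is what you have";

# Read: two characters starting at offset 5
my $word = substr($string, 5, 2);          # "is"

# Write: substr is an lvalue, so this replaces "what" (offset 8, length 4)
substr($string, 8, 4) = "all";             # "This is all you have"

# Four-argument form: replace the first four characters, get back what was there
my $old = substr($string, 0, 4, "That");   # $old is "This"

print "$word | $old | $string\n";          # is | This | That is all you have
```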
The unpack function gives only read access, but it is faster when you have many
substrings to extract.
# get a 5-byte string, skip 3, then grab 2 8-byte strings, then the rest
($leading, $s1, $s2, $trailing) =
unpack("A5 x3 A8 A8 A*", $data);
# split at five byte boundaries
@fivers = unpack("A5" x (length($string)/5), $string);
# chop string into individual characters
@chars = unpack("A1" x length($string), $string);
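To see the unpack templates at work, here is a sketch over a made-up fixed-width record. The field layout and values are hypothetical. It also shows one caveat: A strips trailing spaces, so A1 turns a lone space into an empty string, while a1 keeps it.

```perl
use strict;
use warnings;

# Hypothetical record: 5-byte id, 3 filler bytes, two 8-byte fields, then free text
my $data = "AB123" . "---" . "Smith   " . "London  " . "notes here";

# A strips trailing spaces from each field; x3 skips the filler bytes
my ($id, $name, $city, $rest) = unpack("A5 x3 A8 A8 A*", $data);

# One-character chunks, one "A1" per character
my @chars = unpack("A1" x length("perl"), "perl");

# Caveat: A1 strips a lone space to "", while a1 keeps it
my @spaced_A = unpack("A1" x 3, "a b");    # ("a", "", "b")
my @spaced_a = unpack("a1" x 3, "a b");    # ("a", " ", "b")

print "$id|$name|$city|$rest\n";           # AB123|Smith|London|notes here
print "@chars\n";                          # p e r l
```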
Use unpack or
substr to access individual characters or a
portion of the string. The offset argument to substr indicates the start
of the substring you're interested in, counting from the front
if positive and from the end if negative. If the offset is 0, the
substring starts at the beginning. The count argument is the length
of the substring.
$string = "This is what you have";
#         +012345678901234567890  Indexing forwards  (left to right)
#          109876543210987654321- Indexing backwards (right to left)
#           note that 0 means 10 or 20, etc. above
$first = substr($string, 0, 1);   # "T"
$start = substr($string, 5, 2);   # "is"
$rest  = substr($string, 13);     # "you have"
$last  = substr($string, -1);     # "e"
$end   = substr($string, -4);     # "have"
$piece = substr($string, -8, 3);  # "you"
Use the || or ||= operator, which works on both strings and numbers:
# use $b if $b is true, else $c
$a = $b || $c;
# set $x to $y unless $x is already true
$x ||= $y;
If 0 or "0" are valid values
for your variables, use defined instead:
# use $b if $b is defined, else $c
$a = defined($b) ? $b : $c;
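Here is a small sketch contrasting the two tests on a value that is defined but false. The last lines use the // and //= defined-or operators. This assumes you have Perl 5.10 or later, which introduced them. On older perls, stick with the defined test.

```perl
use strict;
use warnings;

my $count = 0;          # a legitimate value that happens to be false

my $with_or      = $count || 10;                   # 10: || discards the 0
my $with_defined = defined($count) ? $count : 10;  # 0: only undef is replaced
my $with_dor     = $count // 10;                   # 0: same test, Perl 5.10+ syntax

my $unset;
$unset //= "default";   # assigns only because $unset is undef

print "$with_or $with_defined $with_dor $unset\n"; # 10 0 0 default
```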
The difference between the two techniques (defined and ||) is what they
test: definedness versus truth. Three defined values are still false in the
world of Perl: 0, "0", and "".
If your variable already held one of those, and you wanted to keep
that value, a || wouldn't work. You'd
have to use the clumsier tests with defined
instead. It's often convenient to arrange for your program to
care only about true or false values, not defined or undefined ones.
Rather than returning a plain 1 or 0, the || operator has a much more interesting property: it returns its first
operand (the left-hand side) if that operand is true; otherwise it
returns its second operand. The && operator also returns the last
evaluated expression, but is less often used for this property. These
operators don't care whether their operands are strings,
numbers, or references; any scalar will do. They just return the
first one that makes the whole expression true or false. This
doesn't affect the Boolean sense of the return value, but it
does make the operators more convenient to use.
($VAR1, $VAR2) = ($VAR2, $VAR1);
$temp = $a; $a = $b; $b = $temp;
$a = "alpha"; $b = "omega"; ($a, $b) = ($b, $a); # the first shall be last -- and versa vice
($alpha, $beta, $production) = qw(January March August);
# move beta       to alpha,
# move production to beta,
# move alpha      to production
($alpha, $beta, $production) = ($beta, $production, $alpha);
When this code is finished, $alpha,
$beta, and $production have the
values "March", "August", and
"January".
Use ord to convert a character to a number, or use
chr to convert a number to a character:
$num  = ord($char);
$char = chr($num);
The %c format used in printf
and sprintf also converts a number to a character:
$char = sprintf("%c", $num);                # slower than chr($num)
printf("Number %d is character %c\n", $num, $num);
Number 101 is character e
A C* template used with pack
and unpack can quickly convert many characters:
@ascii  = unpack("C*", $string);
$string = pack("C*", @ascii);
Perl provides chr and
ord to convert between a character and its
corresponding ordinal value:
$ascii_value = ord("e");    # now 101
$character = chr(101); # now "e"
A character that's already a string can be printed directly with print or the %s format in
printf and sprintf. The %c format forces
printf or sprintf to convert a
number into a character; it's not used for printing a character
that's already in character format (that is, a string).
printf("Number %d is character %c\n", 101, 101);
The pack, unpack, chr, and
ord functions are all faster than
sprintf. Here are pack and
unpack in action:
@ascii_character_numbers = unpack("C*", "sample");
print "@ascii_character_numbers\n";
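Here is a round-trip sketch with the sample word from the text. unpack with C* gives the ordinals, and pack with C* rebuilds the string. ord and chr do the same job one character at a time.

```perl
use strict;
use warnings;

my @codes = unpack("C*", "sample");   # one ordinal per character
my $word  = pack("C*", @codes);       # and back again

print "@codes\n";                     # 115 97 109 112 108 101
print "$word\n";                      # sample

# The same conversion, one character at a time
my @via_ord = map { ord($_) } split //, "sample";
my $via_chr = join "", map { chr($_) } @via_ord;
```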
@array = split(//, $string);
@array = unpack("C*", $string);
while (/(.)/g) {            # . never matches a newline here (no /s modifier)
# do something with $1
}
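/X*/, for instance, matches the
empty string, so split with it breaks a string into characters. Odds are you will find other patterns like that when you don't mean
to.
Here are the unique characters in the string "an apple a day", sorted in ascending ASCII order:
%seen = ();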
$string = "an apple a day";
foreach $byte (split //, $string) {
$seen{$byte}++;
}
print "unique chars are: ", sort(keys %seen), "\n";
unique chars are:  adelnpy
The split and unpack
solutions give you an array of characters to work with. If you
don't want an array, you can use a pattern match with the
/g flag in a while loop,
extracting one character at a time:
%seen = ();
$string = "an apple a day";
while ($string =~ /(.)/g) {
$seen{$1}++;
}
print "unique chars are: ", sort(keys %seen), "\n";
unique chars are:  adelnpy
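As the comment on the earlier while loop notes, . doesn't match a newline unless you add /s. This sketch runs on a made-up two-line string. It counts characters both ways and compares the result with split //, which always returns every character.

```perl
use strict;
use warnings;

my $string = "ab\nba\n";

# Without /s, . skips the newlines
my $without_s = 0;
$without_s++ while $string =~ /./g;

# With /s, . matches every character, newlines included
my $with_s = 0;
$with_s++ while $string =~ /./gs;

# split // always returns every character
my @all = split //, $string;

print "$without_s $with_s ", scalar(@all), "\n";   # 4 6 6
```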
Use the reverse function in scalar context for flipping bytes:
$revbytes = reverse($string);
To flip words, use reverse in list context with
split and join:
$revwords = join(" ", reverse split(" ", $string));
The reverse function is two different functions in
one. When called in scalar context, it joins together its arguments
and returns that string in reverse order. When called in list
context, it returns its arguments in the opposite order. When using
reverse for its byte-flipping behavior, use
scalar to force scalar context unless it's
entirely obvious.
$gnirts   = reverse($string);   # reverse letters in $string
@sdrow = reverse(@words); # reverse elements in @words
$confused = reverse(@words); # reverse letters in join("", @words)
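Here is a sketch of the context trap described above. print supplies list context, so reversing a single string there gives it back unchanged. The variable names are ours.

```perl
use strict;
use warnings;

my $word = "hello";

my @list   = reverse($word);              # list context: one-element list, unchanged
my $flip   = reverse($word);              # scalar context: "olleh"
my $forced = scalar reverse("ab", "cd");  # joins first, then flips: "dcba"

# print supplies list context, so this prints "hello", not "olleh"
print reverse($word), "\n";
# scalar forces the byte-flipping behavior
print scalar(reverse($word)), "\n";       # olleh
```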
Using " " as the pattern for split is a
special case. It causes split to use contiguous
whitespace as the separator and also discard any leading null fields,
just like awk. Normally,
split discards only trailing null fields.
# reverse word order
$string = 'Yoda said, "can you see this?"';
@allwords = split(" ", $string);
$revwords = join(" ", reverse @allwords);
print $revwords, "\n";
this?" see you "can said, Yoda
We could remove the temporary array @allwords and
do it on one line:
$revwords = join(" ", reverse split(" ", $string));
Each run of whitespace in $string becomes a single
space in $revwords. If you want to preserve
whitespace, use this:
$revwords = join("", reverse split(/(\s+)/, $string));
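Here's a sketch of both word-reversal styles on a string with uneven spacing, plus a palindrome test. is_palindrome is a hypothetical helper that ignores case. It assigns the reversed word to a scalar so that reverse runs in scalar context.

```perl
use strict;
use warnings;

my $string = "one  two\tthree";

# split " " collapses every run of whitespace to one space
my $squeezed = join(" ", reverse split(" ", $string));   # "three two one"

# Capturing the separators keeps them, so the spacing survives
my $kept = join("", reverse split(/(\s+)/, $string));    # "three\ttwo  one"

sub is_palindrome {
    my ($w) = @_;
    my $backward = reverse $w;          # scalar assignment: byte-flip
    return lc($w) eq lc($backward);
}

print "$squeezed\n$kept\n";
print is_palindrome("Reviver") ? "palindrome\n" : "not a palindrome\n";
```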
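Another use of reverse is to test whether a word is a
palindrome (a word that reads the same backward or forward), by comparing the word with reverse($word) in scalar context.
To expand tabs into spaces, you can use this substitution:
while ($string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e) {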
# spin in empty loop until substitution finally fails
}
Or use the standard Text::Tabs module:
use Text::Tabs;
@expanded_lines  = expand(@lines_with_tabs);
@tabulated_lines = unexpand(@lines_without_tabs);
The substitution relies on the $` variable, whose very mention currently
slows down every pattern match in the program. The reason for this is
given in Section 6.0.3 in Chapter 6. To expand tabs in every line of input:
while (<>) {
1 while s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
print;
}
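You can get the same tab expansion without $` or $& by reading match positions from @- and @+. $-[0] is where the current match starts, which equals the length of $`. $+[0] - $-[0] is the match's length. Our understanding is that perl 5.20's copy-on-write strings removed the $` penalty, so this matters mainly on older perls. The detab helper below is our own sketch. It checks its output against Text::Tabs::expand with the default 8-column tab stops.

```perl
use strict;
use warnings;
use Text::Tabs;          # standard module providing expand() and $tabstop (default 8)

# Expand tabs without mentioning $` or $&:
#   $-[0]         = offset where the current match starts (length of $`)
#   $+[0] - $-[0] = length of the current match (length of $&)
sub detab {
    my ($line) = @_;
    1 while $line =~ s/\t+/' ' x (($+[0] - $-[0]) * 8 - $-[0] % 8)/e;
    return $line;
}

my @samples = ("a\tb", "\t\tx", "abcdefgh\ty", "no tabs");
for my $s (@samples) {
    my $mine = detab($s);
    printf "%-20s|%s\n", $mine,
        expand($s) eq $mine ? "matches Text::Tabs" : "differs";
}
```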
If you're looking at the 1 while loop
and wondering why it couldn't have been written as part of a
simple s///g instead, it's because you need
to recalculate the length from the start of the line again each time
(stored in $`) rather than merely from where
the last match occurred.
The construct 1 while CONDITION is the same as while (CONDITION) { }, but shorter. Its origins date to when Perl ran
the first much faster than the second. While the second is now
almost as fast, the first remains convenient, and the habit has stuck.
Suppose you've read a string with an embedded variable reference, such as:
You owe $debt to me.
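Now you want to replace $debt in the string with
its value. If the variables are all globals, use a substitution with symbolic references:
$text =~ s/\$(\w+)/${$1}/g;
Or use a double /ee
if they might be lexical
(my) variables:
$text =~ s/(\$\w+)/$1/gee;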
The first technique works because if $1 contains the string
somevar, then ${$1} will be
whatever $somevar contains. This won't work
if the use strict 'refs' pragma is in effect because
that bans symbolic dereferencing.
use vars qw($rows $cols);
no strict 'refs'; # for ${$1}/g below
my $text;
($rows, $cols) = (24, 80);
$text = q(I am $rows high and $cols long); # like single quotes!
$text =~ s/\$(\w+)/${$1}/g;
print $text;
I am 24 high and 80 long
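If the values can live in a hash rather than in named variables, you can avoid both symbolic references and /ee. That is an assumption and a change from the recipe's setup. This sketch uses a made-up %vars hash. It substitutes known names and leaves unknown ones untouched.

```perl
use strict;
use warnings;

# Values kept in a hash instead of in named variables (assumption of this sketch)
my %vars = (rows => 24, cols => 80);

my $text = q(I am $rows high and $cols long, costing $price);

# Replace $name with $vars{name}; leave names we don't know as they were
$text =~ s/\$(\w+)/exists $vars{$1} ? $vars{$1} : "\$$1"/ge;

print "$text\n";   # I am 24 high and 80 long, costing $price
```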
You may have seen the /e
substitution modifier used to evaluate
the replacement as code rather than as a string. It's designed
for situations such as doubling every whole number in a string:
$text = "I am 17 years old";
$text =~ s/(\d+)/2 * $1/eg;
When Perl is compiling your program and sees a /e
on a substitute, it compiles the code in the replacement block along
with the rest of your program, long before the substitution actually
happens. When a substitution is made, $1 is
replaced with the string that matched. The code to evaluate would
then be something like:
2 * 17
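If $text instead held a mention of the variable $AGE, a single /e would not be enough:
$text = 'I am $AGE years old';   # note single quotes
$text =~ s/(\$\w+)/$1/eg;        # WRONG: evaluating $1 just yields the string '$AGE' again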
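The difference between one e and two is easy to check. This sketch works on copies of the same string. It uses a lexical $AGE to show that /ee can see my variables. Keep in mind that /ee runs whatever code the data contains, so never use it on untrusted input.

```perl
use strict;
use warnings;

my $AGE  = 17;                         # a lexical (my) variable
my $text = 'I am $AGE years old';      # single quotes: nothing interpolated yet

(my $once  = $text) =~ s/(\$\w+)/$1/eg;   # code $1 yields the string '$AGE': no change
(my $twice = $text) =~ s/(\$\w+)/$1/eeg;  # that string is evaluated again: 17

print "$once\n$twice\n";
```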
Use the lc and uc functions or the \L and \U string escapes:
use locale;                   # needed in 5.004 or above
$big    = uc($little);        # "bo peep" -> "BO PEEP"
$little = lc($big);           # "JOHN"    -> "john"
$big    = "\U$little";        # "bo peep" -> "BO PEEP"
$little = "\L$big";           # "JOHN"    -> "john"
To alter just one character, use the lcfirst and ucfirst functions or the \l
and \u string escapes:
$big    = "\u$little";        # "bo"      -> "Bo"
$little = "\l$big";           # "BoPeep"  -> "boPeep"
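Here is a sketch combining the functions and the escapes. The sample name is made up. Lowercasing first and then raising the initial letter gives a consistent result whatever the input case.

```perl
use strict;
use warnings;

my $name = "bO pEEP";

my $first_cap = ucfirst(lc($name));   # "Bo peep": lowercase all, then raise the first letter
my $escapes   = "\u\L$name";          # same result with string escapes

(my $each_word = $name) =~ s/(\w+)/\u\L$1/g;   # "Bo Peep": capitalize every word

print "$first_cap | $escapes | $each_word\n";
```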
The use locale directive tells Perl's
case-conversion functions and pattern matching engine to respect your
language environment, allowing for characters with diacritical marks,
and so on. A common mistake is to use tr///
to convert case. (We're aware that
the old Camel book recommended tr/A-Z/a-z/. In our
defense, that was the only way to do it back then.) This won't
work in all situations because when you say
tr/A-Z/a-z/ you have omitted all characters with
umlauts, accent marks, cedillas, and other diacritics used in dozens
of languages, including English. The uc and
\U case-changing commands understand these
characters and convert them properly, at least when you've said
use locale. (One exception: in German, the uppercase form of ß is
SS, but Perl doesn't produce it.)
To interpolate a function call or expression into a string, you can break it up into distinct concatenated pieces:
$answer = $var1 . func() . $var2;   # scalar only
Or you can use the slightly sneaky @{[ LIST EXPR ]} or ${ \(SCALAR EXPR) } expansions:
$answer = "STRING @{[ LIST EXPR ]} MORE STRING";
$answer = "STRING ${\( SCALAR EXPR )} MORE STRING";
For example, these two statements build the same string:
$phrase = "I have " . ($n + 1) . " guanacos.";
$phrase = "I have ${\($n + 1)} guanacos.";
Because print effectively concatenates its entire argument
list, if we were going to print
$phrase, we could have just said:
print "I have ", $n + 1, " guanacos.\n";
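Here is a runnable sketch of both expansions. $n and @herd are illustrative. One caution: the expression inside ${\( )} is evaluated in list context, so wrap a context-sensitive call in scalar() there.

```perl
use strict;
use warnings;

my $n    = 4;
my @herd = qw(alpaca llama vicuna);

# ${\( ... )} interpolates one scalar expression
my $scalar_form = "I have ${\($n + 1)} guanacos.";

# @{[ ... ]} interpolates a list, joined with $" (a space by default)
my $list_form = "Herd: @{[ map { uc($_) } @herd ]}.";

# Plain concatenation gives the same scalar result
my $concat = "I have " . ($n + 1) . " guanacos.";

print "$scalar_form\n$list_form\n";
```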
Only @, $, and \
are special within double quotes and most backquotes. (As with
m// and s///, the
qx() synonym is not subject to double-quote
expansion if its delimiter is single quotes! $home = qx'echo home is $HOME'; would get the shell
$HOME variable, not one in Perl.) So, the only way
to force arbitrary expressions to expand is by expanding a
${} or @{} whose block contains
a reference.
To indent a here document in your source without keeping that indentation in the text, use an s///
operator to strip out leading
whitespace.
# all in one
($var = <<HERE_TARGET) =~ s/^\s+//gm;
your text
goes here
HERE_TARGET
# or with two steps
$var = <<HERE_TARGET;
your text
goes here
HERE_TARGET
$var =~ s/^\s+//gm;
The /m
modifier lets the ^ character match at the start
of each line in the string, and the /g modifier
makes the pattern matching engine repeat the substitution as often as
it can (i.e., for every line in the here document).
($definition = <<'FINIS') =~ s/^\s+//gm;
The five varieties of camelids
are the familiar camel, his friends
the llama and the alpaca, and the
rather less well-known guanaco
and vicuña.
FINIS
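Here is a quick check of how \s interacts with blank lines in a here document. The text is made up. \s+ can match a blank line's newline along with the next line's indentation, while [^\S\n]+ matches only spaces and tabs.

```perl
use strict;
use warnings;

my $with_blank = <<'EOT';
    first line

    third line
EOT

(my $squeezed = $with_blank) =~ s/^\s+//gm;       # \s also eats the blank line's newline
(my $kept     = $with_blank) =~ s/^[^\S\n]+//gm;  # spaces and tabs only: blank line survives

print $squeezed, "--\n", $kept;
```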
Be warned: all these patterns use \s, which
will also match newlines. This means they will remove any blank lines
in your here document. If you don't want this, replace
\s with [^\S\n] in the
patterns.
The all-in-one form works because the result of an assignment can be used as the left-hand side of =~. This lets us do it all in one line, but it
only works when you're assigning to a variable. When
you're using the here document directly, it would be considered
a constant value and you wouldn't be able to modify it. In
fact, you can't change a here document's value
unless you first put it into a variable.
To reformat paragraphs, use the standard Text::Wrap module to put line breaks in the right places:
use Text::Wrap;
@OUTPUT = wrap($LEADTAB, $NEXTTAB, @PARA);
The Text::Wrap module provides the wrap function,
demonstrated in Example 1.3, which takes a list of lines
and reformats them into a paragraph having no line more
than $Text::Wrap::columns characters long. We set
$columns to 20, ensuring that no line will be
longer than 20 characters. We pass wrap two
arguments before the list of lines: the first is the indent for the
first line of output, the second the indent for every subsequent
line.
#!/usr/bin/perl -w
# wrapdemo - show how Text::Wrap works
@input = ("Folding and splicing is the work of an editor,",
"not a mere collection of silicon",
"and",
"mobile electrons!");
use Text::Wrap qw($columns &wrap);
$columns = 20;
print "0123456789" x 2, "\n";
print wrap(" ", " ", @input), "\n";
01234567890123456789
Folding and
splicing is the
work of an
editor, not a
mere collection
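This sketch reruns the wrap example with four- and two-space indents, which are our choice. It prints each line's length. With $columns at 20, no line should exceed 20 characters, and no words should be lost.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap $columns);

$columns = 20;
my @input = ("Folding and splicing is the work of an editor,",
             "not a mere collection of silicon",
             "and",
             "mobile electrons!");

my $para  = wrap("    ", "  ", @input);
my @lines = split /\n/, $para;

# Show each output line with its length
for my $line (@lines) {
    printf "%2d  %s\n", length($line), $line;
}
```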
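Suppose you're generating a format string for sprintf and want to convert literal
% signs into %%. Use a substitution to backslash or double each character that needs escaping:
# backslash
$var =~ s/([CHARLIST])/\\$1/g;
# double
$var =~ s/([CHARLIST])/$1$1/g;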
$var is the variable to be altered. The
CHARLIST is a list of characters to escape and can
contain backslash escapes like \t and
\n. If you just have one character to escape, omit
the brackets:
$string =~ s/%/%%/g;
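Here is a sketch of why doubling matters. A label that contains a percent sign (made up here) is escaped before being embedded in a sprintf format, so sprintf prints the % literally instead of treating it as a conversion.

```perl
use strict;
use warnings;

# A label from user input that happens to contain a percent sign
my $label = "50% off";

# Double every % so sprintf treats it as a literal percent sign
(my $safe = $label) =~ s/%/%%/g;

my $line = sprintf("$safe: %d items", 3);
print "$line\n";    # 50% off: 3 items
```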
The following code escapes quotes when preparing strings to submit to the shell. (In practice, you would need to escape more than just ' and " to make any arbitrary string safe
for the shell. Getting the list of characters right is so hard, and
the risks if you get it wrong are so great, that you're better
off using the list form of system and
exec to run programs, shown in Section 16.2. They avoid the shell altogether.)
$string = q(Mom said, "Don't do that.");
$string =~ s/(['"])/\\$1/g;
To double each quote instead of backslashing it:
$string = q(Mom said, "Don't do that.");
$string =~ s/(['"])/$1$1/g;
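To trim leading and trailing whitespace, use a pair of substitutions:
$string =~ s/^\s+//;
$string =~ s/\s+$//;
Or wrap them in a function that returns the new value: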
$string = trim($string);
@many = trim(@many);
sub trim {
my @out = @_;
for (@out) {
s/^\s+//;
s/\s+$//;
}
return wantarray ? @out : $out[0];
}
To remove just the last character from a string, use the chop function. Version 5 added
chomp, which removes the end of the string only if it matches the input record separator in the $/ variable,
"\n" by default. These are often used to remove
the trailing newline from input:
# print what's typed, but surrounded by >< symbols
while(<STDIN>) {
chomp;
print ">$_<\n";
}
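Here is a sketch of the difference between the two, including chomp's return value and its use of $/. The "\r\n" separator is just an example.

```perl
use strict;
use warnings;

my $line    = "text\n";
my $removed = chomp($line);      # returns the number of characters removed: 1
my $again   = chomp($line);      # nothing left to remove: 0

my $word = "text";
chop($word);                     # chop always removes the last character: "tex"

my $dos = "text\r\n";
{
    local $/ = "\r\n";           # chomp removes whatever $/ holds, here two characters
    chomp($dos);
}

print "$line|$word|$dos|$removed|$again\n";   # text|tex|text|1|0
```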
See also: the s/// operator in perlre(1) and perlop(1) and the
"Pattern Matching" section of Chapter 2 of
Programming Perl; the
chomp and chop functions in
perlfunc(1) and Chapter 3 of
Programming Perl; we trim leading and
trailing whitespace in the getnum function in
Section 2.1.
To parse comma-separated values, use a pattern match that understands quoted fields:
sub parse_csv {
my $text = shift; # record containing comma-separated values
my @new = ();
push(@new, $+) while $text =~ m{
# the first part groups the phrase inside the quotes.
# see explanation of this pattern in MRE
"([^\"\\]*(?:\\.[^\"\\]*)*)",?
| ([^,]+),?
| ,
}gx;
push(@new, undef) if substr($text, -1,1) eq ',';
return @new; # list of values that were comma-separated
}
use Text::ParseWords;
sub parse_csv {
return quotewords(",",0, $_[0]);
}
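Here is a sketch exercising the Text::ParseWords version on two made-up records. One has a quoted comma, and the other has backslash-escaped quotes. quotewords is called with keep set to 0, so the surrounding quotes and the escaping backslashes are removed.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

sub parse_csv {
    return quotewords(",", 0, $_[0]);
}

my @simple  = parse_csv(q(a,"b,c",d));          # ("a", "b,c", "d")
my @escaped = parse_csv(q(x,"say \"hi\"",y));   # ("x", 'say "hi"', "y")

print join("|", @simple), "\n";
print join("|", @escaped), "\n";
```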
Fields can themselves contain commas inside quotes, which rules out a simple split /,/.
The quotewords function takes two arguments followed by the CSV string. The first argument is the
separator (a comma, in this case) and the second is a true or false
value controlling whether the strings are returned with quotes around
them. A quotation mark inside a quoted field is escaped with a backslash, as in "like\"this\"". Quotation
marks and backslashes are the only characters that have meaning
backslashed. Any other use of a backslash will be left in the output
string.
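To compare words by how they sound, use the Text::Soundex module:
use Text::Soundex;
$CODE  = soundex($STRING);
@CODES = soundex(@LIST);
For example, this program looks up users whose login or real names sound like the name you type: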
use Text::Soundex;
use User::pwent;
print "Lookup user: ";
chomp($user = <STDIN>);
exit unless defined $user;
$name_code = soundex($user);
while ($uent = getpwent()) {
($firstname, $lastname) = $uent->gecos =~ /(\w+)[^,]*\b(\w+)/;
if ($name_code eq soundex($uent->name) ||
$name_code eq soundex($lastname) ||
$name_code eq soundex($firstname) )
{
printf "%s: %s %s\n", $uent->name, $firstname, $lastname;
}
}
| Old Words | New Words |
|---|---|
| bonnet | hood |
| rubber | eraser |
| lorry | truck |
| trousers | pants |
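You can sketch the core idea of the fixstyle program without generating code for eval. Join the old words (from the table above) into one alternation, and look each match up in a hash. This is our swapped-in technique, not the recipe's. quotemeta protects any metacharacters, and \b keeps it from rewriting parts of longer words.

```perl
use strict;
use warnings;

# Old and new words from the table above
my %new_for = (
    bonnet   => 'hood',
    rubber   => 'eraser',
    lorry    => 'truck',
    trousers => 'pants',
);

# Longest words first, so a longer word wins over a shorter word that prefixes it
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %new_for;
my $pattern = qr/\b($alternation)\b/;

sub americanize {
    my ($text) = @_;
    $text =~ s/$pattern/$new_for{$1}/g;
    return $text;
}

print americanize("Put the trousers in the lorry's boot."), "\n";
```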
Given filenames, the program edits them in place, keeping the originals with a ".orig" extension.
See Section 7.9 for a description of in-place editing. A -v command-line option writes notification of
each change to standard error. The pairs of old and new strings are kept after __END__ in the main program as described in
Section 7.6. Each pair of strings is converted into
carefully escaped substitutions and accumulated into the
$code variable like the
popgrep2 program in Section 6.10. The -t check for an interactive run
tells whether we're expecting to read from the keyboard if no
arguments are supplied. That way if the user forgets to give an
argument, they aren't wondering why the program appears to be
hung.
#!/usr/bin/perl -w
# fixstyle - switch first set of <DATA> strings to second set
# usage: $0 [-v] [files ...]
use strict;
my $verbose = (@ARGV && $ARGV[0] eq '-v' && shift);
if (@ARGV) {
$^I = ".orig"; # preserve old files
} else {
warn "$0: Reading from stdin\n" if -t STDIN;
}
my $code = "while (<>) {\n";
# read in config, build up code to eval
while (<DATA>) {
chomp;
my ($in, $out) = split /\s*=>\s*/;
next unless $in && $out;
$code .= "s{\\Q$in\\E}{$out}g";
$code .= "&& printf STDERR qq($in => $out at \$ARGV line \$.\\n)"
if $verbose;
$code .= ";\n";
}
$code .= "print;\n}\n";
eval "{ $code } 1" || die $@;   # report the compile or run-time error
__END__
analysed => analyzed
built-in => builtin
chastized => chastised
commandline => command-line
de-allocate => deallocate
dropin => drop-in
hardcode => hard-code
meta-data => metadata
multicharacter => multi-character
multiway => multi-way
non-empty => nonempty
non-profit => nonprofit
non-trappable => nontrappable
pre-define => predefine
preextend => pre-extend
re-compiling => recompiling
reenter => re-enter
turnkey => turn-key
Here are some sample psgrep command lines:
% psgrep '/sh\b/'
% psgrep 'command =~ /sh$/'
% psgrep 'uid < 10'
% psgrep 'command =~ /^-/' 'tty ne "?"'
% psgrep 'tty =~ /^[p-t]/'
% psgrep 'uid && tty eq "?"'
% psgrep 'size > 10 * 2**10' 'uid != 0'
FLAGS UID PID PPID PRI NI SIZE RSS WCHAN STA TTY TIME COMMAND
0 101 9751 1 0 0 14932 9652 do_select S p1 0:25 netscape
100000 101 9752 9751 0 0 10636 812 do_select S p1 0:00 (dns helper)
When Perl converts a string to a number, it uses only the leading numeric part: "A7" is still
0, and "7A" is just
7. (Note, however, that the -w flag will warn of such improper
conversions.) Sometimes (such as when validating input) you need to
know if a string represents a valid number. We show you how in Section 2.1.
Nor does Perl read a string such as "0xff" as hexadecimal. Perl automatically converts literals in
your program code (so $a = 3 + 0xff will set $a to 258) but not data read by that
program (you can't read "0xff" into
$b and then say $a = 3 + $b to make $a become 258).
To check whether a string is a valid number, compare it against a regular expression that matches the kinds of numbers you're interested in:
if ($string =~ /PATTERN/) {
# is a number
} else {
# is not
}
warn "has nondigits" if /\D/;
warn "not a natural number" unless /^\d+$/; # rejects -3
warn "not an integer" unless /^-?\d+$/; # rejects +3
warn "not an integer" unless /^[+-]?\d+$/;
warn "not a decimal number" unless /^-?\d+\.?\d*$/; # rejects .2
warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
warn "not a C float"
unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;
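The text refers to a getnum wrapper around POSIX::strtod without showing it. Here is a minimal sketch of one under our own assumptions: surrounding whitespace is trimmed, and any characters strtod leaves unparsed mean the string isn't a number. strtod follows your C library, so it may also accept forms the patterns above reject, such as "inf" or hexadecimal.

```perl
use strict;
use warnings;
use POSIX qw(strtod);

# Hypothetical wrapper: returns the number, or undef if $str isn't entirely numeric
sub getnum {
    my ($str) = @_;
    return unless defined $str;
    $str =~ s/^\s+//;                     # trim, as trim() from Section 1.14 would
    $str =~ s/\s+$//;
    return if $str eq '';
    local $! = 0;
    my ($num, $unparsed) = strtod($str);  # list context: value, count of unparsed chars
    return if $unparsed != 0 || $!;
    return $num;
}

for my $s ("3.14", "  42  ", "1e3", "12abc", "abc", "") {
    my $n = getnum($s);
    printf "%-8s => %s\n", "'$s'", defined $n ? $n : 'not a number';
}
```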
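If your number has leading or trailing whitespace, these patterns won't match it; either add the appropriate logic directly or call the
trim function from Section 1.14. On POSIX systems, Perl also supports the POSIX::strtod function. Its semantics are
cumbersome, so a wrapper function such as getnum makes it more convenient to use.
To compare floating-point numbers, use sprintf to format the numbers to a certain
number of decimal places, then compare the resulting strings:
# equal(NUM1, NUM2, ACCURACY) : returns true if NUM1 and NUM2 are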
# equal to ACCURACY number of decimal places
sub equal {
my ($A, $B, $dp) = @_;
return sprintf("%.${dp}f", $A) eq sprintf("%.${dp}f", $B);   # %f counts decimal places
}
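Here is a sketch of the problem the equal routine addresses, using the classic 0.1 + 0.2 case. Formatting to 10 decimal places is an arbitrary choice.

```perl
use strict;
use warnings;

my $sum = 0.1 + 0.2;

# Direct comparison fails: neither value is exactly representable in binary
my $direct = ($sum == 0.3) ? "equal" : "not equal";

# Formatting both to a fixed number of decimal places hides the tiny difference
my $formatted = (sprintf("%.10f", $sum) eq sprintf("%.10f", 0.3)) ? "equal" : "not equal";

printf "%.17f vs 0.3: %s directly, %s to 10 places\n", $sum, $direct, $formatted;
```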
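You need the equal routine because most
computers' floating-point representations aren't
accurate. See the Introduction for a discussion of this issue.
If you have a fixed number of decimal places, as with currency, you can sidestep the problem by storing values as integers: storing $3.50 as 350 instead of
3.5 removes the need for floating-point values.
Reintroduce the decimal point on output:
$wage = 536;                # $5.36/hour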
$week = 40 * $wage; # $214.40
printf("One week's wage is: \$%.2f\n", $week/100);
One week's wage is: $214.40
See also: the sprintf function in perlfunc(1) and Chapter 3 of Programming Perl; the entry on $# in the
perlvar(1) manpage and Chapter 2 of
Programming Perl; the documentation for the
standard Math::BigFloat module (also in Chapter 7 of
Programming Perl); we use
sprintf in Section 2.3; Volume 2,
Section 4.2.2 of The Art of Computer
Programming.
To round a floating-point number, use sprintf, or
printf if you're just trying to produce
output:
$rounded = sprintf("%FORMATf", $unrounded);
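Rounding is usually done with sprintf. The
f format lets you specify a particular number of
decimal places to round its argument to. Conceptually, Perl looks at the following
digit, rounds up if it is 5 or greater, and rounds down otherwise. Strictly speaking, it rounds the binary value actually stored, so a number like 2.675, stored as slightly less than that, rounds to 2.67.
$a = 0.255;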
$b = sprintf("%.2f", $a);
print "Unrounded: $a\nRounded: $b\n";
printf "Unrounded: $a\nRounded: %.2f\n", $a;
Unrounded: 0.255
Rounded: 0.26
Unrounded: 0.255
Rounded: 0.26
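Here is a sketch of the caveat about stored binary values. 2.675 cannot be represented exactly, and the nearest double is slightly smaller, so sprintf rounds it down.

```perl
use strict;
use warnings;

# 2.675 is stored as 2.67499999999999982236431605997495353221893310546875,
# so rounding to two places goes down even though the next decimal digit looks like 5
my $price   = 2.675;
my $rounded = sprintf("%.2f", $price);

# Print more digits to see the value actually stored
my $stored  = sprintf("%.20f", $price);

print "$rounded  (stored as $stored)\n";
```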
Three functions turn a floating-point number into an integer: int, ceil, and floor.
int, built into Perl, returns the integral portion
of the floating-point number passed to it (int
will use $_ if it was called without an argument).
The POSIX module's
floor and ceil functions round
their argument down and up to the next integer, respectively.
use POSIX;
print "number\tint\tfloor\tceil\n";
@a = ( 3.3 , 3.5 , 3.7, -3.3 );
foreach (@a) {
printf( "%.1f\t%.1f\t%.1f\t%.1f\n",
$_, int($_), floor($_), ceil($_) );
}
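Here is a sketch that makes the difference between the three visible on a positive and a negative number. The values are arbitrary. int truncates toward zero, while floor and ceil always go down and up.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

for my $n (3.7, -3.7) {
    # int truncates toward zero; floor rounds down and ceil rounds up, even for negatives
    printf "%5.1f  int=%2d  floor=%2d  ceil=%2d\n", $n, int($n), floor($n), ceil($n);
}
```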
"N" format), then unpack it again
bit by bit (the "B32"