Chapter 4. Arrays
Works of art, in my opinion, are the only objects in the material universe to possess internal order, and that is why, though I don’t believe that only art matters, I do believe in Art for Art’s sake.
Introduction
If you are asked about the contents of your pockets, or the names of the first three Greek letters, or how to get to the highway, you recite a list: you name one thing after another in a particular order. Lists are part of your conception of the world. With Perl’s powerful list- and array-handling primitives, you can translate this world view directly into code.
In this chapter, we’ll use the terms list
and array as the Perl language thinks of them.
Take ("alpha"
, "beta"
, "gamma")
; that’s a list of
the names of the first three Greek letters, in order. To store that list
into a variable, use an array, as in @greeks
=
("alpha"
, "beta"
, "gamma")
. Both are ordered groups of scalar
values; the difference is that an array is a named variable, one whose
array length can be directly changed, whereas a list is a more ephemeral
notion. You might think of an array as a variable and a list as the
values it contains.
This distinction may seem arbitrary, but operations that modify
the length of these groupings (like push
and pop
) require a proper array and not merely a
list. Think of the difference between $a
and 4
.
You can say $a++
but not 4++
. Likewise, you can say pop(@a)
but not pop
(1,2,3)
.
The most important thing to glean from this is that Perl’s lists and arrays are both ordered groupings of scalars. Operators and functions that work on lists or arrays are designed to provide faster or more convenient access to the elements than manual access would provide. Since few actually deal with modifying the array’s length, you can usually use arrays and lists interchangeably.
You can’t use nested parentheses to create a list of lists. If you try that in Perl, your lists get flattened, meaning that both these lines are equivalent:
@nested = ("this", "that", "the", "other"); @nested = ("this", "that", ("the", "other"));
Why doesn’t Perl (usefully) just support nested lists directly?
Although partially for historical reasons, this easily allows for
operations (like print
or sort
) that work on arbitrarily long lists of
arbitrary contents.
What happens if you want a more complex data structure, such as an array of arrays or an array of hashes? Remember that scalars aren’t restricted to containing just numbers or strings; they can also hold references. Complex (multilevel) data structures in Perl are always put together using references. Therefore, what appear to be “two-dimensional arrays” or “arrays of arrays” are always implemented as arrays of array references, in the same way that two-dimensional arrays in C can be arrays of pointers to arrays.
Most recipes in this chapter don’t care what you keep in your arrays; for example, the problem of merging two arrays is the same whether the arrays contains strings, numbers, or references. Some problems are intrinsically tied to the contents of your arrays; recipes for those are in Chapter 11. This chapter’s recipes deal with generic arrays.
Let’s have some more terminology. The scalar items in an array or list are called elements, which you access by specifying their position, or index. Indices in Perl start at 0. So, given this list:
@greeks = ( "alpha", "beta", "gamma" );
"alpha
" is in the first
position, but you’d access it as $greeks[0]
. "beta
" is in the second position, but you’d
access it as $greeks[1]
. This
structure is doubly justified: the contrariness of computers, whose
first representable number is 0, and the contrariness of language
designers, who chose 0 because it is an offset into
the array, not the ordinal number of the
element.
4.1. Specifying a List in Your Program
Solution
You can write out a comma-separated list of elements:
@a = ("quick", "brown", "fox");
If you have a lot of single-word elements, use the
qw( )
operator:
@a = qw(Meddle not in the affairs of wizards.);
If you have a lot of multiword elements, use a here document and extract lines:
@lines = (<< "END_OF_HERE_DOC" =~ /^\s*(.+)/gm); I sit beside the fire and think of all that I have seen, of meadow-flowers and butterflies and summers that have been; END_OF_HERE_DOC
Discussion
The first technique is the one most commonly used, often because only small arrays are normally initialized as program literals. Initializing a large array would fill your program with values and make it hard to read, so such arrays either tend to be initialized in a separate library file (see Chapter 12), or else have their values read in from a file:
@bigarray = ( ); open(FH, "<", "myinfo") or die "Couldn't open myinfo: $!"; while (<FH>) { chomp; push(@bigarray, $_); } close(FH);
The second technique uses qw(
)
, one of several pseudo-functions in Perl used for quoting
without having to resort to actual quotation marks. This one splits
its string argument on whitespace to produce a list of words, where
“words” in this instance means strings that don’t contain any
whitespace. The initial argument is not subject to interpolation of
variables or (most) backslash escape sequences.
@banner = ('Costs', 'only', '$4.95'); @banner = qw(Costs only $4.95); @banner = split(' ', 'Costs only $4.95');
You can use qw( )
only when
each whitespace-separated argument is to be a distinct element in the
return list. Be careful not to give Columbus four ships instead of
three:
@ships = qw(Niña Pinta Santa María); # WRONG @ships = ('Niña', 'Pinta', 'Santa María'); # right
The third solution takes a here document, which is a single,
multiline string, and applies a global pattern match to that string.
The pattern /^\s*(.+)/
says to skip
any whitespace at the start of the line, then capture everything
through the end of each line. The /g
modifier means to apply that match
globally, and the /m
modifier says
to permit ^
to match not just at
the beginning of the string, but also immediately after a newline,
which, in a multiline string, is just what you need. Applying that
technique to the ships example yields:
@ships = ( << "END_OF_FLOTILLA" =~ /^\s*(.+)/gm); Niña Pinta Santa María END_OF_FLOTILLA
See Also
The “List Value Constructors” section of perldata(1); the “List Values and Arrays”
section of Chapter 2 of Programming Perl; the
“Quote and Quote-Like Operators” section of perlop(1); the s///
operator in perlop(1) and Chapter 5 of
Programming Perl
4.2. Printing a List with Commas
Problem
You’d like to print out a list containing an unknown number of elements, placing an “and” before the last element and commas between each element if there are more than two.
Solution
Use this function, which returns the formatted string:
sub commify_series { (@_ = = 0) ? '' : (@_ = = 1) ? $_[0] : (@_ = = 2) ? join(" and ", @_) : join(", ", @_[0 .. ($#_-1)], "and $_[-1]"); }
Discussion
It often looks odd to print out arrays:
@array = ("red", "yellow", "green"); print "I have ", @array, " marbles.\n"; print "I have @array marbles.\n"; I have redyellowgreen marbles I have red yellow green marbles
What you really want it to say is, "I
have
red
, yellow
, and
green
marbles
“. The function given in the
solution generates strings in that format. The word "and
" is placed between the last two list
elements. If there are more than two elements in the list, a comma is
placed between every element.
Example 4-1 gives a complete demonstration of the function, with one addition: if any element in the list already contains a comma, a semicolon is used for the separator character instead.
#!/usr/bin/perl -w # commify_series - show proper comma insertion in list output # @lists is an array of (references to anonymous) arrays @lists = ( [ 'just one thing' ], [ qw(Mutt Jeff) ], [ qw(Peter Paul Mary) ], [ 'To our parents', 'Mother Theresa', 'God' ], [ 'pastrami', 'ham and cheese', 'peanut butter and jelly', 'tuna' ], [ 'recycle tired, old phrases', 'ponder big, happy thoughts' ], [ 'recycle tired, old phrases', 'ponder big, happy thoughts', 'sleep and dream peacefully' ], ); foreach $aref (@lists) { print "The list is: " . commify_series(@$aref) . ".\n"; } # demo for single list @list = qw(one two three); print "The last list is: " . commify_series(@list) . ".\n"; sub commify_series { my $sepchar = grep(/,/ => @_) ? ";" : ","; (@_ = = 0) ? '' : (@_ = = 1) ? $_[0] : (@_ = = 2) ? join(" and ", @_) : join("$sepchar ", @_[0 .. ($#_-1)], "and $_[-1]"); }
Here’s the output from the program:
The list is: just one thing The list is: Mutt and Jeff The list is: Peter, Paul, and Mary The list is: To our parents, Mother Theresa, and God The list is: pastrami, ham and cheese, peanut butter and jelly, and tuna The list is: recycle tired, old phrases and ponder big, happy thoughts The list is: recycle tired, old phrases; ponder big, happy thoughts; and sleep and dream peacefully The last list is: one, two, and three
As you see, we don’t follow the ill-advised practice of omitting the final comma from a series under any circumstances. To do so introduces unfortunate ambiguities and unjustifiable exceptions. The examples shown would have claimed that we were the offspring of Mother Teresa and God, and would have had us eating sandwiches made of jelly and tuna fish mixed together atop the peanut butter.
See Also
Fowler’s Modern English Usage; we explain
the nested list syntax in Recipe 11.1; the grep
function in perlfunc(1) and Chapter 29 of
Programming Perl; the conditional operator
("?:")
is discussed in perlop(1) and in the “Conditional Operator”
section of Chapter 3 of Programming Perl
4.3. Changing Array Size
Problem
You want to enlarge or truncate an array. For example,
you might truncate an array of employees that’s already sorted by
salary to list the five highest-paid employees. Or, if you know how
big your array will get and that it will grow piecemeal, it’s more
efficient to grab memory for it in one step by enlarging just once
than to keep push
ing values onto
the end.
Solution
# grow or shrink @ARRAY $#ARRAY = $NEW_LAST_ELEMENT_INDEX_NUMBER;
Assigning to an element past the end automatically extends the array:
$ARRAY[$NEW_LAST_ELEMENT_INDEX_NUMBER] = $VALUE;
Discussion
$#ARRAY
is the
number of the last valid index in @ARRAY
. If we assign it a number smaller
than its current value, we truncate the array. Truncated elements are
lost forever. If we assign $#ARRAY
a number larger than its current value, the array grows. New elements
have the undefined value.
$#ARRAY
is not @ARRAY
, though. Although $#ARRAY
is the last valid index in the
array, @ARRAY
(in scalar context,
as when treated as a number) is the number of
elements. $#ARRAY
is one less than
@ARRAY
because array indices start
at 0.
Here’s some code that uses both. We have to say scalar
@array
in the print
because Perl gives list context to
(most) functions’ arguments, but we want @array
in scalar context.
sub what_about_that_array { print "The array now has ", scalar(@people), " elements.\n"; print "The index of the last element is $#people.\n"; print "Element #3 is `$people[3]'.\n"; } @people = qw(Crosby Stills Nash Young); what_about_that_array( );
prints:
The array now has 4 elements The index of the last element is 3 Element #3 is `Young'
whereas:
$#people--; what_about_that_array( );
prints:
The array now has 3 elements The index of the last element is 2 Element #3 is `'
Element #3 disappeared when we shortened the array. If we’d
turned on warnings (either with the -w command-line option to Perl or with use warnings
inside the program), Perl would
also have warned “use of uninitialized value” because $people[3]
is undefined.
$#people = 10_000; what_about_that_array( );
prints:
The array now has 10001 elements The index of the last element is 10000 Element #3 is `'
The "Young
" element is now
gone forever. Instead of assigning to $#people
, we could have said:
$people[10_000] = undef;
although this isn’t exactly the same. If you have a three-element array, as in:
@colors = qw(red blue green);
and you undef
its last
element:
undef $colors[2]; # green is gone
you still have a three-element array; its last element is just
undefined. If you pop
the array,
either via the function or manually by changing $#colors
:
$last_color = $colors[ $#colors-- ];
then the array grows shorter by one element.
Perl arrays are not sparse. In other words, if you have a
10,000th element, you must have the 9,999
other elements, too. They may be undefined, but they still take up
memory. For this reason, $array[time(
)]
, or any other construct that uses a very large integer as
an array index, is a really bad idea. Use a hash instead.
See Also
The discussion of the $#ARRAY
notation in perldata(1), also explained in the
“List Values and Arrays” section of Chapter 2 of
Programming Perl
4.4. Implementing a Sparse Array
Problem
An array with large, unoccupied expanses between occupied elements wastes memory. How do you reduce that overhead?
Solution
Use a hash instead of an array.
Discussion
If you assign to the millionth element of an array, Perl
allocates a million and one slots to store scalars. Only the last
element contains interesting data, leaving earlier ones each set to
undef
at a cost of four (or more)
bytes per unoccupied slot.
In recent versions of Perl, if you grow an array by assigning
either past the end or directly to $#ARRAY
, you can distinguish these implicit
undef
s from those that would result
from assigning undef
there by using
exists
instead of defined
, just as you would with a
hash.
$#foo = 5; @bar = ( (undef) x 5 ) ; printf "foo element 3 is%s defined\n", defined $foo[3] ? "" : "n't"; printf "foo element 3 does%s exist\n", exists $foo[3] ? "" : "n't"; printf "bar element 3 is%s defined\n", defined $bar[3] ? "" : "n't"; printf "bar element 3 does%s exist\n", exists $bar[3] ? "" : "n't"; foo element 3 isn't defined foo element 3 doesn't exist bar element 3 isn't defined bar element 3 does exist
However, you still waste a lot of space. That’s because Perl’s array implementation reserves a contiguous vector, one for each element up to the highest occupied position.
$real_array[ 1_000_000 ] = 1; # costs 4+ megabytes
A hash works differently: you pay only for what you really use, not for unoccupied positions. Although a hash element costs somewhat more than an array element because you need to store both the value and its key, with sparse arrays, the savings can be astonishing.
$fake_array{ 1_000_000 } = 1; # costs 28 bytes
What’s the trade-off? Because a hash’s keys aren’t ordered, a little more work is needed to sort the numeric keys so you can handle their values in the same order as you would if they were stored as a real array. With an array, you’d just do this to process elements in index order:
foreach $element ( @real_array ) { # do something with $element }
or this to process indices in ascending order:
foreach $idx ( 0 .. $#real_array ) { # do something with $real_array[$idx] }
Using a hash representation, you should instead do either this to process elements in index order:
foreach $element ( @fake_array{ sort {$a <=> $b} keys %fake_array } ) { # do something with $element }
or this to process indices in ascending order:
foreach $idx ( sort {$a <=> $b} keys %fake_array ) { # do something with $fake_array{$idx} }
If you don’t care about handling elements in a particular order, however, you don’t need to go through all that. Just process the values according to their internal order, either like this:
foreach $element ( values %fake_array ) { # do something with $element }
or like this:
# process indices in internal hash order foreach $idx ( keys %fake_array ) { # do something with $fake_array{$idx} }
If you’re determined to use an array, two fairly specialized cases occasionally arise in which you can save substantial amounts of memory by using an alternate storage scheme. Both cases also apply to arrays that are densely populated, not just those that are mostly empty.
The first case shows up when you grow an array by repeatedly appending new elements until its subscripts become large. Because of how Perl reallocates memory for growing arrays, this can use up to four times the memory you really need. If you happen to know how big the array will (or might) eventually become, you can avoid this reallocation overhead either by storing the large subscripts first instead of the small ones:
for ($i = 10_000; $i >= 0; $i--) { $real_array[$i] = 1 }
or by presizing the array by assigning to the special $#ARRAY
notation:
$#real_array = 10_000;
The second special case comes up when each array element holds nothing but a single one-bit value—essentially either a true or a false. For example, suppose you are keeping track of numbered USENET news articles, and you only need to know whether a given article number has been read. For situations like this, use a bit vector instead of a real array:
my $have_read = ''; for ($i = 10_000; $i >= 0; $i--) { vec($have_read, $i, 1) = 1 }
Then you can check to see whether a given article has been read this way:
if (vec($have_read, $artno, 1)) { .... }
4.5. Iterating Over an Array
Problem
You want to repeat a procedure for every element in a list.
Often you use an array to collect information you’re interested in; for instance, login names of users who have exceeded their disk quota. When you finish collecting the information, you want to process it by doing something with every element in the array. In the disk quota example, you might send each user a stern mail message.
Discussion
Let’s say we’ve used @bad_users
to compile a list of users who
are over their allotted disk quotas. To call some complain
subroutine for each user, we’d
use:
foreach $user (@bad_users) { complain($user); }
Rarely is this recipe so simply applied. Instead, we often use functions to generate the list:
foreach $var (sort keys %ENV) { print "$var=$ENV{$var}\n"; }
Here we’re using sort
and
keys
to build a sorted list of
environment variable names. If you use the list more than once, you’ll
obviously keep it around by saving in an array. But for one-shot
processing, it’s often tidier to process the list directly.
Not only can we add complexity to this formula by building up
the list in the foreach
, we can
also add complexity by doing more work inside the code block. A common
application of foreach
is to gather
information on every element of a list and then, based on that
information, decide whether to do something. For instance, returning
to the disk quota example:
foreach $user (@all_users) { $disk_space = get_usage($user); # find out how much disk space in use if ($disk_space > $MAX_QUOTA) { # if it's more than we want ... complain($user); # ... then object vociferously } }
More complicated program flow is possible. The code can call
last
to jump out of the loop,
next
to move on to the next
element, or redo
to jump back to
the first statement inside the block. Use these to say “no point
continuing with this one, I know it’s not what I’m looking for”
(next
), “I’ve found what I’m
looking for, there’s no point in my checking the rest” (last
), or “I’ve changed some things, I’d
better run this loop’s calculations again” (redo
).
The variable set to each value in the list is called a
loop variable or iterator
variable. If no iterator variable is supplied, the global
variable $_
is used. $_
is the default variable for many of
Perl’s string, list, and file functions. In brief code blocks,
omitting $_
improves readability.
(In long ones, though, too much implicit use hampers readability.) For
example:
foreach (`who`) { if (/tchrist/) { print; } }
or combining with a while
loop:
while (<FH>) { # $_ is set to the line just read chomp; # $_ has a trailing \n removed, if it had one foreach (split) { # $_ is split on whitespace, into @_ # then $_ is set to each chunk in turn $_ = reverse; # the characters in $_ are reversed print; # $_ is printed } }
Perhaps all these uses of $_
are starting to make you nervous. In particular, the foreach
and the while
both give values to $_
. You might fear that at the end of the
foreach
, the full line as read into
$_
with <FH>
would be forever gone.
Fortunately, your fears would be unfounded, at least in
this case. Perl won’t permanently clobber $_
’s old value, because the foreach
’s iterator variable (here, $_
) is automatically preserved during the
loop. It saves away any old value on entry and restores it upon
exit.
However, there is some cause for concern. If the while
had been the inner loop and the
foreach
the outer one, your fears
would have been realized. Unlike a foreach
loop, the while
(<FH>)
construct clobbers the value of
the global $_
without first
localizing it! So any routine—or block for that matter—that uses this
construct with $_
should declare
local
$_
.
If a lexical variable (one declared with my
) is in scope, the temporary variable will
be lexically scoped, private to that loop. Otherwise, it will be a
dynamically scoped global variable. To avoid strange magic at a
distance, write this more obviously and more clearly as:
foreach my $item (@array) { print "i = $item\n"; }
The foreach
looping
construct has another feature: each time through the loop, the
iterator variable becomes not a copy of but rather an
alias for the current element. This means that
when you change that iterator variable, you really change each element
in the list:
@array = (1,2,3); foreach $item (@array) { $item--; } print "@array\n"; 0 1 2 # multiply everything in @a and @b by seven @a = ( .5, 3 ); @b = ( 0, 1 ); foreach $item (@a, @b) { $item *= 7; } print "@a @b\n"; 3.5 21 0 7
You can’t change a constant, though, so this is illegal:
foreach $n (1, 2, 3) { $n **= 2; }
This aliasing means that using a foreach
loop to modify list values is both
more readable and faster than the equivalent code using a three-part
for
loop and explicit indexing
would be. This behavior is a feature, not a bug, that was introduced
by design. If you didn’t know about it, you might accidentally change
something. Now you know about it.
For example, to trim leading and trailing whitespace in a hash,
we take advantage of how the values
function works: the elements of its return list really
are the values of the hash, and changing these
changes the original hash. Because we use s///
directly on the list returned by the
values
function without copying
these into a variable, we change the real hash itself.
# trim whitespace in the scalar, the array, and in all # the values in the hash foreach ($scalar, @array, values %hash) { s/^\s+//; s/\s+$//; }
For reasons hearkening back to the equivalent construct in the
Unix Bourne shell, the for
and
foreach
keywords are
interchangeable:
for $item (@array) { # same as foreach $item (@array) # do something } for (@array) { # same as foreach $_ (@array) # do something }
This style often indicates that its author writes or maintains shell scripts, perhaps for Unix system administration. As such, their life is probably hard enough, so don’t speak too harshly of them. Remember, TMTOWTDI. This is just one of those ways.
If you aren’t fluent in Bourne shell, you might find it clearer
to express “for each $thing
in this
@list
" by saying foreach
, to make your code look less like
the shell and more like English. (But don’t try to make your English
look like your code!)
See Also
The “For Loops,” “Foreach Loops,” and “Loop Control” sections of
perlsyn(1) and Chapter 4 of
Programming Perl; the “Temporary Values via
local” section of perlsub(1); the
“Scoped Declarations” section of Chapter 4 of Programming
Perl; we talk about local
in Recipe 10.13; we talk about
my
in Recipe 10.2
4.6. Iterating Over an Array by Reference
Problem
You have a reference to an array, and you want to use a loop to work with the array’s elements.
Solution
Use foreach
or for
to loop over the dereferenced
array:
# iterate over elements of array in $ARRAYREF foreach $item (@$ARRAYREF) { # do something with $item } for ($i = 0; $i <= $#$ARRAYREF; $i++) { # do something with $ARRAYREF->[$i] }
Discussion
The solutions assume you have a scalar variable containing the array reference. This lets you do things like this:
@fruits = ( "Apple", "Blackberry" ); $fruit_ref = \@fruits; foreach $fruit (@$fruit_ref) { print "$fruit tastes good in a pie.\n"; } Apple tastes good in a pie Blackberry tastes good in a pie
We could have rewritten the foreach
loop as a for
loop like this:
for ($i=0; $i <= $#$fruit_ref; $i++) { print "$fruit_ref->[$i] tastes good in a pie.\n"; }
Frequently, though, the array reference is the result of a more
complex expression. Use the @{
EXPR
}
notation to turn the result of the
expression back into an array:
$namelist{felines} = \@rogue_cats; foreach $cat ( @{ $namelist{felines} } ) { print "$cat purrs hypnotically..\n"; } print "--More--\nYou are controlled.\n";
Again, we can replace the foreach
with a for
loop:
for ($i=0; $i <= $#{ $namelist{felines} }; $i++) { print "$namelist{felines}[$i] purrs hypnotically.\n"; }
See Also
perlref(1) and perllol(1); Chapter 8 of Programming Perl; Recipe 11.1; Recipe 4.5
4.7. Extracting Unique Elements from a List
Problem
You want to eliminate duplicate values from a list, such as when you build the list from a file or from the output of another command. This recipe is equally applicable to removing duplicates as they occur in input and to removing duplicates from an array you’ve already populated.
Solution
Use a hash to record which items have been seen, then keys
to extract them. You can use Perl’s
idea of truth to shorten and speed up your code.
Straightforward
%seen = ( ); @uniq = ( ); foreach $item (@list) { unless ($seen{$item}) { # if we get here, we have not seen it before $seen{$item} = 1; push(@uniq, $item); } }
Discussion
The question at the heart of the matter is “Have I seen this element before?” Hashes are ideally suited to such lookups. The first technique (Recipe 4.7.2.1) builds up the array of unique values as we go along, using a hash to record whether something is already in the array.
The second technique (Recipe 4.7.2.2) is the
most natural way to write this sort of thing in Perl. It creates a new
entry in the hash every time it sees an element that hasn’t been seen
before, using the ++
operator. This
has the side effect of making the hash record the number of times the
element was seen. This time we only use the hash for its property of
working like a set.
The third example (Recipe 4.7.2.3) is similar to the second but rather than storing the item away, we call some user-defined function with that item as its argument. If that’s all we’re doing, keeping a spare array of those unique values is unnecessary.
The next mechanism (Recipe 4.7.2.4) waits
until it’s done processing the list to extract the unique keys from
the %seen
hash. This may be
convenient, but the original order has been lost.
The final approach (Recipe 4.7.2.5) merges the
construction of the %seen
hash with
the extraction of unique elements. This preserves the original order
of elements.
Using a hash to record the values has two side effects:
processing long lists can take a lot of memory, and the list returned
by keys
is unordered.
Here’s an example of processing input as it is read. We use
`who`
to gather information on the
current user list, then extract the username from each line before
updating the hash:
# generate a list of users logged in, removing duplicates %ucnt = ( ); for (`who`) { s/\s.*\n//; # kill from first space till end-of-line, yielding username $ucnt{$_}++; # record the presence of this user } # extract and print unique keys @users = sort keys %ucnt; print "users logged in: @users\n";
See Also
The “Foreach Loops” section of perlsyn(1) and Chapter 4 of
Programming Perl; the keys
function in perlfunc(1) and Chapter 29 of
Programming Perl; the “Hashes” section of
Chapter 2 of Programming Perl; Chapter 5; we use hashes in a similar
fashion in Recipe 4.8
and Recipe 4.9
4.8. Finding Elements in One Array but Not Another
Solution
You want to find elements in @A
that aren’t in @B
. Build a hash of the keys of @B
to use as a lookup table. Then check each
element in @A
to see whether it is
in @B
.
Straightforward implementation
# assume @A and @B are already loaded %seen = ( ); # lookup table to test membership of B @aonly = ( ); # answer # build lookup table foreach $item (@B) { $seen{$item} = 1 } # find only elements in @A and not in @B foreach $item (@A) { unless ($seen{$item}) { # it's not in %seen, so add to @aonly push(@aonly, $item); } }
Discussion
As with nearly any problem in Perl that asks whether a
scalar is in one list or another, this one uses a
hash. First, process @B
so that the
%seen
hash records each element
from @B
by setting its value to 1.
Then process @A
one element at a
time, checking whether that particular element had been in @B
by consulting the %seen
hash.
The given code retains duplicate elements in @A
. This can be fixed easily by adding the
elements of @A
to %seen
as they are processed:
foreach $item (@A) { push(@aonly, $item) unless $seen{$item}; $seen{$item} = 1; # mark as seen }
The first two solutions differ mainly in how they build the
hash. The first iterates through @B
. The second uses a hash
slice to initialize the hash. A hash slice is easiest
illustrated by this example:
$hash{"key1"} = 1; $hash{"key2"} = 2;
which is equivalent to:
@hash{"key1", "key2"} = (1,2);
The list in the curly braces holds the keys; the list on the
right holds the values. We initialize %seen
in the first solution by looping over
each element in @B
and setting the
appropriate value of %seen
to 1. In
the second, we simply say:
@seen{@B} = ( );
This uses items in @B
as keys
for %seen
, setting each
corresponding value to undef
,
because there are fewer values on the right than places to put them.
This works out here because we check for existence of the key, not
logical truth or defined
ness of the
value. If we needed true values, a slice could still shorten our
code:
@seen{@B} = (1) x @B;
In the third solution, we make use of this property even further
and avoid explicit loops altogether. (Not that avoiding loops should
be construed as being particularly virtuous; we’re just showing you
that there’s more than one way to do it.) The slice assignment makes
any element that was in @A
a key,
and the slice deletion removes from the hash any keys that were
elements of @B
, leaving those that
were only in @A
.
A fairly common situation where this might arise is when you have two files and would like to know which lines from the second file either were or weren’t in the first. Here’s a simple solution based on this recipe:
open(OLD, $path1) || die "can't open $path1: $!"; @seen{ <OLD> } = ( ); open(NEW, $path2) || die "can't open $path2: $!"; while (<NEW>) { print if exists $seen{$_}; }
This shows the lines in the second file that were already seen
in the first one. Use unless
instead of if
to show the lines in
the second file that were not in the
first.
Imagine two files, the first containing the lines:
red yellow green blue
and the second containing:
green orange purple black yellow
The output using if
would
be:
green yellow
and the output using unless
would be:
orange purple black
You could even do this from the command line; given a suitable cat(1) program, it’s easy:
% perl -e '@s{`cat OLD`}=( ); exists $s{$_} && print for `cat NEW`' % perl -e '@s{`cat OLD`}=( ); exists $s{$_} || print for `cat NEW`'
You’d find that you just emulated these calls to the Unix
fgrep
(1) program:
% fgrep -Ff OLD NEW % fgrep -vFf OLD NEW
See Also
Hash slices are explained in perldata(1) and the “Variables” section of Chapter 2 of Programming Perl; Chapter 5; we use hashes in a similar fashion in Recipe 4.7 and Recipe 4.9
4.9. Computing Union, Intersection, or Difference of Unique Lists
Problem
You have a pair of lists, each holding unduplicated items. You’d like to find out which items are in both lists (intersection), one but not the other (difference), or either (union).
Solution
The following solutions need the listed initializations:
@a = (1, 3, 5, 6, 7, 8); @b = (2, 3, 5, 7, 9); @union = @isect = @diff = ( ); %union = %isect = ( ); %count = ( );
Simple solution for union and intersection
foreach $e (@a) { $union{$e} = 1 } foreach $e (@b) { if ( $union{$e} ) { $isect{$e} = 1 } $union{$e} = 1; } @union = keys %union; @isect = keys %isect;
More idiomatic version
foreach $e (@a, @b) { $union{$e}++ && $isect{$e}++ } @union = keys %union; @isect = keys %isect;
Discussion
The first solution most directly computes the union and intersection of two lists, neither containing duplicates. Two hashes are used to record whether a particular item goes in the union or the intersection. We put every element of the first array in the union hash, giving it a true value. Then, processing each element of the second array, we check whether that element is already present in the union. If it is, we put it in the intersection as well. In any event, it goes into the union. When we’re done, we extract the keys of both the union and intersection hashes. The values aren’t needed.
The second solution (Recipe 4.8.2.2) is
essentially the same but relies on familiarity with the Perl (and
awk, C, C++, and Java) ++
and &&
operators. By placing the
++
after the variable, we first
look at its old value before incrementing it. The first time through
it won’t be in the union, which makes the first part of the &&
false, so the second part is
consequently ignored. The second time that we encounter the same
element, it’s already present in the union, so we put it in the
intersection.
The third solution uses just one hash to track how many times each element is seen. Once both arrays have their elements recorded in the hash, we grab those keys and put them in the union. Then we process those hash keys one at a time. Keys whose values are 2 were in both arrays, so they go in the intersection array. Keys whose values are 1 were in just one of the two arrays, so they go in the difference array. Elements of the output arrays are not in the same order as those in the input arrays.
The last solution, like the previous one, uses just one hash to
count how many times each element is encountered. Here, though, we
dynamically select one of two possible arrays by placing within the
@{...}
array-dereferencing block an
expression whose evaluation yields a reference to whichever array is
demanded by the situation.
In this recipe we compute the symmetric difference, not
the simple difference. These are terms from set theory. A
symmetric difference is the set of all elements
that are members of either @A
or
@B
, but not both. A
simple difference is the set of members of
@A
but not @B
, which we calculated in Recipe 4.8.
See Also
The “Hashes” section of Chapter 2 of Programming Perl; Chapter 5; we use hashes in a similar fashion in Recipe 4.7 and Recipe 4.8
4.10. Appending One Array to Another
Discussion
The push
function is
optimized for appending a list to the end of an array. You can take
advantage of Perl’s list flattening to join two arrays, but this
results in significantly more copying than push
:
@ARRAY1 = (@ARRAY1, @ARRAY2);
Here’s an example of push
in
action:
@members = ("Time", "Flies"); @initiates = ("An", "Arrow"); push(@members, @initiates); # @members is now ("Time", "Flies", "An", "Arrow")
To insert the elements of one array into the middle of another,
use the splice
function:
splice(@members, 2, 0, "Like", @initiates); print "@members\n"; splice(@members, 0, 1, "Fruit"); splice(@members, -2, 2, "A", "Banana"); print "@members\n";
This is the output:
Time Flies Like An Arrow Fruit Flies Like A Banana
See Also
The splice
and push
functions in
perlfunc(1) and Chapter 29 of
Programming Perl; the “List Values and Arrays”
section of Chapter 2 of Programming Perl; the
“List Value Constructors” section of perldata(1)
4.11. Reversing an Array
Solution
# reverse @ARRAY into @REVERSED @REVERSED = reverse @ARRAY;
Or process with a foreach
loop on a reversed list:
foreach $element (reverse @ARRAY) { # do something with $element }
Or use a for
loop, starting
with the index of the last element and working your way down:
for ($i = $#ARRAY; $i >= 0; $i--) { # do something with $ARRAY[$i] }
Discussion
Called in list context, the reverse
function reverses elements of its
argument list. You can save a copy of that reversed list into an
array, or just use foreach
to walk
through it directly if that’s all you need. The for
loop processes the array elements in
reverse order by using explicit indices. If you don’t need a reversed
copy of the array, the for
loop can
save memory and time on very large arrays.
If you’re using reverse
to
reverse a list that you just sorted, you should have sorted it in the
correct order to begin with. For example:
# two-step: sort then reverse @ascending = sort { $a cmp $b } @users; @descending = reverse @ascending; # one-step: sort with reverse comparison @descending = sort { $b cmp $a } @users;
See Also
The reverse
function in
perlfunc(1) and Chapter 29 of
Programming Perl; we use reverse
in Recipe 1.7
4.12. Processing Multiple Elements of an Array
Solution
# remove $N elements from front of @ARRAY (shift $N) @FRONT = splice(@ARRAY, 0, $N); # remove $N elements from the end of the array (pop $N) @END = splice(@ARRAY, -$N);
Discussion
The splice
function allows
you to add elements, delete elements, or both, at any point in an
array, not just at the ends. All other operations that modify an
array’s length can also be written as a splice
:
Unlike pop
and unshift
, though, which always delete and
return just one element at a time—and from the ends only—splice
lets you specify the number of
elements. This leads to code like the examples in the Solution.
It’s often convenient to wrap these splice
s as functions:
sub shift2 (\@) { return splice(@{$_[0]}, 0, 2); } sub pop2 (\@) { return splice(@{$_[0]}, -2); }
This makes their behavior more apparent when you use them:
@friends = qw(Peter Paul Mary Jim Tim); ($this, $that) = shift2(@friends); # $this contains Peter, $that has Paul, and # @friends has Mary, Jim, and Tim @beverages = qw(Dew Jolt Cola Sprite Fresca); @pair = pop2(@beverages); # $pair[0] contains Sprite, $pair[1] has Fresca, # and @beverages has (Dew, Jolt, Cola)
The splice
function returns
the elements it removed from the array, so shift2
replaces the first two elements in
@ARRAY
with nothing (i.e., deletes
them) and returns the two elements deleted. In pop2
, the two elements at end of the array
are removed and returned.
These two functions are prototyped to take an array reference as
their argument to better mimic the built-in shift
and pop
functions. The caller doesn’t pass in an
explicit reference using a backslash. Instead, the compiler, having
seen the array reference prototype, arranges to pass the array by
reference anyway. Advantages to this approach include efficiency,
transparency, and compile-time parameter checking. One disadvantage is
that the thing passed in must look like a real array with a leading
@
sign, not just a scalar
containing an array reference. If it did, you’d have to prepend an
@
, making it less
transparent:
$line[5] = \@list; @got = pop2( @{ $line[5] } );
This is another example of where a proper array and not a mere
list is called for. The \@
prototype requires that whatever goes in that argument slot be an
array. $line[5]
isn’t an array, but
an array reference. That’s why we need the “extra” @
sign.
See Also
The splice
function in
perlfunc(1) and Chapter 29 of
Programming Perl; the “Prototypes” sections of
perlsub(1) and Chapter 6 of
Programming Perl; we use splice
in Recipe 4.10
4.13. Finding the First List Element That Passes a Test
Problem
You want the first element in the list (or its index) that passes a test. Alternatively, you want to know whether any element passes the test. The test can be simple identity (“Is this element in the list?”)[1] or more complex (“I have a list of Employee objects, sorted from highest salary to lowest. Which manager has the highest salary?”). Simple cases normally require only the value of the element, but when the array itself will be altered, you probably need to know the index number of the first matching element.
Solution
To find a matching value, use foreach
to loop over every element, and call
last
as soon as you find a
match:
my ($match, $found, $item);
foreach $item (@array) {
if (CRITERION
) {
$match = $item; # must save
$found = 1;
last;
}
}
if ($found) {
## do something with $match
} else {
## unfound
}
To find a matching index, use for
to loop a variable over every array
index, and call last
as soon as you
find a match:
my ($i, $match_idx);
for ($i = 0; $i < @array; $i++) {
if (CRITERION
) {
$match_idx = $i; # save the index
last;
}
}
if (defined $match_idx) {
## found in $array[$match_idx]
} else {
## unfound
}
The List::Util module, shipped standard with Perl as of v5.8 but available on CPAN for earlier versions, provides an even easier approach:
use List::Util qw(first);
$match = first { CRITERION
} @list
Discussion
Lacking (until recently) a built-in mechanism to do this, we
must write our own code to go through the list and test each element.
We use foreach
and for
, and call last
to ensure that we stop as soon as we
find a match. Before we use last
to
stop looking, though, we save the value or index.
A common approach is to try to use grep
here. But grep
always tests all elements and finds all
matches, so it’s inefficient if you want only the first match.
However, grep
might still be
faster. That’s because there will be less source code if you use
grep
rather than writing your own
loop. That means fewer internal Perl operations, and it is these that
in practice often dominate runtimes.
Beyond a certain size of your data set, a loop that terminates
early will still be faster—assuming it has the chance to do so.
Empirical evidence suggests that for
will be faster as long as you can exit
before the first two-thirds of the list has been examined. It’s
worthwhile to know how to do that.
We have to set $match
when we
want the value of the first matching element. We can’t just test
$item
at the end of the loop,
because foreach
automatically
local
izes the iterator variable and
thereby prevents us from accessing the final loop value after the loop
ends. See Recipe
4.5.
Here’s an example. Assume that @all_emps
holds a list of Employee objects,
sorted in descending order by salary. We wish to find the highest paid
engineer, who will be the first engineer in the array. We only want to
print the engineer’s name, so we want the value, not the index.
foreach $employee (@all_emps) { if ( $employee->category( ) eq 'engineer' ) { $top_engr = $employee; last; } } print "Highest paid engineer is: ", $highest_engineer->name( ), "\n";
When we’re searching and want only the index, we can save some
code by remembering that $i
will
not be an acceptable array index if we don’t find a match. This mainly
saves us code space, as not doing an assignment doesn’t really win
much compared to the time spent testing list elements. It’s more
obscure, because it tests if ($i <
@ARRAY)
to check whether we found a match, instead of the
more obvious defined
test in the
previous solution.
for ($i = 0; $i < @ARRAY; $i++) {
last if CRITERION
;
}
if ($i < @ARRAY) {
## found and $i is the index
} else {
## not found
}
The first
function
from List::Util encapsulates the logic from an entire loop into a
convenient, easy-to-use function. It acts just like a short-circuiting
form of the built-in grep
function
that stops as soon as a match is found. While running, each list
element is in a localized $_
variable. For example:
$first_odd = first { $_ % 2 = = 1 } @ARRAY;
Or rewriting the previous employee loop:
$top_engr = first { $_->category( ) eq 'engineer' } @all_emps;
4.14. Finding All Elements in an Array Matching Certain Criteria
Problem
From a list, you want only the elements that match certain criteria.
This notion of extracting a subset of a larger list is common. It’s how you find all engineers in a list of employees, all users in the “staff” group, or all the filenames you’re interested in.
Solution
Use grep
to apply a
condition to all elements in the list and return only those for which
the condition was true:
@MATCHING = grep { TEST ($_) } @LIST;
Discussion
This could also be accomplished with a foreach
loop:
@matching = ( ); foreach (@list) { push(@matching, $_) if TEST ($_); }
The Perl grep
function is
shorthand for all that looping and mucking about. It’s not really like
the Unix grep
command; it doesn’t
have options to return line numbers or to negate the test, and it
isn’t limited to regular-expression tests. For example, to filter out
just the large numbers from an array or to find out which keys in a
hash have very large values:
@bigs = grep { $_ > 1_000_000 } @nums; @pigs = grep { $users{$_} > 1e7 } keys %users;
Here’s something that sets @matching
to lines from the
who command that start with "gnat
“:
@matching = grep { /^gnat / } `who`;
Here’s another example:
@engineers = grep { $_->position( ) eq "Engineer" } @employees;
It extracts only those objects from the array @employees
whose position
method returns the string Engineer
.
You could have even more complex tests in a grep
:
@secondary_assistance = grep { $_->income >= 26_000 && $_->income < 30_000 } @applicants;
But at that point you may decide it would be more legible to write a proper loop instead.
See Also
The “For Loops,” “Foreach Loops,” and “Loop Control” sections of
perlsyn(1) and Chapter 4 of
Programming Perl; the grep
function in perlfunc(1) and Chapter 29 of
Programming Perl; your system’s who(1) manpage, if it exists; Recipe 4.13
4.15. Sorting an Array Numerically
Solution
Use Perl’s sort
function and the <=> numerical comparison operator:
@sorted = sort { $a <=> $b } @unsorted;
Discussion
The sort
function takes an
optional code block, which lets you replace the default alphabetic
comparison with your own subroutine. This comparison function is
called each time sort
has to
compare two values. The values to compare are loaded into the special
package variables $a
and $b
, which are automatically local
ized.
The comparison function should return a negative number if
$a
ought to appear before $b
in the output list, 0
if they’re the same and their order
doesn’t matter, or a positive number if $a
ought to appear after $b
. Perl has two operators that behave this
way: <=> for sorting numbers in ascending numeric order, and
cmp
for sorting strings in
ascending alphabetic order. By default, sort
uses cmp
-style comparisons.
Here’s code that sorts the list of PIDs in @pids
, lets the user select one, then sends
it a TERM signal followed by a KILL signal. We use a code block that
compares $a
to $b
with <=> to sort
numerically:
# @pids is an unsorted array of process IDs foreach my $pid (sort { $a <=> $b } @pids) { print "$pid\n"; } print "Select a process ID to kill:\n"; chomp ($pid = <>); die "Exiting ... \n" unless $pid && $pid =~ /^\d+$/; kill('TERM',$pid); sleep 2; kill('KILL',$pid);
If you use $a
<=>
$b
or $a
cmp
$b
, the list will be sorted in ascending
order. For a descending sort, all we have to do is swap $a
and $b
in the sort subroutine:
@descending = sort { $b <=> $a } @unsorted;
Comparison routines must be consistent; that is, they should always return the same answer when called with the same values. Inconsistent comparison routines lead to infinite loops or core dumps, especially in older releases of Perl.
You can also say sort
SUBNAME
LIST
where SUBNAME
is the name of a comparison
subroutine returning -1
, 0
, or +1
.
In the interests of speed, the normal calling conventions are
bypassed, and the values to be compared magically appear for the
duration of the subroutine in the global package variables $a
and $b
. Because of the odd way Perl calls this
subroutine, it may not be recursive.
A word of warning: $a
and
$b
are set in the package active in
the call to sort
, which may not be
the same as the one that the SUBNAME
function passed to sort
was compiled in! For example:
package Sort_Subs; sub revnum { $b <=> $a } package Other_Pack; @all = sort Sort_Subs::revnum 4, 19, 8, 3;
This will silently fail (unless you have -w in effect, in which case it will vocally
fail) because the sort
call sets
the package variables $a
and
$b
in its own package, Other_Pack
, but the revnum
function uses its own package’s
versions. This is another reason why in-lining sort functions is
easier, as in:
@all = sort { $b <=> $a } 4, 19, 8, 3;
For more on packages, see Chapter 10.
See Also
The cmp
and <=>
operators in
perlop(1) and Chapter 3 of Programming
Perl; the kill
,
sort
, and sleep
functions in perlfunc(1) and Chapter 29 of
Programming Perl; Recipe 4.16
4.16. Sorting a List by Computable Field
Problem
You want to sort a list by something more complex than a simple string or numeric comparison.
This is common when working with objects (“sort by the employee’s salary”) or complex data structures (“sort by the third element in the array that this is a reference to”). It’s also applicable when you want to sort by more than one key; for instance, sorting by birthday and then by name when multiple people share the same birthday.
Solution
Use the customizable comparison routine in sort
:
@ordered = sort { compare( ) } @unordered;
You can speed this up by precomputing the field.
@precomputed = map { [compute( ),$_] } @unordered; @ordered_precomputed = sort { $a->[0] <=> $b->[0] } @precomputed; @ordered = map { $_->[1] } @ordered_precomputed;
And, finally, you can combine the three steps:
@ordered = map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { [compute( ), $_] } @unordered;
Discussion
The use of a comparison routine was explained in Recipe 4.15. As well as using
built-in operators like <=>
,
you can construct more complex tests:
@ordered = sort { $a->name cmp $b->name } @employees;
You often see sort
used like
this in part of a foreach
loop:
foreach $employee (sort { $a->name cmp $b->name } @employees) { print $employee->name, " earns \$", $employee->salary, "\n"; }
If you’re going to do a lot of work with elements in a particular order, it’s more efficient to sort once and work from that:
@sorted_employees = sort { $a->name cmp $b->name } @employees; foreach $employee (@sorted_employees) { print $employee->name, " earns \$", $employee->salary, "\n"; } # load %bonus foreach $employee (@sorted_employees) { if ( $bonus{ $employee->ssn } ) { print $employee->name, " got a bonus!\n"; } }
We can put multiple comparisons in the routine and
separate them with ||
. ||
is a short-circuit operator: it returns
the first true value it finds. This means we can sort by one kind of
comparison, but if the elements are equal (the comparison returns 0),
we can sort by another. This has the effect of a sort within a
sort:
@sorted = sort { $a->name cmp $b->name || $b->age <=> $a->age } @employees;
This first considers the names of the two employees to be
compared. If they’re not equal, ||
stops and returns the result of the cmp
(effectively sorting them in ascending
order by name). If the names are equal, though, ||
keeps testing and returns the result of
the <=> (sorting them in descending order by age). The result is
a list that is sorted by name and by age within groups of the same
name.
Let’s look at a real-life example of sorting. First we fetch all system users, represented as User::pwent objects. Then we sort them by name and print the sorted list:
use User::pwent qw(getpwent); @users = ( ); # fetch all users while (defined($user = getpwent)) { push(@users, $user); } @users = sort { $a->name cmp $b->name } @users; foreach $user (@users) { print $user->name, "\n"; }
We can have more than simple comparisons, or combinations of
simple comparisons. This code sorts a list of names by comparing the
second letters of the names. It gets the second
letters by using substr
:
@sorted = sort { substr($a,1,1) cmp substr($b,1,1) } @names;
and here we sort by string length:
@sorted = sort { length $a <=> length $b } @strings;
The sort
function calls the
code block each time it needs to compare two elements, so the number
of comparisons grows dramatically with the number of elements we’re
sorting. Sorting 10 elements requires (on average) 46 comparisons, but
sorting 1,000 elements requires 14,000 comparisons. A time-consuming
operation like a split
or a
subroutine call for each comparison can easily make your program
crawl.
Fortunately, we can remove this bottleneck by running
the operation once per element prior to the sort. Use map
to store the results of the operation in
an array whose elements are anonymous arrays containing both the
computed field and the original field. Then we sort
this array of arrays on the precomputed
field and use map
to get the sorted
original data. This map
-sort
-map
concept is useful and common, so let’s look at it in depth.
Let’s apply map
-sort
-map
to the sorting by string length example:
@temp = map { [ length $_, $_ ] } @strings; @temp = sort { $a->[0] <=> $b->[0] } @temp; @sorted = map { $_->[1] } @temp;
The first line creates a temporary array of strings and their
lengths, using map
. The second line
sorts the temporary array by comparing the precomputed lengths. The
third line turns the sorted temporary array of strings and lengths
back into a sorted array of strings. This way, we calculate the length
of each string only once.
Because the input to each line is the output of the previous
line (the @temp
array we make in
line 1 is fed to sort
in line 2,
and that output is fed to map
in
line 3), we can combine it into one statement and eliminate the
temporary array:
@sorted = map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { [ length $_, $_ ] } @strings;
The operations now appear in reverse order. When you meet a
map
-sort
-map
,
you should read it from the bottom up to determine the
function:
@strings
The last part is the data to be sorted. Here it’s just an array, but later we’ll see that this can be a subroutine or even backticks. Anything that returns a list is fair game.
map
The
map
closer to the bottom builds the temporary list of anonymous arrays. This list contains the precomputed fields (length
$_
) and also records the original element ($_
) by storing both in an anonymous array. Look at thismap
line to find out how the fields are computed.sort
The
sort
line sorts the list of anonymous arrays by comparing the precomputed fields. It won’t tell you much, other than whether the list is sorted in ascending or descending order.map
The
map
at the top of the statement turns the sorted list of anonymous arrays back into a list of the sorted original elements. It will generally be the same for everymap
-sort
-map
.
Here’s a more complicated example, which sorts by the first
number that appears on each line in @fields
:
@temp = map { [ /(\d+)/, $_ ] } @fields; @sorted_temp = sort { $a->[0] <=> $b->[0] } @temp; @sorted_fields = map { $_->[1] } @sorted_temp;
The regular expression mumbo jumbo in the first line extracts
the first number from the line being processed by map
. We use the regular expression /(\d+)/
in a list context to extract the
number.
We can remove the temporary arrays in that code, giving us:
@sorted_fields = map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { [ /(\d+)/, $_ ] } @fields;
This final example compactly sorts colon-separated data, as from Unix’s passwd file. It sorts the file numerically by the fourth field (group id), then numerically by the third field (user id), and then alphabetically by the first field (username).
print map { $_->[0] } # whole line sort { $a->[1] <=> $b->[1] # gid || $a->[2] <=> $b->[2] # uid || $a->[3] cmp $b->[3] # login } map { [ $_, (split /:/)[3,2,0] ] } `cat /etc/passwd`;
See Also
The sort
function in
perlfunc(1) and Chapter 29 of
Programming Perl; the cmp
and <=>
operators in perlop(1) and Chapter 3 of
Programming Perl; Recipe 4.15
4.17. Implementing a Circular List
Procedure
unshift(@circular, pop(@circular)); # the last shall be first push(@circular, shift(@circular)); # and vice versa
Discussion
Circular lists are commonly used to repeatedly process things in order; for example, connections to a server. The code shown previously isn’t a true computer science circular list, with pointers and true circularity. Instead, the operations provide for moving the last element to the first position, and vice versa.
sub grab_and_rotate ( \@ ) { my $listref = shift; my $element = $listref->[0]; push(@$listref, shift @$listref); return $element; } @processes = ( 1, 2, 3, 4, 5 ); while (1) { $process = grab_and_rotate(@processes); print "Handling process $process\n"; sleep 1; }
See Also
The unshift
and push
functions in
perlfunc(1) and Chapter 29 of
Programming Perl; Recipe 13.13
4.18. Randomizing an Array
Problem
You want to randomly shuffle the elements of an array. The obvious application is writing a card game, where you must shuffle a deck of cards, but it is equally applicable to any situation where you want to treat elements of an array in a random order.
Solution
Use the shuffle
function from the standard List::Util module, which returns the
elements of its input list in a random order.
use List::Util qw(shuffle); @array = shuffle(@array);
Discussion
Shuffling is a surprisingly tricky process. It’s easy to write a bad shuffle:
sub naive_shuffle { # DON'T DO THIS for (my $i = 0; $i < @_; $i++) { my $j = int rand @_; # pick random element ($_[$i], $_[$j]) = ($_[$j], $_[$i]); # swap 'em } }
This algorithm is biased; the list’s possible permutations don’t all have the same probability of being generated. The proof of this is simple: take the case where we’re passed a three-element list. We generate three random numbers, each of which can have three possible values, yielding 27 possible outcomes. There are only six permutations of the three-element list, though. Because 27 isn’t evenly divisible by 6, some outcomes are more likely than others.
The List::Util module’s shuffle
function avoids this bias to produce
a more randomly shuffled result.
If all you want to do is pick one random element from the array, use:
$value = $array[ int(rand(@array)) ];
See Also
The rand
function in
perlfunc(1) and Chapter 29 of
Programming Perl; for more on random numbers,
see Recipe 2.6, Recipe 2.7, and Recipe 2.8; Recipe 4.20 provides another
way to select a random permutation
4.19. Program: words
Have you ever wondered how programs like ls generate columns of sorted output that you read down the columns instead of across the rows? For example:
awk cp ed login mount rmdir sum basename csh egrep ls mt sed sync cat date fgrep mail mv sh tar chgrp dd grep mkdir ps sort touch chmod df kill mknod pwd stty vi chown echo ln more rm su
Example 4-2 does this.
#!/usr/bin/perl -w # words - gather lines, present in columns use strict; my ($item, $cols, $rows, $maxlen); my ($xpixel, $ypixel, $mask, @data); getwinsize( ); # first gather up every line of input, # remembering the longest line length seen $maxlen = 1; while (<>) { my $mylen; s/\s+$//; $maxlen = $mylen if (($mylen = length) > $maxlen); push(@data, $_); } $maxlen += 1; # to make extra space # determine boundaries of screen $cols = int($cols / $maxlen) || 1; $rows = int(($#data+$cols) / $cols); # pre-create mask for faster computation $mask = sprintf("%%-%ds ", $maxlen-1); # subroutine to check whether at last item on line sub EOL { ($item+1) % $cols = = 0 } # now process each item, picking out proper piece for this position for ($item = 0; $item < $rows * $cols; $item++) { my $target = ($item % $cols) * $rows + int($item/$cols); my $piece = sprintf($mask, $target < @data ? $data[$target] : ""); $piece =~ s/\s+$// if EOL( ); # don't blank-pad to EOL print $piece; print "\n" if EOL( ); } # finish up if needed print "\n" if EOL( ); # not portable -- linux only sub getwinsize { my $winsize = "\0" x 8; my $TIOCGWINSZ = 0x40087468; if (ioctl(STDOUT, $TIOCGWINSZ, $winsize)) { ($rows, $cols, $xpixel, $ypixel) = unpack('S4', $winsize); } else { $cols = 80; } }
The most obvious way to print out a sorted list in columns is to print each element of the list, one at a time, padded out to a particular width. Then when you’re about to hit the end of the line, generate a newline. But that only works if you’re planning on reading each row from left to right. If you instead expect to read it down each column, this approach won’t do.
The words program is a filter that generates output going down the columns. It reads all input, keeping track of the length of the longest line seen. Once everything has been read in, it divides the screen width by the length of the longest input record seen, yielding the expected number of columns.
Then the program goes into a loop that executes once per input record, but the output order isn’t in the obvious order. Imagine you had a list of nine items:
Wrong Right ----- ----- 1 2 3 1 4 7 4 5 6 2 5 8 7 8 9 3 6 9
The words program does the necessary
calculations to print out elements (1,4,7)
on one line, (2,5,8)
on the next, and (3,6,9)
on the last.
To figure out the current window size, this program does an
ioctl
call. This works fine—on the
system it was written for. On any other system, it won’t work. If that’s
good enough for you, then good for you. Recipe 12.17 shows how to find
this on your system using the ioctl.ph file, or
with a C program. Recipe
15.4 shows a more portable solution, but that requires installing
a CPAN module.
See Also
4.20. Program: permute
Have you ever wanted to generate all possible permutations of an array or to execute some code for every possible permutation? For example:
% echo man bites dog | permute dog bites man bites dog man dog man bites man dog bites bites man dog man bites dog
The number of permutations of a set is the factorial of the size of the set. This number grows extremely fast, so you don’t want to run it on many permutations:
Set Size Permutations 1 1 2 2 3 6 4 24 5 120 6 720 7 5040 8 40320 9 362880 10 3628800 11 39916800 12 479001600 13 6227020800 14 87178291200 15 1307674368000
Doing something for each alternative takes a correspondingly large amount of time. In fact, factorial algorithms exceed the number of particles in the universe with very small inputs. The factorial of 500 is greater than ten raised to the thousandth power!
use Math::BigInt;
sub factorial {
my $n = shift;
my $s = 1;
$s *= $n-- while $n > 0;
return $s;
}
print factorial(Math::BigInt->new("500"));
+1220136... (1035 digits total)
The two solutions that follow differ in the order of the permutations they return.
The solution in Example 4-3 uses a classic list permutation algorithm used by Lisp hackers. It’s relatively straightforward but makes unnecessary copies. It’s also hardwired to do nothing but print out its permutations.
#!/usr/bin/perl -n # tsc_permute: permute each word of input permute([split], [ ]); sub permute { my @items = @{ $_[0] }; my @perms = @{ $_[1] }; unless (@items) { print "@perms\n"; } else { my (@newitems,@newperms,$i); foreach $i (0 .. $#items) { @newitems = @items; @newperms = @perms; unshift(@newperms, splice(@newitems, $i, 1)); permute( \@newitems, \@newperms); } } }
The solution in Example 4-4, provided by Mark-Jason Dominus, is faster (by around 25%) and more elegant. Rather than precalculate all permutations, his code generates the n th particular permutation. It is elegant in two ways. First, it avoids recursion except to calculate the factorial, which the permutation algorithm proper does not use. Second, it generates a permutation of integers rather than permute the actual data set.
He also uses a time-saving technique called
memoizing. The idea is that a function that always
returns a particular answer when called with a particular argument
memorizes that answer. That way, the next time it’s called with the same
argument, no further calculations are required. The factorial
function uses a private array
@fact
to remember previously
calculated factorial values as described in Recipe 10.3. This technique is
so useful that there’s a standard module that will handle the value
caching for you. If you just had a regular factorial function that
didn’t have its own caching, you could add caching to the existing
function this way:
use Memoize; memoize("factorial");
You call n2perm
with
two arguments: the permutation number to generate (from 0
to factorial(N)
, where N is the size of your
array) and the subscript of the array’s last element. The n2perm
function calculates directions for the
permutation in the n2pat
subroutine.
Then it converts those directions into a permutation of integers in the
pat2perm
subroutine. The directions
are a list like (0
2
0
1
0)
, which means: “Splice out the
0th element, then the second element from the
remaining list, then the 0th element, then
the first, then the 0th.”
#!/usr/bin/perl -w
# mjd_permute: permute each word of input
use strict;
sub factorial($); # forward reference to declare prototype
while (<>) {
my @data = split;
my $num_permutations = factorial(scalar @data);
for (my $i=0; $i < $num_permutations; $i++) {
my @permutation = @data[n2perm($i, $#data)];
print "@permutation\n";
}
}
# Utility function: factorial with memoizing
BEGIN {
my @fact = (1);
sub factorial($) {
my $n = shift;
return $fact[$n] if defined $fact[$n];
$fact[$n] = $n * factorial($n - 1);
}
}
# n2pat($N, $len) : produce the $N-th pattern of length $len
sub n2pat {
my $i = 1;
my $N = shift;
my $len = shift;
my @pat;
while ($i <= $len + 1) { # Should really be just while ($N) { ...
push @pat, $N % $i;
$N = int($N/$i);
$i++;
}
return @pat;
}
# pat2perm(@pat) : turn pattern returned by n2pat( )
into
# permutation of integers. XXX: splice is already O(N)
sub pat2perm {
my @pat = @_;
my @source = (0 .. $#pat);
my @perm;
push @perm, splice(@source, (pop @pat), 1) while @pat;
return @perm;
}
# n2perm($N, $len) : generate the Nth permutation of $len objects
sub n2perm {
pat2perm(n2pat(@_));
}
See Also
unshift
and splice
in perlfunc(1)
or Chapter 29 of Programming Perl; the sections
discussing closures in perlsub(1) and
perlref(1) and Chapter 8 of Programming
Perl; Recipe
2.6; Recipe 10.3
Get Perl Cookbook, 2nd Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.