You want to sort a list by something more complex than a simple string or numeric comparison.
This is common when working with objects (“sort by the employee’s salary”) or complex data structures (“sort by the third element in the array that this is a reference to”). It’s also applicable when you want to sort by more than one key, for instance, sorting by birthday and then by name when multiple people have the same birthday.
Use the customizable comparison routine in sort
:
@ordered = sort { compare() } @unordered;
You can speed this up by precomputing the field.
@precomputed = map { [compute(),$_] } @unordered; @ordered_precomputed = sort { $a->[0] <=> $b->[0] } @precomputed; @ordered = map { $_->[1] } @ordered_precomputed;
And, finally, you can combine the three steps:
@ordered = map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { [compute(), $_] } @unordered;
The use of a comparison routine was explained in Section 4.14. As well as using built-in operators like <=>, you can construct more complex tests:
@ordered = sort { $a->name cmp $b->name } @employees;
You often see sort
used like this in part of a
foreach
loop:
foreach $employee (sort { $a->name cmp $b->name } @employees) { print $employee->name, " earns \$", $employee->salary, "\n"; }
If you’re going to do a lot of work with elements in a particular order, it’s more efficient to sort it once and work from that:
@sorted_employees = sort { $a->name cmp $b->name } @employees; foreach $employee (@sorted_employees) { print $employee->name, " earns \$", $employee->salary, "\n"; } # load %bonus foreach $employee (@sorted_employees) { if ( $bonus{ $employee->ssn } ) { print $employee->name, " got a bonus!\n"; } }
We can put multiple comparisons in the routine and separate them with
||
. ||
is a short-circuit
operator: it returns the first true (non-zero) value it finds. This
means we can sort by one kind of comparison, but if the elements are
equal (the comparison returns 0) we can sort by another. This has the
effect of a sort within a sort:
@sorted = sort { $a->name cmp $b->name || $b->age <=> $a->age } @employees;
This first considers the names of the two employees to be compared.
If they’re not equal, ||
stops and returns
the result of the cmp
(effectively sorting them in
ascending order by name). If the names are equal, though,
||
keeps testing and returns the result of the
<=> (sorting them in descending order by age). The result is a
list that is sorted by name and by age within groups of the same
name.
Let’s look at a real-life example of sorting. Here we fetch all users, as User::pwent objects. Then we sort them by name and print the sorted list:
use User::pwent qw(getpwent); @users = (); # fetch all users while (defined($user = getpwent)) { push(@users, $user); } @users = sort { $a->name cmp $b->name } @users; foreach $user (@users) { print $user->name, "\n"; }
We can have more than simple comparisons, or combinations of simple
comparisons. This code sorts a list of names by comparing the
second letters of the names. It gets the second
letters by using substr
:
@sorted = sort { substr($a,1,1) cmp substr($b,1,1) } @names;
and here we sort by length of the strings:
@sorted = sort { length $a <=> length $b } @strings;
The sort
function calls the code block each time
it needs to compare two elements, and the number of comparisons grows
dramatically with the number of elements we’re sorting. Sorting
10 elements requires (on average) 46 comparisons, but sorting 1,000
elements requires 14,000 comparisons. A time-consuming operation like
a split
or a subroutine call for each comparison
can easily make your program crawl.
Fortunately, we can remove this bottleneck by running the operation
once per element prior to the sort. Use
map
to store the results of the operation in an
array whose elements are anonymous arrays containing both the
computed field and the original field. Then we
sort
this array of arrays on the precomputed
field, and use map
to get the sorted original
data. This
map
-sort
-map
concept is useful and common, so let’s look at it in more
depth.
Let’s apply
map
-sort
-map
to the sorting by string length example:
@temp = map { [ length $_, $_ ] } @strings; @temp = sort { $a->[0] <=> $b->[0] } @temp; @sorted = map { $_->[1] } @temp;
The first line creates a temporary array of strings and their
lengths, using map
. The second line sorts the
temporary array by comparing the precomputed lengths. The third line
turns the sorted temporary array of strings and lengths back into a
sorted array of strings. This way we calculated the length of each
string only once.
Because the input to each line is the output of the previous line
(the @temp
array we make in line 1 is fed to
sort
in line 2, and that output is fed to
map
in line 3), we can combine it into one
statement and eliminate the temporary array:
@sorted = map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { [ length $_, $_ ] } @strings;
The operations now appear in reverse order. When you meet a
map
-sort
-map
,
you should read it from the bottom up to determine the function:
-
@strings
The last part is the data to be sorted. Here it’s just an array, but later we’ll see that this can be a subroutine or even backticks. Anything that returns a list to be sorted is fair game.
-
map
The
map
closest to the bottom builds the temporary list of anonymous arrays. This list contains the precomputed fields (length
$_
) and also records the original element ($_
) by storing them both in an anonymous array. Look at thismap
line to find out how the fields are computed.-
sort
The
sort
line sorts the list of anonymous arrays by comparing the precomputed fields. It won’t tell you much, other than whether the list is sorted in ascending or descending order.-
map
The
map
at the top of the statement turns the sorted list of anonymous arrays back into a list of the sorted original elements. It will generally be the same for everymap
-sort
-map
.
Here’s a more complicated example, which sorts by the first
number that appears on each line in @fields
:
@temp = map { [ /(\d+)/, $_ ] } @fields; @sorted_temp = sort { $a->[0] <=> $b->[0] } @temp; @sorted_fields = map { $_->[1] } @sorted_temp;
The regular expression mumbo-jumbo in the first line extracts the
first number from the line being processed by map
.
We use the regular expression /(\d+)/
in a list
context to extract the number.
We can remove the temporary arrays in that code, giving us:
@sorted_fields = map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { [ /(\d+)/, $_ ] } @fields;
This final example compactly sorts colon-separated data, as from Unix’s passwd file. It sorts the file numerically by fourth field (group id), then numerically by the third field (user id), and then alphabetically by the first field (user name).
print map { $_->[0] } # whole line sort { $a->[1] <=> $b->[1] # gid || $a->[2] <=> $b->[2] # uid || $a->[3] cmp $b->[3] # login } map { [ $_, (split /:/)[3,2,0] ] } `cat /etc/passwd`;
This compact,
map
-sort
-map
technique is more reminiscent of the functional world of Lisp and
Scheme programming than Perl’s normal C and
awk heritage. Because it was first pointed out
by Randal
Schwartz, this black art is often referred to as the
Schwartzian Transform.
The sort
function in perlfunc
(1) and Chapter 3 of Programming Perl; the cmp
and
<=>
operators in perlop
(1) and Chapter 2 of Programming Perl
; Section 4.14
Get Perl Cookbook now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.