Using the UCA with Perl’s sort
In real code, the sort built-in is usually called in one of two ways:
either with no sort routine at all, or with a block argument that
serves as the custom comparison function. The sort method of
Unicode::Collate is a fine substitute for the first flavor, but not
the second. For that, you’d use a different method from your collator
object, called getSortKey.
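For the first flavor, the swap is nearly mechanical. Here is a minimal sketch, assuming a made-up word list, that puts the built-in sort next to the collator's sort method:

```perl
use strict;
use warnings;
use utf8;                      # the source contains a literal "É"
use Unicode::Collate;

binmode(STDOUT, ":encoding(UTF-8)");

# Hypothetical sample data, chosen to show case and accent effects.
my @words = ("zebra", "Zulu", "apple", "Éclair", "eclipse");

# Built-in sort with no routine: plain codepoint comparison.
my @by_codepoint = sort @words;

# The collator's sort method: comparison under the default UCA table.
my $collator = Unicode::Collate->new();
my @by_uca   = $collator->sort(@words);

print "codepoint: @by_codepoint\n";   # Zulu apple eclipse zebra Éclair
print "UCA:       @by_uca\n";         # apple Éclair eclipse zebra Zulu
```

Under codepoint order the uppercase Z sorts ahead of every lowercase letter and É lands at the very end; under the UCA both fall where a reader would expect to find them.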
Suppose you have a program that uses the built-in sort, like this:

    @srecs = sort {
        $b->{AGE} <=> $a->{AGE}
                  ||
        $a->{NAME} cmp $b->{NAME}
    } @recs;

But then you decide you want the text to sort alphabetically
on your NAME fields, not just by numeric codepoints. To do this, ask
the collator object to give you back the binary sort key for each text
string you will eventually want to sort. Unlike the original text,
this binary sort key can be handed to the cmp operator, and it will
sort in the order you want.
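To see the difference on a single pair of strings, here is a small sketch; the two sample strings are made up for illustration:

```perl
use strict;
use warnings;
use Unicode::Collate;

my $collator = Unicode::Collate->new();

# Hypothetical sample strings.
my ($x, $y) = ("Zulu", "apple");

# Raw text: "Z" (U+005A) has a lower codepoint than "a" (U+0061).
my $raw = $x cmp $y;                                   # -1

# Sort keys: the same operator now answers in collation order.
my $keyed = $collator->getSortKey($x)
        cmp $collator->getSortKey($y);                 #  1

print "raw: $raw, keyed: $keyed\n";
```

The keys are opaque binary strings; they are meant for comparing, not for printing.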
The code, including the block you pass to sort, now looks like this:

    use Unicode::Collate;

    my $collator = Unicode::Collate->new();

    # Precompute a binary sort key for each record's NAME.
    for my $rec (@recs) {
        $rec->{NAME_key} = $collator->getSortKey( $rec->{NAME} );
    }

    # Highest AGE first; ties broken by collated NAME.
    @srecs = sort {
        $b->{AGE} <=> $a->{AGE}
                  ||
        $a->{NAME_key} cmp $b->{NAME_key}
    } @recs;

You can pass the constructor optional arguments to customize its behavior, including preprocessing.
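As one illustration of preprocessing, the following sketch uses the constructor's preprocess option to ignore a leading article when collating. The title list is invented for the example, and the article-stripping regex is only one possible rule:

```perl
use strict;
use warnings;
use Unicode::Collate;

my $collator = Unicode::Collate->new(
    # preprocess runs on each string before its sort key is built.
    preprocess => sub {
        my $str = shift;
        $str =~ s/^\s*(?:an?|the)\s+//i;   # drop a leading "A", "An", or "The"
        return $str;
    },
);

# Hypothetical sample data.
my @titles = ("The Hobbit", "A Tale of Two Cities", "Dune", "An Ideal Husband");

# The original strings come back; only the comparison ignores the articles.
my @sorted = $collator->sort(@titles);

print "$_\n" for @sorted;
# Dune
# The Hobbit
# An Ideal Husband
# A Tale of Two Cities
```

Because getSortKey applies the same preprocessing, keys stored in a field such as NAME_key would follow the same rule.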
Another thing you can do with collator objects is use them to do simple accent- and case-insensitive matching. It makes sense; if you have the ability to tell when things are ordered, you also have the ability to tell when ...
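A minimal sketch of such matching, assuming a collator built with level => 1 so that only primary (base letter) differences count; the sample strings are made up:

```perl
use strict;
use warnings;
use utf8;                      # the source contains literal accented letters
use Unicode::Collate;

# Level 1 compares base letters only, ignoring accents and case.
my $loose  = Unicode::Collate->new(level => 1);

# The default collator keeps accent and case distinctions.
my $strict = Unicode::Collate->new();

my $loose_match  = $loose->eq("résumé", "RESUME")  ? 1 : 0;   # 1
my $strict_match = $strict->eq("résumé", "RESUME") ? 1 : 0;   # 0
my $other_word   = $loose->eq("resume", "rescue")  ? 1 : 0;   # 0

print "loose: $loose_match, strict: $strict_match, other: $other_word\n";
```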