Sorting Large DBM Files
Problem
You want to process a large dataset you’d like to commit to a DBM file in a particular order.
Solution
Use the DB_File’s B-tree bindings and supply a comparison function of your own devising:
use DB_File;
# specify the Perl sub to do key comparison using the
# exported $DB_BTREE hash reference
$DB_BTREE->{'compare'} = sub {
my ($key1, $key2) = @_ ;
"\L$key1" cmp "\L$key2" ;
};
tie(%hash, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE)
or die "can't tie $filename: $!";Description
An annoyance of hashes, whether in memory or as DBM files, is that they do not maintain proper ordering. The CPAN module Tie::IxHash can make a regular hash in memory maintain its insertion order, but that doesn’t help you for DBM databases or arbitrary sorting criteria.
The DB_File module supports a nice solution to this using a
B-tree implementation. One advantage
of a B-tree over a regular DBM hash is its ordering. When the user
defines a comparison function, all calls to keys,
values, and each are
automatically ordered. For example, Example 14.4 is a
program that maintains a hash whose keys will always be sorted
case-insensitively.
Example 14-4. sortdemo
#!/usr/bin/perl
# sortdemo - show auto dbm sorting use strict; use DB_File; $DB_BTREE->{'compare'} = sub { my ($key1, $key2) = @_ ; "\L$key1" cmp "\L$key2" ; }; my %hash; my $filename = '/tmp/sorthash.db'; tie(%hash, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE) or die "can't tie $filename: $!"; my $i = 0; ...Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access