Cover | Table of Contents | Colophon
We developed and support a large scale telephone call logging and analysis system for a major client of ours. The system collects ~1 GB of call data per day from over 1,200,000 monitored phone numbers. ~424 GB has been processed so far (over 6,200,000,000 calls). Data is processed and loaded into Oracle using DBI and DBD::Oracle. The database holds rolling data for around 20 million calls. The system generates over 44,000 PostScript very high quality reports per month (~five pages with eleven color graphs and five tables) generated by using Perl to manipulate FrameMaker templates. [Values correct as of July 1999, and rising steadily.]The whole system runs on three dual processor Sun SPARC Ultra 2 machines—one for data acquisition and processing, one for Oracle and the third does most of the report production (which is also distributed across the other two machines). Almost the entire system is implemented in Perl.There is only one non-Perl program and that's only because it existed already and isn't specific to this system. The other non-Perl code is a few small libraries linked into Perl using the XS interface.A quote from a project summary by a senior manager: "Less than a year later the service went live. This was subsequently celebrated as one of the fastest projects of its size and complexity to go from conception to launch."Designed, developed, implemented, installed, and supported by the Paul Ingram Group, who received a "Rising to the Challenge" award for their part in the project. Without Perl, the system could not have been developed fast enough to meet the demanding go-live date. And without Perl, the system could not be so easily maintained or so quickly extended to meet changing requirements.Tim Bunce, Paul Ingram Group
In 1997 I built a system for NASA's Langley Research Center in Virginia that puts a searchable web front end on a database of about 100,000 NASA-owned equipment items. I used Apache, DBI, Informix, WDB, and mod_perl on a Sparc 20. Ran like a charm. They liked it so much they used it to give demos at meetings on reorganizing the wind tunnels! Thing was, every time they showed it to people, I ended up extending the system to add something new, like tracking equipment that was in for repairs, or displaying GIFs of technical equipment so when they lost the spec sheet, they could look it up online. When it works, success feeds on itself.
SELECT keyword used to fetch data from a database.: ), the tab, and the pipe symbol (
| ). The Unix /etc/passwd
file is a good example of a delimited file with each record being
separated by a colon. Figure 2.1 shows a single
record from an /etc/passwd file.
Data::Dumper and Storable, and
see how we can use them to put some fizz into our flat files. These
techniques are also applicable to storing complex Perl data
structures in relational databases using the DBI, so pay attention.Data::Dumper module takes a list of Perl variables
and writes their values out in the form of Perl
code, which will recreate the original values, no matter
how complex, when executed.eval()
or do().#!/usr/bin/perl -w
#
# ch02/marshal/datadumpertest: Creates some Perl variables and dumps them out.
# Then, we reset the values of the variables and
# eval the dumped ones ...
use Data::Dumper;
### Customise Data::Dumper's output style
### Refer to Data::Dumper documentation for full details
if ($ARGV[0] eq 'flat') {
$Data::Dumper::Indent = 0;
$Data::Dumper::Useqq = 1;
}
$Data::Dumper::Purity = 1;
### Create some Perl variables
my $megalith = 'Stonehenge';
my $districts = [ 'Wiltshire', 'Orkney', 'Dorset' ];
### Print them out
print "Initial Values: \$megalith = " . $megalith . "\n" .
" \$districts = [ ". join(", ", @$districts) . " ]\n\n";
### Create a new Data::Dumper object from the database
my $dumper = Data::Dumper->new( [ $megalith, $districts ],
[ qw( megalith districts ) ] );
### Dump the Perl values out into a variable
my $dumpedValues = $dumper->Dump();
### Show what Data::Dumper has made of the variables!
print "Perl code produced by Data::Dumper:\n";
print $dumpedValues . "\n";
### Reset the variables to rubbish values
$megalith = 'Blah! Blah!';
$districts = [ 'Alderaan', 'Mordor', 'The Moon' ];
### Print out the rubbish values
print "Rubbish Values: \$megalith = " . $megalith . "\n" .
" \$districts = [ ". join(", ", @$districts) . " ]\n\n";
### Eval the file to load up the Perl variables
eval $dumpedValues;
die if $@;
### Display the re-loaded values
print "Re-loaded Values: \$megalith = " . $megalith . "\n" .
" \$districts = [ ". join(", ", @$districts) . " ]\n\n";
exit;flock()
function. flock() implements a cooperative system
of locking that must be used by all programs attempting to access a
given file if it is to be effective. This includes read-only scripts,
such as the query script listed previously, which can use
flock() to test whether or not it is safe to
attempt a read on the database.Fcntl
package and can be imported into your
scripts for use with NDBM_File
,
DB_File
,
GDBM_File
,
SDBM_File
, and
ODBM_File
extensions. There's also an
AnyDBM_File
module that will simply use the best
available DBM. The documentation for the
AnyDBM_File module includes a useful table
comparing the different DBMs.DB_File and Berkeley DB extensions. On Windows
systems, it can be installed via the Perl package manager,
ppm. On Unix systems, it is built by default
when Perl is built only if the Berkeley DB
library has already been installed on your system. That's
generally the case on Linux, but on most other systems you may need
to fetch and build the Berkeley DB library first.
DB_File module also provide support for several
different storage and retrieval algorithms that can be used in subtly
different situations. In newer versions of the software, concurrent
access to databases and locking are also supported.MLDBM module is very
useful for quickly writing complex Perl data structures to DBM files
for persistent storage. The ML in
MLDBM stands for multilevel
and refers to its ability to store complex multilevel data
structures. That's something that ordinary hashes, even hashes
tied to DBM files, can't do.MLDBM module is an
excellent example of a layered storage manager. It acts as a thin
layer over another DBM module, but intercepts reads and writes to
automatically
serialize
(or deserialize) the data using another module.
MLDBM instead of your existing
DBM module.DB_File
for storage and
Data::Dumper
for displaying the restored data:#!/usr/bin/perl -w
#
# ch02/mldbmtest: Demonstrates storing complex data structures in a DBM
# file using the MLDBM module.
use MLDBM qw( DB_File Data::Dumper );
use Fcntl;
### Remove the test file in case it exists already ...
unlink 'mldbmtest.dat';
tie my %database1, 'MLDBM', 'mldbmtest.dat', O_CREAT | O_RDWR, 0666
or die "Can't initialize MLDBM file: $!\n";
### Create some megalith records in the database
%database1 = (
'Avebury' => {
name => 'Avebury',
mapref => 'SU 103 700',
location => 'Wiltshire'
},
'Ring of Brodgar' => {
name => 'Ring of Brodgar',
mapref => 'HY 294 133',
location => 'Orkney'
}
);
### Untie and retie to show data is stored in the file
untie %database1;
tie my %database2, 'MLDBM', 'mldbmtest.dat', O_RDWR, 0666
or die "Can't initialize MLDBM file: $!\n";
### Dump out via Data::Dumper what's been stored ...
print Data::Dumper->Dump( [ \%database2 ] );
untie %database2;
exit;megaliths table. Each of these items of
data would be a separate column.megaliths table. Each of these items of
data would be a separate column.Site Location Type Map Reference ---- -------- ---- ------------- Callanish I Western Isles Stone Circle and Rows NB 213 330 Stonehenge Wiltshire Stone Circle and Henge SU 123 422 Avebury Wiltshire Stone Circle and Henge SU 103 700 Sunhoney Aberdeenshire Recumbent Stone Circle NJ 716 058 Lundin Links Fife Four Poster NO 404 027
mapref field
within the megaliths table would not be much use
if it could hold only dates!FLOAT
,
REAL
,
INTEGER
, and
NUMBER
. Numeric datatypes are compared in the
obvious way; that is, the actual values are tested.10" will be less than
"9".CHAR
,
VARCHAR
,
VARCHAR2
, and so on. Most databases support at
least the most basic of these, CHAR.SELECT
keyword
to specify the columns to fetch, the tables to fetch them from, and
any conditions that must be met for the rows to be retrieved.
SELECT falls into the group of commands known as
Data Manipulation
Language
, or
DML, commands.SELECT can be intimidating to
the new user, primarily because it sports a multitude of different
ways in which the query can be customized. For example, you might
wish to return only unique data rows, group certain rows together, or
even specify how the returned rows should be sorted.SELECT syntax for this
form of query can be expressed as simply as:SELECT column, column, ..., column FROM table
SELECT * FROM table
megaliths table, the following SQL statement can
be used:SELECT name, location, mapref FROM megaliths
+---------------------------------------------------------------+ | name | location | mapref | +---------------------------------------------------------------+ | Callanish I | Callanish, Isle of Lewis | NB 213 330 | | Lundin Links | Lundin Links, Fife, Scotland | NO 404 027 | | Stonehenge | Near Amesbury, Wiltshire, England | SU 123 400 | | Avebury | Avebury, Wiltshire, England | SU 103 700 | | Sunhoney | Near Insch, Aberdeenshire | NJ 716 058 | +---------------------------------------------------------------+
SELECT data from them) are
very useful. Data
warehouses
are typically massive
read-only databases populated with
archived data mangled into a form suitable for reporting. However, in
the cut-and-thrust world of transaction-processing databases, the
ability to modify data within the database
quickly and efficiently is of paramount importance.SELECT.INSERT
keyword
provides a simple mechanism for inserting new rows of data into the
database. For example, assuming the megaliths
table is already present in the database and and contains the six
columns shown earlier in Figure 3.1, a single row
of data can be inserted into it using the following SQL statement:INSERT INTO megaliths VALUES ( 0, 'Callanish I',
'"Stonehenge of the North"',
'Western Isles',
'NB 213 330', 1 )CREATE
TABLE command, the
syntax of which varies depending on the database platform being used.
However, this statement generally specifies the name of the table to
be created and the definition of all the columns of the table (both
names and datatypes).megaliths table within our database was:CREATE TABLE megaliths (
id INTEGER NOT NULL,
name VARCHAR(64),
location VARCHAR(64),
description VARCHAR(256),
site_type_id INTEGER,
mapref VARCHAR(16)
)CREATE TABLE will create a brand-new table with the
given definition, which will be completely empty until you insert
rows into it.megaliths table, we can
issue the SQL statement of:DROP TABLE megaliths
data_sources()
,
to enumerate what can be connected to, and
connect()
,
to actually make a connection. These methods are more commonly
invoked as DBI class methods, however, which we will discuss in more
detail later in this chapter.$drh
.dbi:, much like a URL begins with
http:, and then the name of the driver, followed
by another colon—for example, dbi:Oracle:.
Any text that follows is passed to the driver's own
connect()
method to interpret as it sees fit. Most drivers expect either a
simple database name or, more often, a set of one or more
name/value pairs separated with
semicolons. Some common examples are listed later in this section.DBI->available_drivers()
method.
This returns a list with each element containing the data source
prefix of an installed driver, such as
dbi:Informix:.DBI->data_sources()
method
against one or more of the drivers returned by the
DBI->available_drivers() method to enumerate
which data sources are known to the driver. Calling the data_sources() method will
actually load the specified driver and validate that it is completely
and correctly installed. Because DBI dies if it can't load and
initialize a driver, this method should be called inside an
warn()
or