We begin our journey through Perl by taking a little stroll. This stroll presents a number of different features by hacking on a small application. The explanations here are extremely brief—each subject area is discussed in much greater detail later in this book. But this little stroll should give you a quick taste for the language, and you can decide if you really want to finish this book instead of reading some more Usenet news or running off to the ski slopes.
Let’s look at a little program that actually does something. Here is your basic “Hello, world” program (use any text editor to type it in):
print ("Hello, world!\n");
This single line is the entire program. The built-in function
print
starts it off, and in this case has just one
argument, a C-like text string. Within this string, the character
combination \n
stands for a newline character,
just as it does in C. The print
statement is
terminated by a semicolon (;
). As in Pascal or C,
all simple statements in Perl are terminated by a
semicolon.[7]
When you invoke this program, the Perl interpreter parses the entire
program and then executes the compiled form. The first and only
operation is the execution of the print
function,
which sends any arguments to the standard output. After the program
has completed, the Perl process exits, returning a successful exit
code to the parent process.
Soon, you’ll see Perl programs in which
print
and other functions are sometimes called
with parentheses, and sometimes called without them. The rule is
simple: in Perl, parentheses for built-in functions are never
required nor forbidden. Their use can help or hinder clarity, so use
your own judgment.
Let’s add a bit more sophistication. The Hello, world
greeting is a touch cold and inflexible. Let’s
have the program call you by your name. To do this, we need a place
to hold the name, a way to ask for the name, and a way to get a
response.
One kind of place to hold values (like a name) is a
scalar
variable. For this program, we’ll use the scalar
variable $name
to hold your name. In Chapter 2, we’ll go into more detail about what
these variables can hold, and what you can do with them. For now,
assume that you can hold a single number or string (sequence of
characters) in a scalar variable.
The program needs to ask for the name. To do that, we need a way to
prompt
and a way to accept input. The previous program showed us how to
prompt: use the print
function. And the way to get
a line from the terminal is with the
<STDIN>
construct, which
(as we’re using it here) grabs one line of input. We assign
this input to the $name
variable. This gives us
the following program:
print "What is your name? "; $name = <STDIN>;
The value of $name
at this point has a terminating
newline (Erik
comes in as
Erik\n
). To get rid of the newline, we use the
chomp()
function, which takes a scalar variable as its sole argument and
removes the trailing newline, if present, from the string:
chomp $name;
Now, all we need to do is say Hello,
followed by
the value of the $name
variable, which we can do
by embedding the variable inside the
quoted string:
print "Hello, $name!\n";
Putting it all together, we get:
print "What is your name? "; $name = <STDIN>; chomp $name; print "Hello, $name!\n";
Now, let’s say we have a special greeting for Erik, but want an
ordinary greeting for anyone else. To do this, we need to compare the
name that was entered with the string Erik
, and if
they are identical, do something special. Let’s add a C-like
if-then-else
branch and a comparison to the program:
print "What is your name? "; $name = <STDIN>; chomp $name; if ($name eq "Erik") { print "Hello, Erik! How good of you to be here!\n"; } else { print "Hello, $name!\n"; # ordinary greeting }
The
eq
operator compares two
strings. If they are equal (character for character, and of the same
length), the result is true. (No comparable operator[8] exists in C or C++.)
The if
statement selects which
block
of statements (between matching curly braces) is executed—if
the expression is true, it’s the first block, otherwise
it’s the second block.
Well, now that we have the name, let’s have the person running the program guess a secret word. For everyone except Erik, we’ll have the program repeatedly ask for guesses until the person guesses properly. First the program, and then an explanation:
$secretword = "gecko"; # the secret word print "What is your name? "; $name = <STDIN>; chomp $name; if ($name eq "Erik") { print "Hello, Erik! How good of you to be here!\n"; } else { print "Hello, $name!\n"; # ordinary greeting print "What is the secret word? "; $guess = <STDIN>; chomp $guess; while ($guess ne $secretword) { print "Wrong, try again. What is the secret word? "; $guess = <STDIN>; chomp $guess; } }
First, we define the secret word by putting it into another scalar
variable, $secretword
. After the greeting, the
(non-Erik) person is asked (with another print
)
for the guess. The guess is compared with the secret word using the
ne
operator, which
returns true if the strings are not equal (ne
is
the logical opposite of the eq
operator). The
result of the comparison controls a
while
loop, which executes the block as long as the comparison is true.
Of course, this program is not very secure, because anyone who is tired of guessing can merely interrupt the program and get back to the prompt, or even look at the source to determine the word. But, we weren’t trying to write a security system, just an example for this book.
Let’s see how we can modify this program to allow more than one valid secret word. Using what we’ve already seen, we could compare the guess repeatedly against a series of good answers stored in separate scalar variables. However, such a list would be hard to modify or read in from a file or compute based on the day of the week.
A
better solution is to store all of the possible answers in a data
structure called a list, or (preferrably) an
array. Each element of the
array is a separate scalar variable that can be independently set or
accessed. The entire array can also be given a value in one fell
swoop. We can assign a value to the entire array named
@words
so that it contains three possible good
passwords:
@words = ("camel","gecko","alpaca");
Array variable names begin with
@
, so they are
distinct from scalar variable names. Another way to write this so
that we don’t have to put all those quotemarks there is with
the
qw()
syntax, like
so:
@words = qw(camel gecko alpaca);
These mean exactly the same thing; the qw
makes it
as if we had quoted each of three strings.
After the array is assigned, we can access
each element by using a subscript reference (subscripts start
at zero). So, $words[0]
is
camel
, $words[1]
is
gecko
, and $words[2]
is
alpaca
. The subscript can be an expression as
well, so if we set $i
to 2, then
$words[$i]
is alpaca
.
(Subscript references start with $
rather than
@
, because they refer to a single element of the
array rather than the whole array.) Going back to our previous
example:
@words = qw(camel gecko alpaca); print "What is your name? "; $name = <STDIN>; chomp $name; if ($name eq "Erik") { print "Hello, Erik! How good of you to be here!\n"; } else { print "Hello, $name!\n"; # ordinary greeting print "What is the secret word? "; $guess = <STDIN>; chomp ($guess); $i = 0; # try this word first $correct = "maybe"; # is the guess correct or not? while ($correct eq "maybe") { # keep checking til we know if ($words[$i] eq $guess) { # right? $correct = "yes"; # yes! } elsif ($i < 2) { # more words to look at? $i = $i + 1; # look at the next word next time } else { # no more words, must be bad print "Wrong, try again. What is the secret word?"; $guess = <STDIN>; chomp ($guess); $i = 0; # start checking at the first word again } } # end of while not correct } # end of "not Erik"
You’ll notice we’re using the scalar variable
$correct
to indicate that we are either still
looking for a good password, or that we’ve found one.
This program also shows the
elsif
block of the if-then-else
statement. This exact
construct is not present in all programming
languages—it’s an abbreviation of the
else
block together with a new
if
condition, but it does not nest inside yet
another pair of curly braces. It’s a very Perl-like thing to
compare a set of conditions in a cascaded
if-elsif-elsif-elsif-else
chain. Perl
doesn’t really have the equivalent of C’s
switch
or Pascal’s case
statement, although you can build one yourself without too much
trouble. See Chapter 2 of Programming Perl or
the perlsyn documentation for details.
In the previous program, any person who comes along could guess any of the three words and be successful. If we want the secret word to be different for each person, we’ll need a table that matches people with words. Table 1.1 does just this.
Table 1-1. Matching Persons to Secret Words
Person |
Secret Word |
---|---|
Fred |
camel |
Barney |
gecko |
Betty |
alpaca |
Wilma |
alpaca |
Notice that both Betty and Wilma have the same secret word. This is fine.
The easiest way to store such a table in Perl is with a
hash.
Each element of the hash holds a separate scalar value (just like the
other type of array), but each hash is referenced by a
key, which can be any
scalar value (any string or number, including noninteger and negative
values). To create a hash called %words
(notice
the use of
%
, rather than
@
) with the keys and values given in Table 1.1, we
assign a value to %words
(much as we did earlier with the array):
%words = qw( fred camel barney gecko betty alpaca wilma alpaca );
Each pair of values in the list represents one key and its corresponding value in the hash. Note that we broke this assignment over many lines without any sort of line continuation character. We could do so because whitespace is generally insignificant in a Perl program.
To find the secret word for Betty, we need to use Betty as the key in
a reference to the hash %words
, via some
expression such as $words{"betty"}
. The value of
this reference is alpaca
, similar to what we had
before with the other array. Also, as before, the key can be any
expression, so setting $person
to
betty
and evaluating
$words{$person
} gives alpaca
as
well.
Putting all this together, we get a program like this:
%words = qw( fred camel barney gecko betty alpaca wilma alpaca ); print "What is your name? "; $name = <STDIN>; chomp ($name); if ($name eq "Erik") { print "Hello, Erik! How good of you to be here!\n"; } else { print "Hello, $name!\n"; # ordinary greeting $secretword = $words{$name}; # get the secret word print "What is the secret word? "; $guess = <STDIN>; chomp ($guess); while ($guess ne $secretword) { print "Wrong, try again. What is the secret word? "; $guess = <STDIN>; chomp ($guess); } }
Note the lookup of the secret word. If the name is not found, the
value of $secretword
will be an empty
string,[9] which we can then
check for if we want to define a default secret word for everyone
else. Here’s how that process looks:
[... rest of program deleted ...] $secretword = $words{$name}; # get the secret word if ($secretword eq "") { # oops, not found $secretword = "groucho"; # sure, why a duck? } print "What is the secret word? "; [... rest of program deleted ...]
If we enter Erik Olson
or erik
rather than Erik
, we’re lumped in with the
rest of the users, because the eq
comparison
requires an exact equality. Let’s look at one way to handle
that.
Suppose we wanted to look for any string that began with
Erik
, rather than just a string that was equal to
Erik
. We could do this with a
regular
expression: a template that defines a collection of strings that
match. The regular expression in Perl that matches any string that
begins with Erik
is ^Erik
. To
match this against the string in $name
, we use the
match operator as follows:
if ($name =~ /^Erik/) { ## yes, it matches } else { ## no, it doesn't }
Note that the regular expression is delimited by slashes. Within the slashes, spaces and other whitespace are significant, just as they are within strings.
This addition almost meets our needs, but it doesn’t handle
selecting erik
or rejecting
eriko
. To accept erik
, we add
the
ignore-case option, a
small i
appended after the closing slash. To
reject eriko
, we add a
word boundary special
marker in the form of \b
in the regular
expression. This ensures that the character following the first
k
in the regular expression is not another letter.
The addition also changes the regular expression to be
/^erik\b/i
, which means
"erik
at the beginning of the string, no
letter or digit following, and OK to be in either case.”
When this is added to the rest of the program, the final version looks like this:
%words = qw( fred camel barney gecko betty alpaca wilma alpaca ); print "What is your name? "; $name = <STDIN>; chomp ($name); if ($name =~ /^erik\b/i) { print "Hello, Erik! How good of you to be here!\n"; } else { print "Hello, $name!\n"; # ordinary greeting $secretword = $words{$name}; # get the secret word if ($secretword eq "") { # oops, not found $secretword = "groucho"; # sure, why a duck? } print "What is the secret word? "; $guess = <STDIN>; chomp ($guess); while ($guess ne $secretword) { print "Wrong, try again. What is the secret word? "; $guess = <STDIN>; chomp ($guess); } }
As you can see, the program is a far cry from the simple
Hello world
, but it’s still very small and
workable, and does quite a bit for being so short. This is The Perl
Way.
Perl provides nearly every regular expression feature imaginable. In addition, the way Perl handles string matching is about the fastest on the planet, so you don’t lose performance. String matching in Perl often compares favorably to hand-coded C programs written specifically for the same purpose.
So, now we can enter Erik
or
erik
or Erik
Olson
, but what about everyone else? Barney still
has to say exactly barney
(not even
barney
followed by a space).
To be fair to Barney, we need to grab the first word of whatever’s entered, and then convert it to lowercase before we look up the name in the table. We do this with two operators: the substitute operator, which finds a regular expression and replaces it with a string, and the translate operator, which puts the string in lowercase.
First, we discuss the substitute operator. We want to take the
contents of $name
, find the first nonword
character, and zap everything from there to the end of the string.
/\W.*/
is the regular expression we are looking
for—the \W
stands for a nonword character
(something besides a letter, digit, or underscore), and
.*
represents any characters from that point to
the end of the line. Now, to zap these characters, we need to take
whatever part of the string matches this regular expression and
replace it with nothing:
$name =~ s/\W.*//;
We’re using the same
=~
operator that we did
before, but now on the right we have a substitute operator: the
letter s
followed by a slash-delimited regular
expression and string. (The string in this example is the empty
string between the second and third slashes.) This operator looks and
acts very much like the substitution command of various editors.
Now, to get whatever’s left into lowercase, we translate the
string using the
tr
operator.[10] This operation takes a list of characters to find, and
another list of characters with which to replace them. For our
example, to put the contents of $name
in
lowercase, we use:
$name =~ tr/A-Z/a-z/;
The slashes delimit the searched-for and replacement character lists.
The dash between A
and Z
stands
for all the characters in between, so we have two lists that each
contain 26 characters. When the tr
operator finds
a character from the string in the first list, the character is then
replaced with the corresponding character in the second list. So, all
uppercase A’s become lowercase a’s, and so on.[11]
Putting everything together results in the following:
%words = qw( fred camel barney gecko betty alpaca wilma alpaca ); print "What is your name? "; $name = <STDIN>; chomp ($name); $original_name = $name; #save for greeting $name =~ s/\W.*//; # get rid of everything after first word $name =~ tr/A-Z/a-z/; # lowercase everything if ($name eq "erik") { # ok to compare this way now print "Hello, Erik! How good of you to be here!\n"; } else { print "Hello, $original_name!\n"; # ordinary greeting $secretword = $words{$name}; # get the secret word if ($secretword eq "") { # oops, not found $secretword = "groucho"; # sure, why a duck? } print "What is the secret word? "; $guess = <STDIN>; chomp ($guess); while ($guess ne $secretword) { print "Wrong, try again. What is the secret word? "; $guess = <STDIN>; chomp ($guess); } }
Notice how the regular expression match for Erik
became a simple comparison again. After all, both Erik Olson
and Erik
become
erik
after the substitution and translation. And
everyone else gets a fair ride, because Fred
and
Fred Flintstone
both become
fred
, Barney Rubble
and
Barney, the little guy
become
barney
, and so on.
With just a few statements, we’ve made the program much more user friendly. You’ll find that expressing complicated string manipulation with a few keystrokes is one of Perl’s many strong points.
However, hacking away at the name so that we could compare it and
look it up in the table destroyed the name that was entered. So,
before the program hacks on the name, it saves it in
$original_name
. (Like C symbols, Perl
variable names consist of letters,
digits, and underscores and can be of nearly unlimited length.) We
can then make references to $original_name
later.
Perl has many ways to monitor and mangle strings. You’ll find out about most of them in Chapter 7 and Chapter 15.
Now that we’ve added so much to the code, we have to scan through many detailed lines in order to get the overall flow of the program. What we need is to separate the high-level logic (asking for a name, looping based on entered secret words) from the details (comparing a secret word to a known good word). We might do this for clarity, or maybe because one person is writing the high-level part and another is writing (or has already written) the detailed parts.
Perl provides subroutines that have parameters and return values. A subroutine is defined once in a program, and can be invoked repeatedly from within any expression.
For our small-but-rapidly-growing program, let’s create a
subroutine called good_word
that takes a name and
a guessed word, and returns true if the word is
correct, and false if not. The definition of
such a subroutine looks like this:
sub good_word { my($somename,$someguess) = @_; # name the parameters $somename =~ s/\W.*//; # get rid of everything after first word $somename =~ tr/A-Z/a-z/; # lowercase everything if ($somename eq "erik") { # should not need to guess return 1; # return value is true } elsif (($words{$somename} || "groucho") eq $someguess) { return 1; # return value is true } else { return 0; # return value is false } }
First, the definition of a subroutine consists of the reserved word
sub
,
followed by the subroutine name, followed by a
block of code (delimited by curly
braces). These definitions can go anywhere in the
program file, but most people put them at the end.
The first line within this particular definition is an assignment
that copies the values of the two parameters of this subroutine into
two local variables named $somename
and
$someguess
. (The my()
defines
the two variables as private to the enclosing block—in this
case, the entire subroutine—and the parameters are initially in
a special local array called
@_
.)
The next two lines clean up the name, just like in the previous version of the program.
The if-elsif-else
statement decides whether the guessed word
($someguess
) is correct for the name
($somename
). Erik
should not
make it into this subroutine, but even if it does, whatever word was
guessed is OK.
A return
statement can be used to make the
subroutine immediately return to its caller with the supplied value.
In the absence of an explicit return
statement,
the last expression evaluated in a subroutine is the return value.
We’ll see how the return value is used after we finish
describing the subroutine definition.
The test for the elsif
part looks a little
complicated—let’s break it apart:
($words{$somename} || "groucho") eq $someguess
The first thing inside the parentheses is our familiar hash lookup,
yielding some value from %words
based on a key of
$somename
. The operator between that value and the
string groucho
is the
||
(logical-or) operator similar to that used in C. If the lookup from
the hash has a value (meaning that the key
$somename
was in the hash), the value of the
expression is that value. If the key could not be found, the string
groucho
is used instead. This step is a very
Perl-like thing to do—specify some expression, and then provide
a default value using
||
in case the expression turns out to be false.
In any case, whether it’s a value from the hash, or the default
value groucho
, we compare it to whatever was
guessed. If the comparison is true, we return 1; otherwise, we return
0.
So, expressed as a rule, if the name is erik
, or
the guess matches the lookup in %words
based on
the name (with a default of groucho
if not found),
then the subroutine returns 1; otherwise, it returns 0.
Now, let’s integrate all these additions with the rest of the program:
%words = qw( fred camel barney gecko betty alpaca wilma alpaca ); print "What is your name? "; $name = <STDIN>; chomp ($name); if ($name =~ /^erik\b/i) { # back to the other way :-) print "Hello, Erik! How good of you to be here!\n"; } else { print "Hello, $name!\n"; # ordinary greeting print "What is the secret word? "; $guess = <STDIN>; chomp ($guess); while (! good_word($name,$guess)) { print "Wrong, try again. What is the secret word? "; $guess = <STDIN>; chomp $guess; } } [... insert definition of good_word() here ...]
Notice that we’ve gone back to the regular expression to check for
Erik
, because now the main program does not have
to pull apart the first name and convert it to lowercase.
The big difference is the
while
loop containing
good_word
. Here, we see an invocation of the
subroutine passing two parameters, $name
and
$guess
. Within the subroutine, the value of
$somename
is set from the first parameter, in this
case $name
. Likewise,
$someguess
is set from the second parameter,
$guess
.
The value returned by the subroutine (either 1 or 0, recalling the
definition given earlier) is logically inverted with the prefix
!
(logical not)
operator. As in C, this operator returns true if the expression
following is false, and vice versa. The result of this negation
controls the while
loop. You can read this as
“while it’s not a good word...” Many well-written
Perl programs read very much like English, provided you take a few
liberties with either Perl or English. (But you certainly won’t
win a Pulitzer that way.)
Note that the subroutine assumes that the value of the
%words
hash is set by the main program.
Such a cavalier approach to global variables doesn’t scale very
well, of course. Generally speaking, variables not created with
my
are global to the whole program, while those
my
creates last only until the block in which they
were declared exits. Don’t worry; Perl does in fact support a
rich variety of other kinds of variables, including those private to
a file (or package), as well as variables private to a function that
retain their values between invocations (which is what we could
really use here). However, at this stage in your Perl education,
explaining these variables would only complicate your life. When
you’re ready for such information, check out what
Programming Perl has to say about scoping,
subroutines, modules, and objects. Or, see the online documentation
in the perlsub , perlmod ,
perlobj , and perltoot
documentation.
Suppose we wanted to share the secret word list among three programs.
If we store the word list as we have done already, we will need to
change all three programs when Betty decides that her secret word
should be swine
rather than
alpaca
. This change can get to be a hassle,
especially considering how often Betty is likely to change her mind.
So, let’s put the word list into a file, and then read the file
to get the word list into the program. To do so, we need to create an
I/O channel called a
filehandle.
Your Perl program automatically gets three filehandles called
STDIN
,
STDOUT
,
and
STDERR
,
corresponding to the three standard I/O channels in many programming
environments. We’ve already been using the
STDIN
handle to read data from the person running
the program. Now, we just have to get another handle attached to a
file of our own choice.
Here’s a small chunk of code to do that:
sub init_words { open (WORDSLIST, "wordslist"); while (defined ($name = <WORDSLIST>)) { chomp ($name); $word = <WORDSLIST>; chomp ($word); $words{$name} = $word; } close (WORDSLIST); }
We’re putting this code into a subroutine so that we can keep the main part of the program uncluttered. This organization also means that at a later time (hint: after a few more revisions in this stroll), we can change where the word list is stored, or even the format of the list.
The arbitrarily chosen format of the word list is one item per line, with names and words alternating. So, for our current database, we’d have something like this:
fred camel barney gecko betty alpaca wilma alpaca
The
open
function creates a
filehandle
named WORDSLIST
by associating it with a file
named wordslist
in the current directory. Note
that the filehandle doesn’t have a funny character in front of
it as do the three variable types. Also, filehandles are generally
uppercase—although they
aren’t required to be—for reasons detailed later.
The while
loop reads lines from the
wordslist
file (via the
WORDSLIST
filehandle) one line at a time. Each
line is stored into the $name
variable. When
end-of-file is reached, the value
returned by the <WORDSLIST>
operation is the
empty string,[12] which looks false to the
while
loop, and terminates it. That’s how we
get out at the end.
If you were running with the -w
option, you would have to check that the return value read in was
actually defined
. The empty string returned by the
<WORDSLIST>
operation isn’t merely
empty—it’s undef
again. The
defined
function is how you test for
undef
when this matters. In the case of reading
lines from a file, you’d test as shown:
while ( defined ($name = <WORDSLIST>) ) {
But if you were being that careful, you’d probably also have
checked to make sure that open
returned a true
value. You know, that’s probably not a bad idea either. The
built-in die
function is frequently used to exit
the program with an error message in case something goes wrong.
We’ll see an example of this function in the next revision of
the program.
On the other hand, the normal case is that we’ve read a line
(including the newline) into $name
. First, off
comes the newline using the
chomp
function. Then, we
have to read the next line to get the secret word, holding it in the
$word
variable. This variable also gets the
newline hacked off.
The final line of the while
loop puts
$word
into %words
with a key of
$name
, so that the rest of the program can access
it later.
After the file has been read, the filehandle can be recycled with the
close
function.
(Filehandles are automatically closed anyway when the program exits,
but we’re trying to be tidy. If we were
really tidy, we’d even check for a true
return value from close
in case the disk partition
which held the file went south, its network filesystem became
unreachable, or a similar catastrophe occurred. Yes, these things
really do happen. Murphy will always be with us.)
This subroutine definition can go after or before the other one. And,
we invoke the subroutine instead of setting %words
in the beginning of the program. Therefore, you could wrap up all of
this as follows:
init_words(); print "What is your name? "; $name = <STDIN>; chomp ($name); if ($name =~ /^erik\b/i) { # back to the other way :-) print "Hello, Erik! How good of you to be here!\n"; } else { print "Hello, $name!\n"; # ordinary greeting print "What is the secret word? "; $guess = <STDIN>; chomp ($guess); while (! good_word($name,$guess)) { print "Wrong, try again. What is the secret word? "; $guess = <STDIN>; chomp ($guess); } } ## subroutines from here down sub init_words { open (WORDSLIST,"wordslist") || die "can't open wordslist:$!"; while (defined ($name = <WORDSLIST>)) { chomp ($name); $word = <WORDSLIST>; chomp ($word); $words{$name} = $word; } close (WORDSLIST); } sub good_word { my($somename,$someguess) = @_; # name the parameters $somename =~ s/\W.*//; # delete everything after first word $somename =~ tr/A-Z/a-z/; # lowercase everything if ($somename eq "erik") { # should not need to guess return 1; # return value is true } elsif (($words{$somename} || "groucho") eq $someguess) { return 1; # return value is true } else { return 0; # return value is false } }
Now our program is starting to look full-grown. Notice the first
executable line is an invocation of init_words()
.
The return value is not used in a further calculation, which is good
because we didn’t return anything remarkable. In this case, a
true value is guaranteed (the value 1, in particular), because if the
close
had failed, the die
would
have printed a message to our STDERR error and exited the program.
The die
function is fully explained in Chapter 10, but because the return
values of anything that might fail must be checked, we’ll get
into the habit of using the function right from the start. The $!
variable (also explained in Chapter 10) contains
the system error message explaining why the system call failed.
The open
function is also used to open files for
output, or open programs as files (demonstrated shortly). The full
scoop on open
comes much later in this book,
however, in Chapter 10.
“That secret word list has got to change at least once a week!” cries the Chief Director of Secret Word Lists. Well, we can’t force the list to be different, but we can at least issue a warning if the secret word list has not been modified in more than a week.
The best place for handling this warning is the
init_words()
subroutine—we’re already
looking at the file there. The Perl operator
-M
returns the
age in
days since a file or filehandle has last been modified, so we just
need to see whether this value is greater than seven for the
WORDSLIST
filehandle:
sub init_words { open (WORDSLIST,"wordslist") || die "can't open wordlist:$!"; if (-M WORDSLIST > 7.0) { # comply with bureaucratic policy die "Sorry, the wordslist is older than seven days."; } while (defined ($name = <WORDSLIST>)) { chomp ($name); $word = <WORDSLIST>; chomp ($word); $words{$name} = $word; } close (WORDSLIST) || die "couldn't close wordlist: $!"; }
The value of -M WORDSLIST
is compared to seven,
and if the value is greater, bingo, we’ve violated policy.
The rest of the program remains unchanged, so in the interest of saving a few trees, we won’t repeat it here.
Besides getting the age of a file, we can also find out its size, access time, and everything else that an operating system maintains about a file. More information about this feature appears in Chapter 10.
We really ought to know when someone guesses incorrectly so that we
can watch for break-in attempts. If we were using a UNIX system, we
would probably use the mail command to send an
email message to someone about the failed attempt. However, on a
Windows workstation, no standard mail
[13] command exists, so we’re going to log failures to a
file.[14] We
need only do a little work to accomplish this task. We’ll add a
new subroutine and modify only the good_word()
subroutine (thanks to modularity), because we have all the
information we need:
sub good_word { my($somename,$someguess) = @_; # name the parameters $somename =~ s/\W.*//; # get rid of stuff after first word $somename =~ tr/A-Z/a-z/; # lowercase everything if ($somename eq "erik") { # should not need to guess return 1; # return value is true } elsif (($words{$somename}||"groucho") eq $someguess) { return 1; # return value is true } else { log_failure($somename, $someguess); return 0; # return value is false } } sub log_failure { my($somename,$someguess) = @_; # name the parameters open(LOG, ">>failures.log") || die "failures.log: $!"; print LOG "bad news: $somename guessed $someguess\n"; close (LOG) || die "can't close failures.log: $!"; }
Notice the
open
, which has a redirection
symbol (>>
) in the filename. This symbol is
a special indication that we are appending to a file. The next
statement, a print
, shows that a
filehandle between the
print
keyword and the
values to be printed selects that filehandle for output, rather than
STDOUT
.[15] This means that the
message will be written to the output file that we’ve opened.
Finally, we close the filehandle.
Let’s change the definition of the secret word filename
slightly. Instead of just the file named
wordslist
, let’s look for anything in the
current directory that ends in .sec
. At the
command prompt, we say:
> dir /B *.sec
to get a brief listing of all of these names. As you’ll see in a moment, Perl uses a similar wildcard name syntax.
Pulling out the init_words()
definition again:
sub init_words { while (defined ($filename = glob("*.sec")) ) { open (WORDSLIST, $filename) || die "can't open $filename:$!"; if (-M WORDSLIST <= 7.0) { while (defined ($name = <WORDSLIST>)) { chomp ($name); $word = <WORDSLIST>; chomp ($word); $words{$name} = $word; } } close (WORDSLIST) || die "couldn't close $filename: $!"; } }
First, I’ve wrapped a new
while
loop around the
bulk of the routine from the previous version. The new thing here is
the glob
function. This is called a
filename
glob[16]
for historical
reasons. The function works much like
<STDIN>
, in that each time it is accessed,
it returns the next value: successive filenames that match the
pattern, in this case *.sec
. When there are no
additional filenames to be returned, the filename glob returns an
empty string.[17] In Perl on Windows systems, filename
globbing[18] is implemented by means of another program, called
PerlGlob.exe, which must be somewhere in your
search path (this should usually be the case, because
PerlGlob.exe is installed in the same directory
as your Perl interpreter by default).
So, if the current directory contains fred.sec
and
barney.sec
, then $filename
is
barney.sec
on the first pass through the
while
loop (the names come out in alphabetically
sorted order). On the second pass, $filename
is
fred.sec
. And there is no third pass because the
glob returns an empty string the third time it is called, perceived
by the while
loop to be a false, causing an exit
from the subroutine.
Within the while
loop, we open the file and verify
that it is recent enough (less than seven days since the last
modification). For the recent-enough files, we scan through as
before.
Note that if there are no files that match *.sec
and are less than seven days old, the subroutine will exit without
having set any secret words into the %words
array.
In such a case, everyone must use the word
groucho
. Oh well. (For real
code, we would have added some check on the number of entries in
%words
before returning, and
die
‘d
if the check wasn’t good. See the
keys
function when we get to hashes in Chapter 5.)
Well, the Chief Director of Secret Word Lists wants a report of all the secret words currently in use, and how old they are. If we set aside the secret word program for a moment, we’ll have time to write a reporting program for the Director.
First, let’s get all of the secret words, by stealing some code
from the init_words()
subroutine:
while ( defined ($filename = glob("*.sec")) ) { open (WORDSLIST, $filename) || die "can't open $filename: $!"; if (-M WORDSLIST <= 7.0) { while (defined ($name = <WORDSLIST>) { chomp ($name); $word = <WORDSLIST>; chomp ($word); ### new stuff will go here } } close (WORDSLIST) || die "couldn't close $filename: $!"; }
At the point marked “new stuff will go here,” we know
three things: the name of the file ($filename
),
someone’s name ($name
), and that
person’s secret word ($word
). Here’s a
place to use Perl’s report generating tools. We define a
format somewhere in the program (usually
near the end, like a subroutine):
format STDOUT = @<<<<<<<<<<<<<<< @<<<<<<<<< @<<<<<<<<<<< $filename, $name, $word .
The format definition begins with format STDOUT =
,
and ends with a single period. The other two lines comprise the
format itself. The first line of this format is a
field definition
line that specifies the number, length, and type of the
fields. For this format, we have three fields. The line following a
field definition line is always a
field value
line. The value line gives a list of expressions that will
be evaluated when this format is used, and the results of those
expressions will be plugged into the fields defined in the previous
line.
We invoke this format with the
write
function, as shown:
while ( defined($filename = glob("*.sec")) ) { open (WORDSLIST, $filename) || die "can't open $filename: $!"; if (-M WORDSLIST <= 7.0) { while (defined ($name = <WORDSLIST>) { chomp ($name); $word = <WORDSLIST>; chomp ($word); write; # invoke format STDOUT to STDOUT } } close (WORDSLIST) || die "couldn't close $filename: $!"; } format STDOUT = @<<<<<<<<<<<<<<< @<<<<<<<<< @<<<<<<<<<<< $filename, $name, $word .
When the format is invoked, Perl evaluates the field expressions and
generates a line that it sends to the STDOUT
filehandle. Because write
is invoked once each
time through the loop, we’ll get a series of lines with text in
columns, one line for each secret word entry.
Hmm. We haven’t labeled the columns. That’s easy enough. We just need to add a top-of-page format, as shown:
format STDOUT_TOP = Page @<< $% Filename Name Word ================ ========== ============ .
This format is named STDOUT_TOP
, and will be used
initially at the first invocation of the STDOUT
format, and again every time 60 lines of output to
STDOUT
have been generated. These column headings
line up with the columns from the STDOUT
format,
so everything comes out tidy.
The first line of this format shows some constant text
(Page
) along with a three-character field
definition. The following line is a field value line, which in this
case has one expression. This expression is the
$%
variable,[19] which holds the
number of pages printed—a very useful value in top-of-page
formats.
The third line of the format is blank. Because this line does not contain any fields, the line following it is not a field value line. This blank line is copied directly to the output, creating a blank line between the page number and the column headers below.
The last two lines of the format also contain no fields, so they are copied as-is, directly to the output. So this format generates four lines, one of which has a part that changes from page to page.
Just tack this definition onto the previous program to get it to work. Perl notices the top-of-page format automatically.
Perl also has fields that are centered or right justified, and supports a filled paragraph area as well. More on these features when we get to formats in Chapter 11.
As we are scanning through the *.sec
files in the
current directory, we may find files that are too old. So far, we are
simply skipping over those files. Let’s go one step
more—we’ll
rename
them to *.sec.old
so that a directory listing will
quickly show us which files are too old, simply by name.
Here’s how the init_words()
subroutine looks
with this modification:
sub init_words { while ( defined($filename = glob("*.sec")) ) { open (WORDSLIST, $filename) || die "can't open $filename: $!"; if (-M WORDSLIST <= 7.0) { while (defined ($name = <WORDSLIST>)) { chomp ($name); $word = <WORDSLIST>; chomp ($word); $words{$name} = $word; } close (WORDSLIST) || die "can't close $filename: $!"; } else { # must close file before renaming it close (WORDSLIST) || die "can't close $filename.old: $!" rename ($filename,"$filename.old") || die "can't rename $filename: $!"; } } }
Notice the new else
part of the file age check. If
the file is older than seven days, it gets renamed with the
rename
function. This function takes two parameters, renaming the file named
by the first parameter to the name given in the second parameter.
Perl has a complete range of file manipulation operators—nearly anything you can do to a file from a C program, you can also do from Perl.
Let’s keep track of when the most recent correct guess has been made for each user. One data structure that might seem to work at first glance is a hash. For example, the statement:
$last_good{$name} = time;
assigns the current time in internal format (some large integer above
800 million, incrementing one number per second) to an element of
%last_good
that has the name for a key. Over time,
this method would seem to give us a database indicating the most
recent time the secret word was guessed properly for each of the
users who had invoked the program.
But, the hash doesn’t have an existence between invocations of the program. Each time the program is invoked, a new hash is formed, so at most, we create a one-element hash and then immediately forget it when the program exits.
The
dbmopen
function[20] maps a
hash out into a disk file (actually a pair of disk files) known as a
DBM. It’s
used like this:
dbmopen (%last_good,"lastdb",0666) || die "can't dbmopen lastdb: $!"; $last_good{$name} = time; dbmclose (%last_good) || die "can't dbmclose lastdb: $!";
The first statement performs the mapping, using the disk filenames of
lastdb.dir
and lastdb.pag
(these names are the normal names for a DBM called
lastdb
). Showing Perl’s UNIX heritage,
dbmopen
takes an octal
file permission mask as the third argument. Although all Windows
filesystems support file attributes of some sort, they are largely
incompatible with the UNIX filesystem scheme used by Perl. The UNIX
file permissions used for these two files, if the files must be
created (as they will the first time through), is
0666
. This mode means that anyone can read or
write the files. This mode is usually the one that you want to use
when working with files with read/write attributes.[21]
The Perl file
permission value
is composed of a series of bits with read, write, and execute
privileges for the user, the user’s group, and everyone else.
Traditionally, FAT filesystems only keep track of read and write
privileges for the user, along with a few other tidbits of
information, like whether the file is a hidden or a system file.
We’ll discuss file permissions and attributes in detail in
Chapter 13. For now, just trust us that you want to
use 0666
for creating DBM files.
The second statement shows that we use this mapped hash just like a normal hash. However, creating or updating an element of the hash automatically updates the disk files that form the DBM. And, when the hash is later accessed, the values within the hash come directly from the disk image. This gives the hash a life beyond the current invocation of the program—a persistence of its own.
The third statement disconnects the hash from the DBM, much like a
file close
operation.
You can insert these three statements just ahead of the subroutine definitions.
Although the inserted statements maintain the database adequately (and even create the database initially), we don’t have any way of examining the information yet. To do so, we can create a separate little program that looks something like this:
dbmopen (%last_good,"lastdb",0666) || die "can't dbmopen lastdb: $!"; foreach $name (sort keys %last_good) { $when = $last_good{$name}; $hours = (time - $when) / 3600; # compute hours ago write; } format STDOUT = User @<<<<<<<<<<<: last correct guess was @<<< hours ago. $name, $hours .
We’ve got a few new operations here: a
foreach
loop, sorting a list, and getting the keys of an hash.
First, the
keys
function takes a hash name as an argument and returns a list of all
the keys of that hash in some unspecified order. For the
%words
hash defined earlier, the result is
something like fred
, barney
,
betty
, wilma
, in some
unspecified order. For the %last_good
hash, the
result will be a list of all users who have guessed their own secret
word successfully.
The
sort
function sorts the list alphabetically (just like passing a text file
through the sort command). This function makes
sure that the list processed by the foreach
statement is always in alphabetical order.
The Perl
foreach
statement takes a list of values and assigns each one in turn to a
scalar variable (here, $name
), executing the body
of the loop (a block) once for each value. So, for five names in the
%last_good
list, we get five passes through the
loop, with $name
being a different value each
time.
The body of the foreach
loop loads up a couple of
variables used within the STDOUT
format, and then
invokes the format. Note that we figure out the age of the entry by
subtracting the stored system
time (in the array) from the current time
(as returned by time
), and then divide that by
3600 (to convert seconds to hours).
Perl also provides easy ways to create and maintain text-oriented databases and fixed-length-record databases. These databases are described in Chapter 17.
Here are the programs from this stroll in their final form so that you can play with them.
First, the “say hello” program:
init_words(); print "What is your name? "; $name = <STDIN>; chomp ($name); if ($name =~ /^erik\b/i) { # back to the other way :-) print "Hello, Erik! How good of you to be here!\n"; } else { print "Hello, $name!\n"; # ordinary greeting print "What is the secret word? "; $guess = <STDIN>; chomp ($guess); while (! good_word($name,$guess)) { print "Wrong, try again. What is the secret word? "; $guess = <STDIN>; chomp ($guess); } } dbmopen (%last_good,"lastdb",0666) || die "can't dbmopen lastdb: $!"; $last_good{$name} = time; dbmclose (%last_good) || die "can't dbmclose lastdb: $!"; sub init_words { while ( defined($filename = glob("*.sec")) ) { open (WORDSLIST, $filename) || die "can't open $filename: $!"; if (-M WORDSLIST <= 7.0) { while (defined ($name = <WORDSLIST>)) { chomp ($name); $word = <WORDSLIST>; chomp ($word); $words{$name} = $word; } close (WORDSLIST) || die "can't close $filename: $!"; } else { # must close file before renaming it close (WORDSLIST) || die "can't close $filename.old: $!" rename ($filename,"$filename.old") || die "can't rename $filename: $!"; } } } sub good_word { my($somename,$someguess) = @_; # name the parameters $somename =~ s/\W.*//; # delete everything after first word $somename =~ tr/A-Z/a-z/; # lowercase everything if ($somename eq "erik") { # should not need to guess return 1; # return value is true } elsif (($words{$somename} || "groucho") eq $someguess) { return 1; # return value is true } else { log_failure($somename,$someguess); return 0; # return value is false } } sub log_failure { my($somename,$someguess) = @_; # name the parameters open(LOG, ">>failures.log") || die "failures.log: $!"; print LOG "bad news: $somename guessed $someguess\n"; close (LOG) || die "can't close failures.log: $!"; }
Next, we have the secret word lister:
while ( defined($filename = glob("*.sec")) ) { open (WORDSLIST, $filename) || die "can't open $filename: $!"; if (-M WORDSLIST <= 7.0) { while (defined ($name = <WORDSLIST>)) { chomp ($name); $word = <WORDSLIST>; chomp ($word); write; # invoke format STDOUT to STDOUT } } close (WORDSLIST) || die "can't close $filename: $!"; } format STDOUT = @<<<<<<<<<<<<<<< @<<<<<<<<< @<<<<<<<<<<< $filename, $name, $word . format STDOUT_TOP = Page @<< $% Filename Name Word ================ ========== ============ .
And finally, the last-time-a-word-was-used display program:
dbmopen (%last_good,"lastdb",0666) || die "can't dbmopen lastdb: $!"; foreach $name (sort keys %last_good) { $when = $last_good{$name}; $hours = (time - $when) / 3600; # compute hours ago write; } dbmclose(%last_good) || die "can't dbmclose lastdb: $!"; format STDOUT = User @<<<<<<<<<<<: last correct guess was @<<< hours ago. $name, $hours .
Together with the secret word lists (files named
something
.sec
in the
current directory) and the database, lastdb.dir
and lastdb.pag
, you’ll have all you need.
[7] The
semicolon can be omitted when the statement is the last statement of
a block, file, or eval
.
[8] Well, OK, there’s a standard C
library
function. But that’s not an
operator.
[9] Well, OK, the value is really the
undef
value, but it looks like an empty string to
the eq
operator. You’d get a warning about
this value if you used -w
on the command line,
which is why we omitted it here.
[10] This method doesn’t work for characters with accent
marks, although the uc
function would.
[11]
Experts will note that we could have
also constructed something like s/(\S*).*/\L$1/
to
do this processing in one fell swoop, but experts probably
won’t be reading this section.
[12] Well, technically the value is
undef
again, but empty string is close enough for
this discussion.
[13] Perl for Win32 programmers will encounter this mail comand issue frequently in scripts that they find on the Net. The solution is to use one of a number of readily available command-line mailers, or to use Perl’s network interface to talk to an SMTP server directly.
[14] We could also use the Win32::EventLog module to log our warnings to the Windows NT Event Log.
[15] Well, technically, the currently selected filehandle. That’s covered much later, though.
[16] Glob might be a new word to Win32 programmers. We’ll talk much more about globbing in Chapter 12.
[17] Yeah, yeah, undef
again.
[18] If you’re using the ISAPI version of Perl, you’ll have better luck if you avoid file globbing altogether and use the following equivalent technique:
[19] More mnemonic aliases for these predefined scalar variables are available via the English module, which provides English names for Perl’s special variables.
[20] On a specific database, use the more
low-level tie
function, as detailed in Chapters 5
and 7 of Programming Perl, or in the
perltie documentation.
[21] The Win32::File module provides additional features for setting file attributes. We’ll discuss those features in more detail in Chapter 13.
Get Learning Perl on Win32 Systems now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.