int a, *p; p = &a; /* p now has the "address" of a */
malloc(3) to allocate a piece of memory at
run-time and obtain its address:
p = malloc(sizeof(int));
# Create some variables
$a = "mama mia";
@array = (10, 20);
%hash = ("laurel" => "hardy", "nick" => "nora");
# Now create references to them
$ra = \$a; # $ra now "refers" to (points to) $a
$rarray = \@array;
$rhash = \%hash;
$ra = \10; $rs = \"hello world";
$r_array_element = \$array[1]; # Refers to the scalar $array[1]
$r_hash_element = \$hash{"laurel"}; # Refers to the scalar
# $hash{"laurel"}
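To see what these references hold, you dereference them by putting the matching prefix symbol in front of the reference. Printing the reference itself shows its type and a memory address rather than the data. This is a minimal sketch. It renames $a to $str, since $a and $b are best left to sort. The address printed will vary from run to run.

```perl
use strict;
use warnings;

# $str stands in for $a above; $a and $b are best left to sort
my $str    = "mama mia";
my @array  = (10, 20);
my %hash   = ("laurel" => "hardy", "nick" => "nora");

my $rs     = \$str;     # reference to a scalar
my $rarray = \@array;   # reference to an array
my $rhash  = \%hash;    # reference to a hash

# Dereference by putting the matching prefix in front of the reference
print $$rs, "\n";              # prints "mama mia"
print "@$rarray\n";            # prints "10 20"
print $$rhash{"laurel"}, "\n"; # prints "hardy"

# Printing the reference itself shows its type and a memory address
print "$rs\n";                 # prints something like SCALAR(0x...)
```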
$ra or
$rarray, is an ordinary scalar; hence the
prefix $. A scalar, in other words,
can be a number, a string, or a reference and can be freely
reassigned to one or the other of these (sub)types. If you print a
scalar while it is a reference, you get something like this:

@_ array available within the subroutine. The only
way to avoid this merger is to pass references to the input arrays or
hashes. Here's an example that adds elements of one array to
the corresponding elements of the other:
@array1 = (1, 2, 3); @array2 = (4, 5, 6, 7);
AddArrays (\@array1, \@array2); # Passing the arrays by reference.
print "@array1 \n";
sub AddArrays
{
my ($rarray1, $rarray2) = @_;
my $len2 = @$rarray2; # Length of array2
for (my $i = 0 ; $i < $len2 ; $i++) {
$rarray1->[$i] += $rarray2->[$i];
}
}
AddArrays which then dereferences the two
references, determines the lengths of the arrays, and adds up the
individual array elements.
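You can observe the merging that references avoid directly. When two arrays are passed as they are, the subroutine sees one flattened @_. When you pass references, it sees one scalar per array.

This is a small sketch. count_args is a hypothetical helper, and AddArrays is repeated with lexical loop variables so the block runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many scalars arrived in @_
sub count_args { return scalar(@_); }

my @array1 = (1, 2, 3);
my @array2 = (4, 5, 6, 7);

my $flat = count_args(@array1, @array2);    # both arrays merge into one @_
my $refs = count_args(\@array1, \@array2);  # one reference per array
print "$flat $refs\n";                      # prints "7 2"

# AddArrays from above, with lexical loop variables
sub AddArrays {
    my ($rarray1, $rarray2) = @_;
    for my $i (0 .. $#$rarray2) {           # $#$rarray2 is the last index of array2
        $rarray1->[$i] += $rarray2->[$i];
    }
}
AddArrays(\@array1, \@array2);
print "@array1\n";   # prints "5 7 9 7": array1 grows to the length of array2
```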
while ($ref_line = GetNextLine()) {
.....
.....
}
sub GetNextLine { # no () prototype: the sub is called before it is defined
my $line = <F> ;
exit(0) unless defined($line);
.....
return \$line; # Return by reference, to avoid copying
}
GetNextLine returns the line by reference to avoid
copying.
%sue = ( # Parent
'name' => 'Sue',
'age' => '45');
%john = ( # Child
'name' => 'John',
'age' => '20');
%peggy = ( # Child
'name' => 'Peggy',
'age' => '16');
@children = (\%john, \%peggy);
$sue{'children'} = \@children;
# Or
$sue{'children'} = [\%john, \%peggy];
%sue:
print $sue{children}->[1]->{age};
$sue{children}->[1]->{age} = 10;
%sue, gives it
a hash element indexed by the string children,
points that entry to a newly allocated array, whose second element is
made to refer to a freshly allocated hash, which gets an entry
indexed by the string age. Talk about programmer
efficiency.
-> if (and only if) it is between
subscripts. That is, the following expressions are equivalent:
$sue{children}->[1]->{age}
$sue{children}[1]{age}
ref
function queries a scalar to see
whether it contains a reference and, if so, what type of data it is
pointing to. ref returns a false value (the empty string, "") if
its argument contains a number or a string; and if
it's a reference, ref returns one of these
strings to describe the data being referred to: "SCALAR",
"HASH", "ARRAY", "REF" (referring
to another reference variable), "GLOB" (referring to a
typeglob), "CODE" (referring to a subroutine), or
"package name" (an object belonging
to this package; we'll see more of it later).
$a = 10; $ra = \$a;
ref($a) yields a false value, since $a
is not a reference.
ref($ra) returns the string "SCALAR",
since $ra is pointing to a scalar
value.
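This sketch runs ref over one example of each kind listed above. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $num  = 10;       # an ordinary scalar, like $a above
my $rnum = \$num;    # like $ra above
my @arr  = (1, 2);
my %h    = (k => 1);

# ref() returns "" (false) for a non-reference, else a type string
my %kind = (
    plain    => ref($num),          # ""
    scalar   => ref($rnum),         # "SCALAR"
    array    => ref(\@arr),         # "ARRAY"
    hash     => ref(\%h),           # "HASH"
    code     => ref(sub { 1 }),     # "CODE"
    reftoref => ref(\$rnum),        # "REF": a reference to a reference
    glob     => ref(\*STDOUT),      # "GLOB"
);

for my $k (sort keys %kind) {
    printf "%-9s %s\n", $k, $kind{$k} eq "" ? "(false)" : $kind{$k};
}
```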
$$var indicates that
$var is a reference variable, and the programmer
expects this expression to return the value of whatever $var
refers to.
What if $var is not a reference variable at all?
Instead of complaining loudly, Perl checks to see whether
$var contains a string. If so, it uses that string
as a regular variable name and messes around with this variable!
Consider the following:
$x = 10;
$var = "x";
$$var = 30; # Modifies $x to 30 , because $var is a symbolic
# reference !
$$var, Perl first checks to see
whether $var is a reference, which it is not;
it's a string. Perl then decides to give the expression one
more chance: it treats $var's contents as a
variable identifier ($x). The example hence ends
up modifying $x to 30.
Symbolic references work only on global (package) variables, not on variables declared with my.
$var = "x"; @$var = (1, 2, 3); # Sets @x to the enumerated list on the right
$var dictates the
type of variable to access: $$var is equivalent to
$x, and @$var is equivalent to saying @x.
eval. Let us say you want
your script to process a command-line option such as
"-Ddebug_level=3" and set the
$debug_level variable. This is one way of doing
it:
while ($arg = shift @ARGV){
if ($arg =~ /-D(\w+)=(\w+)/) {
$var_name = $1; $value = $2;
$$var_name = $value; # Or more compactly, $$1 = $2;
}
}
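This is a self-contained sketch of the same option-setting idea under use strict. Several details are assumptions:

- process_options is a hypothetical helper.
- The target must be a package variable (declared with our), because symbolic references cannot reach my variables.
- The symbolic assignment needs a local no strict 'refs'.

The last lines show that with strict refs in force, the same trick fails at run time.

```perl
use strict;
use warnings;

our $debug_level = 0;   # package variable: symbolic references cannot reach "my" variables

# Hypothetical helper following the loop above, for an explicit argument list
sub process_options {
    my @args = @_;
    for my $arg (@args) {
        if ($arg =~ /^-D(\w+)=(\w+)$/) {
            no strict 'refs';        # allow ${"name"} in this block only
            ${"main::$1"} = $2;      # symbolic reference to $main::<name>
        }
    }
}

process_options("-Ddebug_level=3");
print "debug_level = $debug_level\n";   # prints "debug_level = 3"

# With strict refs in force, the same trick is a run-time error
my $name = "debug_level";
my $ok = eval { ${$name} = 5; 1 };
print "strict refs refused it\n" unless $ok;
```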
ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps
$'s and
@'s without batting an eyelid. For each
problem, we will examine different ways of representing the same data
and study the trade-offs in program versus programmer efficiency. In
the interest of clarity, we will not worry too much about error
handling.
foo structures from
creating a hundred copies of the strings a and
str.
@foo instead, which is slightly more
efficient, yet a tad more cumbersome:
$a = 0; $str = 1; # Indices
$foo[$a] = 10; # Equivalent to foo.a = 10 in C.
$foo[$str] = "hello"; # Equivalent to foo.str = "hello" in C.
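The same array-as-record idea can be written with named constants instead of the index variables $a and $str. This is a variant using the core constant pragma, not the original code. The names A and STR are hypothetical.

```perl
use strict;
use warnings;
use constant { A => 0, STR => 1 };   # field indices, like $a and $str above

my @foo;
$foo[A]   = 10;        # like foo.a = 10 in C
$foo[STR] = "hello";   # like foo.str = "hello" in C

# Many records can share the same index constants
my @records = map { [ $_, "item$_" ] } 1 .. 3;
print $records[2][STR], "\n";   # prints "item3"
```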
pack or
sprintf to encode a set of values to get one
composite entity, but accessing individual data elements is neither
convenient nor efficient (in time). pack is a good
option when you need to be frugal about space, because it converts a
list of values into one scalar value without necessarily changing
each individual item's machine representation;
MAT1
1 2 4
10 30 0
MAT2
5 6
1 10
@MAT1 and @MAT2).
@matrix = (
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
);
# Change 6, the element at row 1, column 2 to 100
$matrix[1][2] = 100;
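Because each row is an array reference, the matrix's dimensions and rows are reached by dereferencing. This is a short sketch.

```perl
use strict;
use warnings;

my @matrix = ([1, 2, 3], [4, 5, 6], [7, 8, 9]);
$matrix[1][2] = 100;                  # same as $matrix[1]->[2] = 100

my $rows = scalar(@matrix);           # number of rows
my $cols = scalar(@{ $matrix[0] });   # number of columns in row 0
for my $row (@matrix) {               # each $row is an array reference
    print join(" ", @$row), "\n";     # prints "1 2 3", "4 5 100", "7 8 9"
}
```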
@matrix is a simple array whose elements
happen to be references to anonymous arrays. Further, recall that
$matrix[1][2] is a simpler way of saying
$matrix[1]->[2].
push statement; it uses the symbolic
reference facility to create variables
(@{$matrix_name}) and appends a reference to a new
row in every iteration. We are assured of newly allocated rows in
every iteration because @row is local to that
block, and when the if statement is done, its
contents live on because we squirrel away a reference to the
array's value. (Recall that it is the value that is reference
counted, not the name.)
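sub matrix_read_file {
my ($filename) = @_;
open (F, $filename) || die "Could not open $filename: $!";
while ($line = <F>) {
chomp($line);
next if $line =~ /^\s*$/; # skip blank lines
if ($line =~ /^([A-Za-z]\w*)/) {
$matrix_name = $1;
} else {
my (@row) = split (' ', $line); # a fresh @row on every iteration
push (@{$matrix_name}, \@row); # symbolic reference: appends to @MAT1, @MAT2, ...
}
}
close(F);
}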
#file: professor.dat
id : 42343 #Employee Id
Name : E.F.Schumacher
Office Hours: Mon 3-4, Wed 8-9
Courses : HS201, SS343 #Course taught
...
#file: student.dat
id : 52003 # Registration id
Name : Garibaldi
Courses : H301, H302, M201 # Courses taken
...
#file: courses.dat
id : HS201
Description : Small is beautiful
Class Hours : Mon 2-4, Wed 9-10, Thu 4-5
...
$professor{42343} = {
'Name' => 'E.F.Schumacher',
'Courses' => [ ]};
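Files in this "key : value" layout can be read into a hash of records keyed by id. This is a minimal sketch, and it makes several assumptions:

- Each record begins with its id line.
- A # starts a comment.
- Courses holds a comma-separated list.

read_records is a hypothetical name, and the in-memory file stands in for professor.dat.

```perl
use strict;
use warnings;

# Hypothetical reader: returns { id => { field => value, ... } }
sub read_records {
    my ($fh) = @_;
    my (%records, $current);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\s*#.*//;                 # strip comments
        next unless $line =~ /\S/;           # skip blank lines
        my ($key, $value) = split /\s*:\s*/, $line, 2;
        next unless defined $value;          # ignore lines without a colon
        $key =~ s/^\s+//;
        if ($key eq 'id') {
            $current = $records{$value} = { id => $value };    # start a new record
        } elsif ($key eq 'Courses') {
            $current->{Courses} = [ split /\s*,\s*/, $value ];  # list of course ids
        } else {
            $current->{$key} = $value;
        }
    }
    return \%records;
}

my $data = <<'END';
#file: professor.dat
id : 42343 #Employee Id
Name : E.F.Schumacher
Office Hours: Mon 3-4, Wed 8-9
Courses : HS201, SS343 #Course taught
END
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my $professors = read_records($fh);
print join(", ", @{ $professors->{42343}{Courses} }), "\n";   # prints "HS201, SS343"
```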
1995:Actor:Nicolas Cage
1995:Picture:Braveheart
1995:Supporting Actor:Kevin Spacey
1994:Actor:Tom Hanks
1994:Picture:Forrest Gump
1928:Picture:WINGS
%year_index and
%category_index map the year and category to
anonymous arrays containing references to the entries. Here is one
way to build this structure:
open (F, "oscar.txt") || die "Could not open database: $!";
%category_index = (); %year_index = ();
while ($line = <F>) {
chomp $line;
($year, $category, $name) = split (/:/, $line);
create_entry($year, $category, $name) if $name;
}
sub create_entry { # create_entry (year, category, name)
my($year, $category, $name) = @_;
# Create an anonymous array for each entry
my $rlEntry = [$year, $category, $name];
# Add this to the two indices
push (@{$year_index{$year}}, $rlEntry); # By Year
push (@{$category_index{$category}}, $rlEntry); # By Category
}
dumpValue in a file called
dumpvar.pl, which can be found in the standard
library directory. We can help ourselves to it, with the caveat that
it is an unadvertised function and could change someday. To
pretty-print this structure, for example:
@sample = (11.233,{3 => 4, "hello" => [6,7]});
require 'dumpvar.pl'; dumpValue(\@sample); # always pass by reference
0  11.233
1  HASH(0xb75dc0)
   3 => 4
   'hello' => ARRAY(0xc70858)
      0  6
      1  7
require statement in Chapter 6. Meanwhile, just think of it as a fancy
#include (which doesn't load the file if it
is already loaded).
pretty_print(@sample); # Doesn't need a reference
11.233
{ # HASH(0xb78b00)
: 3 => 4
: hello =>
: : [ # ARRAY(0xc70858)
: : : 6
: : : 7
: : ]
}
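Output like the listing above can come from a small recursive printer that dispatches on ref. The sketch below uses the routine names mentioned in the text (print_ref, print_array, print_hash, print_scalar). The helper prefix, the $level counter, and the exact output format are assumptions, and this is not the book's implementation.

```perl
use strict;
use warnings;

my $level = 0;                        # current nesting depth
sub prefix { return ": " x $level }   # hypothetical indentation helper

sub pretty_print { print_scalar($_) for @_ }    # top-level items

sub print_scalar {                    # plain values print directly; refs dispatch
    my ($val) = @_;
    if (ref $val) { print_ref($val) } else { print prefix(), "$val\n" }
}

sub print_ref {                       # dispatch on the type of the referent
    my ($r) = @_;
    my $type = ref $r;
    if    ($type eq 'ARRAY') { print_array($r) }
    elsif ($type eq 'HASH')  { print_hash($r) }
    else                     { print prefix(), "$r\n" }   # CODE, GLOB, etc.
}

sub print_array {
    my ($r) = @_;
    print prefix(), "[ # $r\n";
    $level++;
    print_scalar($_) for @$r;
    $level--;
    print prefix(), "]\n";
}

sub print_hash {
    my ($r) = @_;
    print prefix(), "{ # $r\n";
    $level++;
    for my $key (sort keys %$r) {
        my $val = $r->{$key};
        if (ref $val) {               # nested structure goes on the next lines
            print prefix(), "$key =>\n";
            $level++;
            print_ref($val);
            $level--;
        } else {
            print prefix(), "$key => $val\n";
        }
    }
    $level--;
    print prefix(), "}\n";
}

my @sample = (11.233, {3 => 4, "hello" => [6, 7]});
pretty_print(@sample);
```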
print_array, print_hash, or
print_scalar) that know how to print specific data
types. print_ref, charged with the task of
pretty-printing a reference, simply dispatches control to one of the
above procedures depending upon the type of argument given to it. In
turn, these procedures may call
http://language.perl.com/info/documentation.html
local versus
my). There are a couple of useful idioms that
arise from these differences.
my). In this section we briefly study
how these two are represented internally. Let us start with global
variables.$spud, the array
@spud, the hash %spud, the
subroutine &spud, the filehandle
spud, and the format name spud
are all simultaneously valid and completely independent of each
other. In other words, Perl provides distinct namespaces for each
type of entity. I do not have an explanation for why this feature is
present. In fact, I consider it a rather dubious facility and
recommend that you use a distinct name for each logical entity in
your program; you owe it to the poor fellow who's going to
maintain your code (which might be you!).
*"; while you can think of
it as a wildcard representing all values sharing the identifier name,
there's no pattern matching going on. You can assign typeglobs,
store them in arrays, create local versions of them, or print them
out, just as you can for any fundamental type. More on this in a
moment.
local
only) and assigned to one another. Assigning a typeglob has the
effect of aliasing one identifier name to another. Consider
$spud = "Wow!";
@spud = ("idaho", "russet");
*potato= *spud; # Alias potato to spud using typeglob assignment
print "$potato\n"; # prints "Wow!"
print "@potato\n"; # prints "idaho russet"
$spud and $potato are the
same thing, and so are the subroutines &spud
and &potato. Figure 3.2
shows the picture after a typeglob assignment; both entries in the
symbol table end up pointing to the same typeglob value.
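This sketch shows that a typeglob alias runs in both directions: writing through the new name changes the original. The our declarations are needed only because the sketch runs under use strict. Typeglobs cover package variables, not my variables.

```perl
use strict;
use warnings;

our ($spud, @spud);      # package variables: typeglobs don't cover "my" variables
our ($potato, @potato);
$spud = "Wow!";
@spud = ("idaho", "russet");

*potato = *spud;         # alias every "potato" to the corresponding "spud"

print "$potato\n";       # prints "Wow!"
print "@potato\n";       # prints "idaho russet"

@potato = ("yukon");     # the alias works in both directions
print "@spud\n";         # prints "yukon"
```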
spud, but if we define it
after the typeglobs have been assigned, that
subroutine can also be invoked as potato. It turns
out that the alias works the other way too. If you assign a new list
to @potato, it will also be automatically
accessible as @spud.
local, because it restores the
typeglob's values at the end of the block.
$b = 10;
{
local *b; # Save *b's values
*b = *a; # Alias b to a
$b = 20; # Same as modifying $a instead
} # *b restored at end of block
print $a; # prints "20"
print $b; # prints "10"
local
$a can be
seen simply as a dereference of a typeglob ${*a}.
For this reason, Perl makes the two expressions
${\$a} and ${*a} refer to the
same scalar value. This equivalence of typeglobs and ordinary
references has some interesting properties and results in three
useful idioms, described here.
*b = *a makes everything named
"a" be referred to as
"b" also. There is a way to create
selective aliases, using the reference
syntax:
*b = \$a; # Assigning a scalar reference to a typeglob
$b and
$a are aliases, but @b and
@a (or &b and
&a, and so on) are not.
*PI = \3.1415927;
# Now try to modify it.
$PI = 10;
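Because $PI is now an alias for a literal, the assignment fails at run time. This sketch wraps the attempt in eval so that you can inspect the error instead of having the program die.

```perl
use strict;
use warnings;

our $PI;
*PI = \3.1415927;               # alias $PI to a read-only literal

my $ok = eval { $PI = 10; 1 };  # attempt to modify the "constant"
print "still $PI\n";            # prints "still 3.1415927"
print "error: $@" unless $ok;   # "Modification of a read-only value attempted ..."
```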
&$rs()), you can assign a name to
it for convenience:
sub generate_greeting {
my ($greeting) = @_;
sub { print "$greeting world\n";}
}
$rs = generate_greeting("hello");
# Instead of invoking it as &$rs(), give it your own name.
*greet = $rs;
greet(); # Equivalent to calling &$rs(). Prints "hello world\n"
open and opendir initialize a
filehandle and a directory handle, respectively:
open(F, "/home/calvin"); opendir (D, "/usr");
F and D are
user-defined identifiers, but without a prefix symbol. Unfortunately,
these handles don't have some basic facilities enjoyed by the
important data types such as scalars, arrays, and hashes: you
cannot assign handles, and you cannot create local handles:
local (G); # invalid
G = F; # also invalid
Typeglobs get around both restrictions. Instead of
G = F; # or, local(F);
you can write
*G = *F; # or, local (*F);
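This sketch assigns typeglobs to give a filehandle a second name and passes a handle to a subroutine as a typeglob. It assumes an in-memory file (open on a scalar reference) so that the output can be checked. print_to is a hypothetical helper.

```perl
use strict;
use warnings;

my $buffer = '';
open(F, '>', \$buffer) or die "Cannot open in-memory file: $!";  # F writes into $buffer

*G = *F;                        # G is now another name for the F filehandle
print G "written through G\n";
close(F);

# A typeglob can also be passed to a subroutine that prints to it
sub print_to { my ($fh, $msg) = @_; print $fh $msg; }
print_to(*STDOUT, $buffer);     # prints "written through G"
```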
qsort) or returns new procedures. The
latter feature is available only in interpreted languages such as
Perl, Python, and LISP (hey, LISPers, you have lambda functions!).
\&mysub is a reference to
&mysub. For example:sub greet {
print "hello \n";
}
$rs = \&greet; # Create a reference to subroutine greet
greet subroutine here, in the same way that we
don't evaluate the value of a scalar when we take a reference
to it.
$rs = \&greet();
greet and produces a reference to its
return value, which is the value of the last
expression evaluated inside that subroutine. Since
print executed last and returned 1 or a false value
(indicating whether or not it was successful in printing the value),
the result of this expression is a reference to a scalar containing 1
or a false value! These are the kinds of mistakes that make you wish for
type-safety once in a while!
$rs = sub {
print "hello \n";
};
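The difference between \&greet and \&greet() is easy to check with ref. In this sketch, the first form yields a CODE reference without calling anything. The second calls greet immediately and yields a reference to its return value.

```perl
use strict;
use warnings;

sub greet { print "hello \n"; }

my $rs_code   = \&greet;      # reference to the subroutine; greet is NOT called
my $rs_result = \&greet();    # calls greet now, then references its return value

print ref($rs_code), "\n";    # prints "CODE"
print ref($rs_result), "\n";  # prints "SCALAR"
$rs_code->();                 # calls greet through the reference
```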
$rs
%options as a dispatch table that maps a set
of command-line options to different subroutines:
%options = ( # For each option, call appropriate subroutine.
"-h" => \&help,
"-f" => sub {$askNoQuestions = 1},
"-r" => sub {$recursive = 1},
"_default_" => \&default,
);
ProcessArgs (\@ARGV, \%options); # Pass both as references
ProcessArgs can now
be written in a very generic way. It takes two arguments: a reference
to an array that it parses and a mapping of options that it refers to
while processing the array. For each option, it calls the appropriate
"mapped" function, and if an invalid flag is supplied in
@ARGV, it calls the function corresponding to the
string _default_.
ProcessArgs is shown in Example 4.1.
ProcessArgs (\@ARGV, \%options); # Pass both as references
sub ProcessArgs {
# Notice the notation: rl = ref. to array, rh = ref. to hash
my ($rlArgs, $rhOptions) = @_;
foreach my $arg (@$rlArgs) {
if (exists $rhOptions->{$arg}) {
# The value must be a reference to a subroutine
my $rsub = $rhOptions->{$arg};
&$rsub(); # Call it.
} else { # option does not exist.
if (exists $rhOptions->{"_default_"}) {
&{$rhOptions->{"_default_"}}(); # -> needed: $rhOptions is a reference
}
}
}
}
my) variables. Consider
$greeting = "hello world";
$rs = sub {
print $greeting;
};
&$rs(); #prints "hello world"
$greeting. No surprises here, right? Now,
let's modify this innocuous example slightly:
sub generate_greeting {
my($greeting) = "hello world";
return sub {print $greeting};
}
$rs = generate_greeting();
&$rs(); # Prints "hello world"
generate_greeting subroutine returns the
reference to an anonymous subroutine, which in turn prints
$greeting. The curious thing is that
$greeting is a my variable that
belongs to generate_greeting. Once
generate_greeting finishes executing, you would
expect all its local variables to be destroyed. But when you invoke
the anonymous subroutine later on, using
&$rs(), it manages to still print
$greeting. How does it work?
$greeting right away. A subroutine
block, on the other hand, is a package of code to be invoked at a
later time, so it keeps track of all the
variables it is going to need later on (taking them "to
go," in a manner of speaking). When this subroutine is called
subsequently and invokes print
"$greeting", the subroutine still refers to the particular
$greeting variable that existed when that subroutine was
created.
sub generate_greeting {
my($greeting) = @_; # $greeting primed by arguments
return sub {
my($subject)= @_;
print "$greeting $subject \n";
};
}
$rs1 = generate_greeting("hello");
$rs2 = generate_greeting("my fair");
# $rs1 and $rs2 are two subroutines holding on to different $greeting's
&$rs1 ("world") ; # prints "hello world"
&$rs2 ("lady") ; # prints "my fair lady"
CreateButton creates a GUI button and feeds it a
reference to an anonymous subroutine
($callback_proc), which holds on to
$title, a my variable in its
enclosing environment. When the user clicks on the button, the
callback is invoked, whereupon it uses its stored value of
$title.
use Tk;
# Creates a top level window
$topwindow = MainWindow->new();
# Create two buttons. The buttons print their names when clicked on.
CreateButton($topwindow, "hello");
CreateButton($topwindow, "world");
Tk::MainLoop(); # Dispatch events.
#--------------------------------------------------------------------
sub CreateButton {
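my ($parent, $title) = @_;
# The anonymous sub is a closure: it holds on to this call's $title
my $callback_proc = sub { print "$title\n"; };
my $button = $parent->Button(-text => $title, -command => $callback_proc);
$button->pack();
}
eval) to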
pass around bits and pieces of code. While you can do this in Perl
also, Perl's anonymous subroutines are packets of precompiled
code, which definitely work faster than dynamic evaluation. Perl
closures give you other advantages that are not available in Tcl: the
ability to share private variables between different closures (in
Tcl, they have to be global variables for them
to be sharable) and not worry about variable interpolation rules (in
Tcl, you have to take care to completely expand all the variables
yourself using interpolation before you pass a piece of code along to
somebody else).
callback_object->execute().
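This sketch illustrates the point about sharing private variables. Two closures created in the same call share one my variable, while a second call creates a fresh, independent one. make_counter is a hypothetical name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical example: two closures sharing one private "my" variable
sub make_counter {
    my $count = 0;                    # private to this call of make_counter
    my $increment = sub { $count++ };
    my $current   = sub { $count };
    return ($increment, $current);
}

my ($inc, $get) = make_counter();
$inc->() for 1 .. 3;
print $get->(), "\n";     # prints "3": both closures see the same $count

my ($inc2, $get2) = make_counter();   # a fresh, independent $count
print $get2->(), "\n";    # prints "0"
```

No global variable is involved. The only way to reach $count is through the two closures.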