Programming Perl
Programming Perl, Third Edition, by Larry Wall, Tom Christiansen, and Jon Orwant
July 2000
Pages: 1104



Table of Contents

Chapter 1: An Overview of Perl
Getting Started
We think that Perl is an easy language to learn and use, and we hope to convince you that we're right. One thing that's easy about Perl is that you don't have to say much before you say what you want to say. In many programming languages, you have to declare the types, variables, and subroutines you are going to use before you can write the first statement of executable code. And for complex problems demanding complex data structures, declarations are a good idea. But for many simple, everyday problems, you'd like a programming language in which you can simply say:
print "Howdy, world!\n";
and expect the program to do just that.
Perl is such a language. In fact, this example is a complete program, and if you feed it to the Perl interpreter, it will print "Howdy, world!" on your screen. (The \n in the example produces a newline at the end of the output.)
And that's that. You don't have to say much after you say what you want to say, either. Unlike many languages, Perl thinks that falling off the end of your program is just a normal way to exit the program. You certainly may call the exit function explicitly if you wish, just as you may declare some of your variables, or even force yourself to declare all your variables. But it's your choice. With Perl you're free to do The Right Thing, however you care to define it.
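If you do choose to force yourself to declare all your variables, the usual way is the strict pragma. Here is a minimal sketch of the greeting program written that way; the variable name $greeting is just an illustration, not something Perl requires.

```perl
use strict;    # every variable must be declared before use
use warnings;  # ask Perl to point out questionable constructs

my $greeting = "Howdy, world!";   # declared with my, as strict requires
print "$greeting\n";

# No explicit exit is needed: falling off the end of the file ends the
# program normally, just as exit(0) would.
```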
There are many other reasons why Perl is easy to use, but it would be pointless to list them all here, because that's what the rest of the book is for. The devil may be in the details, as they say, but Perl tries to help you out down there in the hot place too. At every level, Perl is about helping you get from here to there with minimum fuss and maximum enjoyment. That's why so many Perl programmers go around with a silly grin on their face.
This chapter is an overview of Perl, so we're not trying to present Perl to the rational side of your brain. Nor are we trying to be complete, or logical. That's what the following chapters are for. Vulcans, androids, and like-minded humans should skip this overview and go straight to Chapter 2, for maximum information density. If, on the other hand, you're looking for a carefully paced tutorial, you should probably get Randal's nice book,
Additional content appearing in this section has been removed.
Natural and Artificial Languages
Languages were first invented by humans, for the benefit of humans. In the annals of computer science, this fact has occasionally been forgotten. Since Perl was designed (loosely speaking) by an occasional linguist, it was designed to work smoothly in the same ways that natural language works smoothly. Naturally, there are many aspects to this, since natural language works well at many levels simultaneously. We could enumerate many of these linguistic principles here, but the most important principle of language design is that easy things should be easy, and hard things should be possible. (Actually, that's two principles.) They may seem obvious to you, but many computer languages fail at one or the other.
Natural languages are good at both because people are continually trying to express both easy things and hard things, so the language evolves to handle both. Perl was designed first of all to evolve, and indeed it has evolved. Many people have contributed to the evolution of Perl over the years. We often joke that a camel is a horse designed by a committee, but if you think about it, the camel is pretty well adapted for life in the desert. The camel has evolved to be relatively self-sufficient. (On the other hand, the camel has not evolved to smell good. Neither has Perl.) This is one of the many strange reasons we picked the camel to be Perl's mascot, but it doesn't have much to do with linguistics.
Now when someone utters the word "linguistics", many folks focus in on one of two things. Either they think of words, or they think of sentences. But words and sentences are just two handy ways to "chunk" speech. Either may be broken down into smaller units of meaning or combined into larger units of meaning. And the meaning of any unit depends heavily on the syntactic, semantic, and pragmatic context in which the unit is located. Natural language has words of various sorts: nouns and verbs and such. If someone says "dog" in isolation, you think of it as a noun, but you can also use the word in other ways. That is, a noun can function as a verb, an adjective, or an adverb when the context demands it. If you dog a dog during the dog days of summer, you'll be a dog tired dogcatcher.
An Average Example
Suppose you've been teaching a Perl class, and you're trying to figure out how to grade your students. You have a set of exam scores for each member of a class, in random order. You'd like a combined list of all the grades for each student, plus their average score. You have a text file (imaginatively named grades) that looks like this:
Noël 25
Ben 76
Clementine 49
Norm 66
Chris 92
Doug 42
Carol 25
Ben 12
Clementine 0
Norm 66
…
You can use the following script to gather all their scores together, determine each student's average, and print them all out in alphabetical order. This program assumes rather naively that you don't have two Carols in your class. That is, if there is a second entry for Carol, the program will assume it's just another score for the first Carol (not to be confused with the first Noël).
By the way, the line numbers are not part of the program, any other resemblances to BASIC notwithstanding.
 1  #!/usr/bin/perl
 2  
 3  open(GRADES, "grades") or die "Can't open grades: $!\n";
 4  while ($line = <GRADES>) {
 5      ($student, $grade) = split(" ", $line);
 6      $grades{$student} .= $grade . " ";
 7  }
 8 
 9  foreach $student (sort keys %grades) {
10      $scores = 0;
11      $total = 0;    
12      @grades = split(" ", $grades{$student});
13      foreach $grade (@grades) {
14          $total += $grade;
15          $scores++;
16      }
17      $average = $total / $scores;
18      print "$student: $grades{$student}\tAverage: $average\n";
19  }
Now before your eyes cross permanently, we'd better point out that this example demonstrates a lot of what we've covered so far, plus quite a bit more that we'll explain presently. But if you let your eyes go just a little out of focus, you may start to see some interesting patterns. Take some wild guesses now as to what's going on, and then later on we'll tell you if you're right.
We'd tell you to try running it, but you may not know how yet.
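If you want to experiment with the averaging logic before you have a grades file, here is a sketch of the same algorithm written under use strict, reading from a string instead of a file. It is not the program above: the four data lines are made up for illustration, the variable names are our own, and the in-memory filehandle assumes Perl 5.8 or later.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the "grades" file: four lines in the same format.
my $data = "Ben 76\nNorm 66\nBen 12\nNorm 66\n";

# Open the string as if it were a file (in-memory filehandle, Perl 5.8+).
open(my $grades_fh, "<", \$data) or die "Can't open in-memory grades: $!\n";

my %grades;
while (my $line = <$grades_fh>) {
    my ($student, $grade) = split(" ", $line);
    $grades{$student} .= $grade . " ";      # accumulate a space-separated list
}
close($grades_fh);

my %average;
foreach my $student (sort keys %grades) {
    my @scores = split(" ", $grades{$student});
    my $total  = 0;
    $total += $_ foreach @scores;
    $average{$student} = $total / @scores;  # @scores in numeric context is its length
    print "$student: $grades{$student}\tAverage: $average{$student}\n";
}
```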
Gee, right about now you're probably wondering how to run a Perl program. The short answer is that you feed it to the Perl language interpreter program, which coincidentally happens to be named
Filehandles
Unless you're using artificial intelligence to model a solipsistic philosopher, your program needs some way to communicate with the outside world. In lines 3 and 4 of our Average Example you'll see the word GRADES, which exemplifies another of Perl's data types, the filehandle. A filehandle is just a name you give to a file, device, socket, or pipe to help you remember which one you're talking about, and to hide some of the complexities of buffering and such. (Internally, filehandles are similar to streams from a language like C++ or I/O channels from BASIC.)
Filehandles make it easier for you to get input from and send output to many different places. Part of what makes Perl a good glue language is that it can talk to many files and processes at once. Having nice symbolic names for various external objects is just part of being a good glue language.
You create a filehandle and attach it to a file by using open. The open function takes at least two parameters: the filehandle and filename you want to associate it with. Perl also gives you some predefined (and preopened) filehandles. STDIN is your program's normal input channel, while STDOUT is your program's normal output channel. And STDERR is an additional output channel that allows your program to make snide remarks off to the side while it transforms (or attempts to transform) your input into your output.
Since you can use the open function to create filehandles for various purposes (input, output, piping), you need to be able to specify which behavior you want. As you might do on the command line, you simply add characters to the filename.
open(SESAME, "filename")               # read from existing file
open(SESAME, "<filename")              #   (same thing, explicitly)
open(SESAME, ">filename")              # create file and write to it
open(SESAME, ">>filename")             # append to existing file
open(SESAME, "| output-pipe-command")  # set up an output filter
open(SESAME, "input-pipe-command |")   # set up an input filter
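The forms above use the two-argument open. Perl 5.6 also accepts a three-argument form that keeps the mode separate from the filename. Here is a minimal sketch that writes, appends to, and rereads a scratch file; the file name sesame.txt and the use of the core File::Temp module to pick a temporary directory are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch file in a temporary directory removed at exit.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/sesame.txt";

open(SESAME, ">", $path)  or die "Can't create $path: $!\n";   # like ">filename"
print SESAME "open sesame\n";
close(SESAME);

open(SESAME, ">>", $path) or die "Can't append to $path: $!\n"; # like ">>filename"
print SESAME "close sesame\n";
close(SESAME);

open(SESAME, "<", $path)  or die "Can't read $path: $!\n";      # like "<filename"
my @lines = <SESAME>;      # list context: read all remaining lines
close(SESAME);

print STDOUT "Read ", scalar(@lines), " lines from $path\n";
print STDERR "(a snide remark, sent to STDERR)\n";
```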
Operators
As we alluded to earlier, Perl is also a mathematical language. This is true at several levels, from low-level bitwise logical operations, up through number and set manipulation, on up to larger predicates and abstractions of various sorts. And as we all know from studying math in school, mathematicians love strange symbols. What's worse, computer scientists have come up with their own versions of these strange symbols. Perl has a number of these strange symbols too, but take heart: most are borrowed directly from C, FORTRAN, sed(1), or awk(1), so they'll at least be familiar to users of those languages.
The rest of you can take comfort in knowing that, by learning all these strange symbols in Perl, you've given yourself a head start on all those other strange languages.
Perl's built-in operators may be classified by number of operands into unary, binary, and trinary (or ternary) operators. They may be classified by whether they're prefix operators (which go in front of their operands) or infix operators (which go in between their operands). They may also be classified by the kinds of objects they work with, such as numbers, strings, or files. Later, we'll give you a table of all the operators, but first here are some handy ones to get you started.
Arithmetic operators do what you would expect from learning them in school. They perform some sort of mathematical function on numbers. For example:
Example    Name            Result
$a + $b    Addition        Sum of $a and $b
$a * $b    Multiplication  Product of $a and $b
$a % $b    Modulus         Remainder of $a divided by $b
$a ** $b
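A quick sketch of these operators on concrete numbers. It uses $x and $y rather than $a and $b, because sort treats $a and $b as special package variables; the operand values are arbitrary.

```perl
use strict;
use warnings;

my ($x, $y) = (7, 3);       # arbitrary operands

my $sum       = $x + $y;    # addition: 10
my $product   = $x * $y;    # multiplication: 21
my $remainder = $x % $y;    # modulus: 1
my $power     = $x ** $y;   # exponentiation: 343
my $wrapped   = -7 % $y;    # 2: with a positive right operand, % never returns a negative

print "$sum $product $remainder $power $wrapped\n";
```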
Control Structures
So far, except for our one large example, all of our examples have been completely linear; we executed each command in order. We've seen a few examples of using the short-circuit operators to cause a single command to be (or not to be) executed. While you can write some very useful linear programs (a lot of CGI scripts fall into this category), you can write much more powerful programs if you have conditional expressions and looping mechanisms. Collectively, these are known as control structures. So you can also think of Perl as a control language.
But to have control, you have to be able to decide things, and to decide things, you have to know the difference between what's true and what's false.
We've bandied about the term truth, and we've mentioned that certain operators return a true or a false value. Before we go any further, we really ought to explain exactly what we mean by that. Perl treats truth a little differently than most computer languages, but after you've worked with it a while, it will make a lot of sense. (Actually, we hope it'll make a lot of sense after you've read the following.)
Basically, Perl holds truths to be self-evident. That's a glib way of saying that you can evaluate almost anything for its truth value. Perl uses practical definitions of truth that depend on the type of thing you're evaluating. As it happens, there are many more kinds of truth than there are of nontruth.
Truth in Perl is always evaluated in a scalar context. Other than that, no type coercion is done. So here are the rules for the various kinds of values a scalar can hold:
  1. Any string is true except for "" and "0".
  2. Any number is true except for 0.
  3. Any reference is true.
  4. Any undefined value is false.
Actually, the last two rules can be derived from the first two. Any reference (rule 3) would point to something with an address and would evaluate to a number or string containing that address, which is never 0 because it's always defined. And any undefined value (rule 4) would always evaluate to 0 or the null string.
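The four rules can be checked directly by testing a handful of values in a Boolean context. In this sketch the labels are arbitrary names; note that the string "0.0" is true (it is neither "" nor "0") while the number 0.0 is false.

```perl
use strict;
use warnings;

# Each pair is a label and a value to test for truth.
my @cases = (
    [ "empty string" => ""    ],
    [ "string 0"     => "0"   ],
    [ "string 0.0"   => "0.0" ],   # a string other than "" or "0": true
    [ "string 00"    => "00"  ],   # likewise true
    [ "number 0"     => 0     ],
    [ "number 0.0"   => 0.0   ],   # still the number zero: false
    [ "number 1"     => 1     ],
    [ "reference"    => []    ],   # any reference: true
    [ "undef"        => undef ],   # undefined: false
);

my %is_true;
foreach my $case (@cases) {
    my ($label, $value) = @$case;
    $is_true{$label} = $value ? 1 : 0;
    printf "%-12s is %s\n", $label, $value ? "true" : "false";
}
```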
Regular Expressions
Regular expressions (a.k.a. regexes, regexps, or REs) are used by many search programs such as grep and findstr, text-munging programs like sed and awk, and editors like vi and emacs. A regular expression is a way of describing a set of strings without having to list all the strings in your set.
Many other computer languages incorporate regular expressions (some of them even advertise "Perl5 regular expressions"!), but none of these languages integrates regular expressions into the language the way Perl does. Regular expressions are used several ways in Perl. First and foremost, they're used in conditionals to determine whether a string matches a particular pattern, because in a Boolean context they return true or false. So when you see something that looks like /foo/ in a conditional, you know you're looking at an ordinary pattern-matching operator:
if (/Windows 95/) { print "Time to upgrade?\n" }
Second, if you can locate patterns within a string, you can replace them with something else. So when you see something that looks like s/foo/bar/, you know it's asking Perl to substitute "bar" for "foo", if possible. We call that the substitution operator. It also happens to return true or false depending on whether it succeeded, but usually it's evaluated for its side effect:
s/Windows/Linux/;
Finally, patterns can specify not only where something is, but also where it isn't. So the split operator uses a regular expression to specify where the data isn't. That is, the regular expression defines the separators that delimit the fields of data. Our Average Example has a couple of trivial examples of this. Lines 5 and 12 each split strings on whitespace in order to return a list of words. But you can split on any separator you can specify with a regular expression:
($good, $bad, $ugly) = split(/,/, "vi,emacs,teco");
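Here is a sketch that puts the three uses side by side on a named variable, binding patterns with =~ instead of relying on the default $_. The sample strings are invented. It also shows why the Average Example's split(" ", ...) differs from splitting on a literal space: the one-space string is a special case that splits on runs of whitespace and ignores leading whitespace.

```perl
use strict;
use warnings;

my $text = "I run Windows 95";

# Match: =~ binds the pattern to $text instead of $_.
my $needs_upgrade = ($text =~ /Windows 95/) ? 1 : 0;

# Substitution: changes $text in place and returns the number of replacements.
my $replaced = ($text =~ s/Windows/Linux/);

# split with a regex describing the separators (where the data isn't).
my ($good, $bad, $ugly) = split(/,/, "vi,emacs,teco");

# " " splits on runs of whitespace and skips leading whitespace;
# a literal / / pattern keeps the empty fields between spaces.
my @words   = split(" ", "  Ben   76 ");
my @literal = split(/ /, "  Ben   76 ");
```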
List Processing
Much earlier in this chapter, we mentioned that Perl has two main contexts, scalar context (for dealing with singular things) and list context (for dealing with plural things). Many of the traditional operators we've described so far have been strictly scalar in their operation. They always take singular arguments (or pairs of singular arguments for binary operators) and always produce a singular result, even in list context. So if you write this:
@array = (1 + 2, 3 - 4, 5 * 6, 7 / 8);
you know that the list on the right side contains exactly four values, because the ordinary math operators always produce scalar values, even in the list context provided by the assignment to an array.
However, other Perl operators can produce either a scalar or a list value, depending on their context. They just "know" whether a scalar or a list is expected of them. But how will you know that? It turns out to be pretty easy to figure out, once you get your mind around a few key concepts.
First, list context has to be provided by something in the "surroundings". In the previous example, the list assignment provides it. Earlier we saw that the list of a foreach loop provides it. The print operator also provides it. But you don't have to learn these one by one.
If you look at the various syntax summaries scattered throughout the rest of the book, you'll see various operators that are defined to take a LIST as an argument. Those are the operators that provide a list context. Throughout this book, LIST is used as a specific technical term to mean "a syntactic construct that provides a list context". For example, if you look up sort, you'll find the syntax summary:
sort LIST
That means that sort provides a list context to its arguments.
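A small sketch of context at work, reusing the four-element assignment from above. The numeric comparison block given to sort is our addition (by default sort compares as strings), and reverse is chosen because it behaves visibly differently in the two contexts.

```perl
use strict;
use warnings;

my @array = (1 + 2, 3 - 4, 5 * 6, 7 / 8);  # always four scalars: 3, -1, 30, 0.875

my $count  = @array;                       # array in scalar context: its length
my @sorted = sort { $a <=> $b } @array;    # sort supplies list context to its LIST

my @backward = reverse("dog", "tired");    # list context: reverses the list order
my $backward = reverse("dog", "tired");    # scalar context: joins, then reverses the characters
```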
Second, at compile time (that is, while Perl is parsing your program and translating to internal opcodes), any operator that takes a LIST provides a list context to each syntactic element of that LIST. So every top-level operator or entity in the LIST knows at compile time that it's supposed to produce the best list it knows how to produce. This means that if you say:
What You Don't Know Won't Hurt You (Much)
Finally, allow us to return once more to the concept of Perl as a natural language. Speakers of a natural language are allowed to have differing skill levels, to speak different subsets of the language, to learn as they go, and generally, to put the language to good use before they know the whole language. You don't know all of Perl yet, just as you don't know all of English. But that's Officially Okay in Perl culture. You can work with Perl usefully, even though we haven't even told you how to write your own subroutines yet. We've scarcely begun to explain how to view Perl as a system management language, or a rapid prototyping language, or a networking language, or an object-oriented language. We could write entire chapters about some of these things. (Come to think of it, we already did.)
But in the end, you must create your own view of Perl. It's your privilege as an artist to inflict the pain of creativity on yourself. We can teach you how we paint, but we can't teach you how you paint. There's More Than One Way To Do It.
Have the appropriate amount of fun.
Chapter 2: Bits and Pieces
We're going to start small, so this chapter is about the elements of Perl.
Since we're starting small, the progression through the next several chapters is necessarily from small to large. That is, we take a bottom-up approach, beginning with the smallest components of Perl programs and building them into more elaborate structures, much like molecules are built out of atoms. The disadvantage of this approach is that you don't necessarily get the Big Picture before getting lost in a welter of details. The advantage is that you can understand the examples as we go along. (If you're a top-down person, just turn the book over and read the chapters backward.)
Each chapter does build on the preceding chapter (or the subsequent chapter, if you're reading backward), so you'll need to be careful if you're the sort of person who skips around.
You're certainly welcome to peek at the reference materials toward the end of the book as we go along. (That doesn't count as skipping around.) In particular, any isolated word in typewriter font is likely to be found in Chapter 29. And although we've tried to stay operating-system neutral, if you are unfamiliar with Unix terminology and run into a word that doesn't seem to mean what you think it ought to mean, you should check whether the word is in the Glossary. If the Glossary doesn't work, the index probably will.
Atoms
Although there are various invisible things going on behind the scenes that we'll explain presently, the smallest things you generally work with in Perl are individual characters. And we do mean characters; historically, Perl freely confused bytes with characters and characters with bytes, but in this new era of global networking, we must be careful to distinguish the two.
Perl may, of course, be written entirely in the 7-bit ASCII character set. Perl also allows you to write in any 8-bit or 16-bit character set, whether it's a national character set or some other legacy character set. However, if you choose to write in one of these older, non-ASCII character sets, you may use non-ASCII characters only within string literals. You are responsible for making sure that the semantics of your program are consistent with the particular national character set you've chosen. For instance, if you're using a 16-bit encoding for an Asian national character set, keep in mind that Perl will generally think of each of your characters as two bytes, not as one character.
As described in Chapter 15, we've recently added support for Unicode to Perl. This support is pervasive throughout the language: you can use Unicode characters in identifiers (variable names and such) as well as within literal strings. When you are using Unicode, you don't need to worry about how many bits or bytes it takes to represent a character. Perl just pretends all Unicode characters are the same size (that is, size 1), even though any given character might be represented by multiple bytes internally. Perl normally represents Unicode internally as UTF-8, a variable-length encoding. (For instance, a Unicode smiley character, U-263A, would be represented internally as a three-byte sequence.)
If you'll let us drive our analogy of the physical elements a bit further, characters are atomic in the same sense as the individual atoms of the various elements. Yes, they're composed of smaller particles known as bits and bytes, but if you break a character apart (in a character accelerator, no doubt), the individual bits and bytes lose the distinguishing chemical properties of the character as a whole. Just as neutrons are an implementation detail of the U-238 atom, so too bytes are an implementation detail of the U-263A character.
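The one-character, three-byte smiley (U+263A in standard notation) can be observed directly. This sketch assumes the Encode module, which has shipped with Perl since 5.8, to produce the UTF-8 bytes explicitly.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $smiley = "\x{263A}";                       # WHITE SMILING FACE as one character
my $chars  = length($smiley);                  # 1: Perl counts characters
my $bytes  = length(encode("UTF-8", $smiley)); # 3: its UTF-8 encoding is three bytes

binmode(STDOUT, ":encoding(UTF-8)");           # avoid a "Wide character" warning
print "$smiley: $chars character, $bytes bytes in UTF-8\n";
```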
Molecules
Perl is a free-form language, but that doesn't mean that Perl is totally free of form. As computer folks usually use the term, a free-form language is one in which you can put spaces, tabs, and newlines anywhere you like--except where you can't.
One obvious place you can't put a whitespace character is in the middle of a token. A token is what we call a sequence of characters with a unit of meaning, much like a simple word in natural language. But unlike the typical word, a token might contain other characters besides letters, just as long as they hang together to form a unit of meaning. (In that sense, they're more like molecules, which don't have to be composed of only one particular kind of atom.) For example, numbers and mathematical operators are considered tokens. An identifier is a token that starts with a letter or underscore and contains only letters, digits, and underscores. A token may not contain whitespace characters because this would split the token into two tokens, just as a space in an English word turns it into two words.
Although whitespace is allowed between any two tokens, whitespace is required only between tokens that would otherwise be confused as a single token. All whitespace is equivalent for this purpose. Newlines are distinguished from spaces and tabs only within quoted strings, formats, and certain line-oriented forms of quoting. Specifically, newlines do not terminate statements as they do in certain other languages (such as FORTRAN or Python). Statements in Perl are terminated with semicolons, just as they are in C and its various derivatives.
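A brief sketch of these rules; the variable names and values are arbitrary. A statement may span several lines because only the semicolon ends it, two statements may share a line, and a newline inside a quoted string becomes part of the data.

```perl
use strict;
use warnings;

my $total =
    1 +
    2;                   # one statement over three lines; the semicolon ends it

my $x = 4; my $y = 5;    # two statements on one line

my $quoted = "line one
line two";               # inside a quoted string, the newline is data

my $sum = $x+$y;         # no whitespace needed: the tokens can't be confused
```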
Unicode whitespace characters are allowed in a Unicode Perl program, but you need to be careful. If you use the special Unicode paragraph and line separators, be aware that Perl may count line numbers differently than your text editor does, so error messages may be more difficult to interpret. It's best to stick with good old-fashioned newlines.
Built-in Data Types
Before we start talking about various kinds of tokens you can build from characters, we need a few more abstractions. To be specific, we need three data types.
Computer languages vary in how many and what kinds of data types they provide. Unlike some commonly used languages that provide many confusing types for similar kinds of values, Perl provides just a few built-in data types. Consider C, in which you might run into char, short, int, long, long long, bool, wchar_t, size_t, off_t, regex_t, uid_t, u_longlong_t, pthread_key_t, fp_exception_field_type, and so on. That's just some of the integer types! Then there are floating-point numbers, and pointers, and strings.
All these complicated types correspond to just one type in Perl: the scalar. (Usually Perl's simple data types are all you need, but if not, you're free to define fancy dynamic types using Perl's object-oriented features--see Chapter 12.) Perl's three basic data types are: scalars, arrays of scalars, and hashes of scalars (also known as associative arrays). Some people may prefer to call these data structures rather than types. That's okay.
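A minimal sketch of the three basic types, including the 0-based and negative subscripts described in the next paragraph. The names and values are invented for illustration.

```perl
use strict;
use warnings;

my $answer = 42;                               # scalar holding a number
my $name   = "camel";                          # scalar holding a string
my @scores = (76, 12, 92);                     # array: ordered list of scalars
my %color  = (apple => "red", sky => "blue");  # hash: key/value pairs of scalars

my $first = $scores[0];    # 76: indexing starts at 0
my $last  = $scores[-1];   # 92: negative subscripts count back from the end
my $sky   = $color{sky};   # "blue": hashes are subscripted by string keys
```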
Scalars are the fundamental type from which more complicated structures are built. A scalar stores a single, simple value--typically a string or a number. Elements of this simple type may be combined into either of the two aggregate types. An array is an ordered list of scalars that you access with an integer subscript (or index). All indexing in Perl starts at 0. Unlike many programming languages, however, Perl treats negative subscripts as valid: instead of counting from the beginning, negative subscripts count back from the end of whatever it is you're indexing into. (This applies to various substring and sublist operations as well as to regular subscripting.) A hash, on the other hand, is an unordered set of key/value pairs that you access using strings (the keys) as subscripts to look up the scalars (the
Variables
Not surprisingly, there are three variable types corresponding to the three abstract data types we mentioned earlier. Each of these is prefixed by what we call a funny character. Scalar variables are always named with an initial $, even when referring to a scalar that is part of an array or hash. It works a bit like the English word "the". Thus, we have:
Construct     Meaning
$days         Simple scalar value $days
$days[28]     29th element of array @days
$days{'Feb'}  "Feb" value from hash %days
Note that we can use the same name for $days, @days, and %days without Perl getting confused.
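Here is a sketch of that coexistence; the contents of the three variables are made up. Notice that both element accesses start with $, because each one yields a single scalar.

```perl
use strict;
use warnings;

my $days = "plenty";                          # simple scalar $days
my @days = (1 .. 31);                         # array @days
my %days = (Jan => 31, Feb => 28, Mar => 31); # hash %days

my $day29 = $days[28];     # 29th element of @days: 29
my $feb   = $days{'Feb'};  # "Feb" value from %days: 28
```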
There are other, fancier scalar terms, useful in specialized situations that we won't go into yet. They look like this:
Construct           Meaning
${days}             Same as $days but unambiguous before alphanumerics
$Dog::days          Different $days variable, in the Dog package
$#days              Last index of array @days
$days->[28]         29th element of array pointed to by reference $days
$days[0][2]         Multidimensional array
$days{2000}{'Feb'}  Multidimensional hash
$days{2000,'Feb'}   Multidimensional hash emulation
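A sketch exercising each of these constructs once. To keep the pieces independent it uses a few hypothetical names ($day, $ref, @grid, %cal, %flat) in place of $days where needed; the values are arbitrary.

```perl
use strict;
use warnings;

my $day   = "Mon";
my $label = "${day}day";        # braces end the name: "Monday", not $dayday

$Dog::days = "dog days";        # a different $days, in the Dog package
my $dog    = $Dog::days;

my @days = (1 .. 31);
my $last = $#days;              # last index of @days: 30
my $ref  = \@days;              # hypothetical name for a reference to @days
my $via  = $ref->[28];          # 29th element through the reference: 29

my @grid = ([0, 1, 2], [3, 4, 5]);
my $cell = $grid[0][2];         # array of arrays: 2

my %cal;
$cal{2000}{'Feb'} = 29;         # hash of hashes, created on first use

my %flat;
$flat{2000, 'Feb'} = 29;        # emulation: a single key joined with $;
my ($key) = keys %flat;
```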
Entire arrays (or slices of arrays and hashes) are named with the funny character @, which works much like the words "these" or "those":
Construct  Meaning
@days      Array containing
Names
We've talked about storing values in variables, but the variables themselves (their names and their associated definitions) also need to be stored somewhere. In the abstract, these places are known as namespaces. Perl provides two kinds of namespaces, which are often called symbol tables and lexical scopes. You may have an arbitrary number of symbol tables or lexical scopes, but every name you define gets stored in one or the other. We'll explain both kinds of namespaces as we go along. For now we'll just say that symbol tables are global hashes that happen to contain symbol table entries for global variables (including the hashes for other symbol tables). In contrast, lexical scopes are unnamed scratchpads that don't live in any symbol table, but are attached to a block of code in your program. They contain variables that can only be seen by the block. (That's what we mean by a scope. The lexical part just means "having to do with text," which is not at all what a lexicographer would mean by it. Don't blame us.)
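A minimal sketch of the two kinds of namespace, assuming a package (our) variable and a lexical (my) variable that share the name $color; the name is illustrative:

```perl
use strict;
use warnings;

our $color = 'blue';          # package variable: stored in the symbol table as $main::color

sub show { return $color }    # compiled outside the block below, so it sees the package variable

{
    my $color = 'red';        # lexical: lives only in this block's scratchpad
    print "inside block: $color\n";          # red
    print "package var:  $main::color\n";    # blue
    print "show():       ", show(), "\n";    # blue; show() cannot see the lexical
}

# The symbol table %main:: has an entry named "color"; the lexical has none.
print exists $main::{color} ? "color is in %main::\n" : "not found\n";
```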
Within any given namespace (whether global or lexical), every variable type has its own subnamespace, determined by the funny character. You can, without fear of conflict, use the same name for a scalar variable, an array, or a hash (or, for that matter, a filehandle, a subroutine name, a label, or your pet llama). This means that $foo and @foo are two different variables. Together with the previous rules, it also means that $foo[1] is an element of @foo totally unrelated to the scalar variable $foo. This may seem a bit weird, but that's okay, because it is weird.
Subroutines may be named with an initial &, although the funny character is optional when calling the subroutine. Subroutines aren't generally considered lvalues, though recent versions of Perl allow you to return an lvalue from a subroutine and assign to that, so it can look as though you're assigning to the subroutine.
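A brief sketch of an lvalue subroutine, using the :lvalue attribute; speed and $speed are hypothetical names:

```perl
use strict;
use warnings;

my $speed = 10;

# An lvalue subroutine: its return value can be assigned to.
sub speed :lvalue { $speed }

speed() = 55;            # really assigns to $speed
print "$speed\n";        # 55

# The & funny character is optional when calling.
print &speed(), "\n";    # 55
```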
Scalar Values
Whether it's named directly or indirectly, and whether it's in a variable, or an array element, or is just a temporary value, a scalar always contains a single value. This value may be a number, a string, or a reference to another piece of data. Or, there might even be no value at all, in which case the scalar is said to be undefined. Although we might speak of a scalar as "containing" a number or a string, scalars are typeless: you are not required to declare your scalars to be of type integer or floating-point or string or whatever.
Perl stores strings as sequences of characters, with no arbitrary constraints on length or content. In human terms, you don't have to decide in advance how long your strings are going to get, and you can include any characters, including null bytes, within your string. Perl stores numbers as signed integers if possible, or as double-precision floating-point values in the machine's native format otherwise. Floating-point values are not infinitely precise. This is important to remember because comparisons like (10/3 == 1/3*10) tend to fail mysteriously.
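A minimal sketch of why exact floating-point comparison is fragile, and one common workaround: comparing against a tolerance. The close_enough helper and its 1e-9 default are hypothetical choices for illustration, not something Perl provides. The last lines confirm that a string can hold a NUL byte.

```perl
use strict;
use warnings;

my $q1 = 10/3;
my $q2 = 1/3*10;

# Exact == on computed floating-point values is fragile.
print(($q1 == $q2) ? "equal\n" : "not equal\n");

# Compare against a tolerance instead; 1e-9 is an arbitrary illustrative default.
sub close_enough {
    my ($x, $y, $eps) = @_;
    $eps = 1e-9 unless defined $eps;
    return abs($x - $y) < $eps;
}
print close_enough($q1, $q2) ? "close enough\n" : "different\n";   # close enough

# Strings may contain any characters, including NUL.
my $s = "a\0b";
print length($s), "\n";    # 3
```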
Perl converts between the various subtypes as needed, so you can treat a number as a string or a string as a number, and Perl will do the Right Thing. To convert from string to number, Perl internally uses something like the C library's atof(3) function. To convert from number to string, it does the equivalent of an sprintf(3) with a format of "%.14g" on most machines. Improper conversions of a nonnumeric string like "foo" to a number count as numeric 0; these trigger warnings if you have them enabled, but are silent otherwise. See Chapter 5 for examples of detecting what sort of data a string holds.
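A short sketch of these automatic conversions; the values are illustrative. The numeric warnings are switched off only inside the block that deliberately numifies nonnumeric strings.

```perl
use strict;
use warnings;

my $n   = "10" + 5;           # string used as a number: 15
my $num = 42;
my $s   = $num . " camels";   # number used as a string: "42 camels"
my $f   = 0.1 + 0.2;
my $f_str = "$f";             # limited-precision stringification: "0.3"

my ($zero, $lead);
{
    no warnings 'numeric';    # silence "Argument ... isn't numeric" here only
    $zero = "foo" + 0;        # a nonnumeric string counts as 0
    $lead = "3 apples" * 2;   # leading digits are used: 6
}

print "$n | $s | $f_str | $zero | $lead\n";   # 15 | 42 camels | 0.3 | 0 | 6
```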
Although strings and numbers are interchangeable for nearly all intents, references are a bit different. They're strongly typed, uncastable pointers with built-in reference-counting and destructor invocation. That is, you can use them to create complex data types, including user-defined objects. But they're still scalars, for all that, because no matter how complicated a data structure gets, you often want to treat it as a single value.
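A minimal sketch of references as scalars, including a destructor that runs when the last reference disappears. The Tracker class and its counter are hypothetical, invented for this example.

```perl
use strict;
use warnings;

package Tracker;                   # hypothetical class for this example
my $destroyed = 0;
sub new     { my $class = shift; return bless {}, $class }
sub DESTROY { $destroyed++ }       # called automatically when the refcount drops to 0

package main;

my @list = (1, 2, 3);
my $aref = \@list;                 # a reference is still just a scalar
print ref($aref), "\n";            # ARRAY
print scalar(@$aref), "\n";        # 3

{
    my $obj = Tracker->new;        # one reference to the new object
    print ref($obj), "\n";         # Tracker
}                                  # the last reference goes away here, so DESTROY runs
print "destroyed: $destroyed\n";   # 1
```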
Context
Until now we've seen several terms that can produce scalar values. Before we can discuss terms further, though, we must come to terms with the notion of context.
Every operation that you invoke in a Perl script is evaluated in a specific context, and how that operation behaves may depend on the requirements of that context. There are two major contexts: scalar and list. For example, assignment to a scalar variable, or to a scalar element of an array or hash, evaluates the righthand side in a scalar context:
$x         = funkshun();  # scalar context
$x[1]      = funkshun();  # scalar context
$x{"ray"}  = funkshun();  # scalar context
But assignment to an array or a hash, or to a slice of either, evaluates the righthand side in a list context, even if the slice picks out only one element:
@x         = funkshun();  # list context
@x[1]      = funkshun();  # list context
@x{"ray"}  = funkshun();  # list context
%x         = funkshun();  # list context
Assignment to a list of scalars also provides a list context to the righthand side, even if there's only one element in the list:
($x,$y,$z) = funkshun();  # list context
($x)       = funkshun();  # list context
These rules do not change at all when you declare a variable by modifying the term with my or our, so we have:
my $x      = funkshun();  # scalar context
my @x      = funkshun();  # list context
my %x      = funkshun();  # list context
my ($x)    = funkshun();  # list context
You will be miserable until you learn the difference between scalar and list context, because certain operators (such as our mythical funkshun() function above) know which context they are in, and return a list in contexts wanting a list but a scalar value in contexts wanting a scalar. (If this is true of an operation, it will be mentioned in the documentation for that operation.) In computer lingo, the operations are overloaded on their return type. But it's a very simple kind of overloading, based only on the distinction between singular and plural values, and nothing else.
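A sketch of a context-sensitive function. funkshun is the text's own mythical name; this version is hypothetical and uses the built-in wantarray to see which context it was called in. The built-in localtime behaves the same way.

```perl
use strict;
use warnings;

# A hypothetical funkshun() that checks its calling context.
sub funkshun {
    return wantarray ? ('list', 'of', 'values') : 'one scalar';
}

my $x   = funkshun();     # scalar context
my @x   = funkshun();     # list context
my ($y) = funkshun();     # list context: $y gets the first element

print "$x\n";             # one scalar
print "@x\n";             # list of values
print "$y\n";             # list

my @parts = localtime(0); # list context: nine time fields
my $when  = localtime(0); # scalar context: one date string
print scalar(@parts), " fields; $when\n";
```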
List Values and Arrays
Now that we've talked about context, we can talk about list literals and how they behave in context. You've already seen some list literals. List literals are denoted by separating individual values by commas (and enclosing the list in parentheses where precedence requires it). Because it (almost) never hurts to use extra parentheses, the syntax diagram of a list value is usually indicated like this:
(LIST)
Earlier we said that LIST in a syntax description indicates something that supplies list context to its arguments, but a bare list literal itself is the one partial exception to that rule, in that it supplies a list context to its arguments only when the list as a whole is in list context. The value of a list literal in list context is just the values of the arguments in the order specified. As a fancy sort of term in an expression, a list literal merely pushes a series of temporary values onto Perl's stack, to be collected off the stack later by whatever operator wants the list.
In a scalar context, however, the list literal doesn't really behave like a LIST, in that it doesn't supply list context to its values. Instead, it merely evaluates each of its arguments in scalar context, and returns the value of the final element. That's because it's really just the C comma operator in disguise, which is a binary operator that always throws away the value on the left and returns the value on the right. In terms of what we discussed earlier, the left side of the comma operator really provides a void context. Because the comma operator is left associative, if you have a series of comma-separated values, you always end up with the last value because the final comma throws away whatever any previous commas produced. So, to contrast the two, the list assignment:
@stuff = ("one", "two", "three");
assigns the entire list value to array @stuff, but the scalar assignment:
$stuff = ("one", "two", "three");
assigns only the value "three" to variable $stuff.
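A small sketch contrasting the two assignments above, plus the different behavior of an array (rather than a list literal) in scalar context. The void warnings are silenced because Perl would otherwise complain about the discarded "one" and "two".

```perl
use strict;
use warnings;

my @stuff = ("one", "two", "three");    # list assignment: all three values
my $stuff;
{
    no warnings 'void';                  # "one" and "two" are evaluated and thrown away
    $stuff = ("one", "two", "three");    # comma operator: the last value wins
}
my $count = @stuff;                      # an array in scalar context yields its length

print scalar(@stuff), " $stuff $count\n";   # 3 three 3
```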
Hashes
As we said earlier, a hash is just a funny kind of array in which you look values up using key strings instead of numbers. A hash defines associations between keys and values, so hashes are often called associative arrays by people who are not lazy typists.
There really isn't any such thing as a hash literal in Perl, but if you assign an ordinary list to a hash, each pair of values in the list will be taken to indicate one key/value association:
%map = ('red',0xff0000,'green',0x00ff00,'blue',0x0000ff);
This has the same effect as:
%map = ();            # clear the hash first
$map{red}   = 0xff0000;
$map{green} = 0x00ff00;
$map{blue}  = 0x0000ff;
It is often more readable to use the => operator between key/value pairs. The => operator is just a synonym for a comma, but it's more visually distinctive and also quotes any bare identifiers to the left of it (just like the identifiers in braces above), which makes it convenient for several sorts of operation, including initializing hash variables:
%map = (
    red   => 0xff0000,
    green => 0x00ff00,
    blue  => 0x0000ff,
);
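A sketch, reusing the %map values from the text, that builds the same hash three ways and checks that the results agree:

```perl
use strict;
use warnings;

my %map1 = ('red', 0xff0000, 'green', 0x00ff00, 'blue', 0x0000ff);

my %map2 = (
    red   => 0xff0000,    # => quotes the bare identifier on its left
    green => 0x00ff00,
    blue  => 0x0000ff,
);

my %map3 = ();            # clear the hash first
$map3{red}   = 0xff0000;
$map3{green} = 0x00ff00;
$map3{blue}  = 0x0000ff;

# All three hashes hold the same associations.
for my $key (sort keys %map1) {
    printf "%-5s %06x %06x %06x\n", $key, $map1{$key}, $map2{$key}, $map3{$key};
}
```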
or initializing anonymous hash references to be used as records:
$rec = {
    NAME  => 'John Smith',
    RANK  => 'Captain',
    SERNO => '951413',
};
or using named parameters to invoke complicated functions:
$field = radio_group(
             NAME      => 'animals',
             VALUES    => ['camel', 'llama', 'ram', 'wolf'],
             DEFAULT   => 'camel',
             LINEBREAK => 'true',
             LABELS    => \%animal_names,
         );
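The radio_group call above is not defined in this section, so here is a hypothetical make_menu function that takes named parameters the same way. Placing the defaults before @_ lets caller-supplied pairs override them, since later duplicate keys win in a list assignment to a hash.

```perl
use strict;
use warnings;

# A hypothetical function that accepts named parameters.
sub make_menu {
    my %args = (
        DEFAULT   => undef,    # defaults first...
        LINEBREAK => 0,
        @_,                    # ...so caller-supplied pairs override them
    );
    # NAME is accepted but unused in this sketch.
    my @values  = @{ $args{VALUES} || [] };
    my $default = defined $args{DEFAULT} ? $args{DEFAULT} : $values[0];
    my $sep     = $args{LINEBREAK} ? "\n" : " ";
    return join $sep, map { $_ eq $default ? "[$_]" : $_ } @values;
}

my $menu = make_menu(
    NAME    => 'animals',
    VALUES  => ['camel', 'llama', 'ram', 'wolf'],
    DEFAULT => 'llama',
);
print "$menu\n";   # camel [llama] ram wolf
```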
But we're getting ahead of ourselves again. Back to hashes.
You can use a hash variable (%hash) in a list context, in which case it interpolates all its key/value pairs into the list. But just because the hash was initialized in a particular order doesn't mean that the values come back out in that order. Hashes are implemented internally using hash tables for speedy lookup, which means that the order in which entries are stored is dependent on the internal hash function used to calculate positions in the hash table, and not on anything interesting. So the entries come back in a seemingly random order. (The two elements of each key/value pair come out in the right order, of course.) For examples of how to arrange for an output ordering, see the
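Whatever order the hash returns, sorting the keys gives a predictable one. A minimal sketch using the %map colors from earlier in this chapter:

```perl
use strict;
use warnings;

my %map = (red => 0xff0000, green => 0x00ff00, blue => 0x0000ff);

# keys() returns keys in the hash's internal order; sort them for a stable order.
my @by_name = sort keys %map;
print join(", ", @by_name), "\n";     # blue, green, red

# Or order by value, numerically and descending.
my @by_value = sort { $map{$b} <=> $map{$a} } keys %map;
print join(", ", @by_value), "\n";    # red, green, blue
```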
Typeglobs and Filehandles
Perl uses a special type called a typeglob to hold an entire symbol table entry. (The symbol table entry *foo contains the values of $foo, @foo, %foo, &foo, and several interpretations of plain old foo.) The type prefix of a typeglob is a * because it represents all types.
One use of typeglobs (or references thereto) is for passing or storing filehandles. If you want to save away a filehandle, do it this way:
$fh = *STDOUT;
or perhaps as a real reference, like this:
$fh = \*STDOUT;
This is also the way to create a local filehandle. For example:
sub newopen {
    my $path = shift;
    local *FH;          # not my() nor our()
    open(FH, $path) or return undef;
    return *FH;         # not \*FH!
}
$fh = newopen('/etc/passwd');
See the open function for other ways to generate new filehandles.
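One of those other ways, available since Perl 5.6, is to let open create an anonymous filehandle in a lexical variable. The following sketch is a hypothetical variant of newopen written that way; to stay self-contained it opens an in-memory string (Perl 5.8 and later) rather than a real file.

```perl
use strict;
use warnings;

# Hypothetical variant of newopen() using an autovivified lexical filehandle.
sub newopen_lexical {
    my $path = shift;
    open(my $fh, '<', $path) or return undef;
    return $fh;          # $fh holds a reference to an anonymous glob
}

# Demo with an in-memory "file", so no real file is needed.
my $text  = "line one\nline two\n";
my $fh    = newopen_lexical(\$text);
my @lines = <$fh>;
close($fh);
print scalar(@lines), " lines\n";   # 2 lines
```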
The main use of typeglobs nowadays is to alias one symbol table entry to another symbol table entry. Think of an alias as a nickname. If you say:
*foo = *bar;
it makes everything named "foo" a synonym for every corresponding thing named "bar". You can alias just one variable from a typeglob by assigning a reference instead:
*foo = \$bar;
makes $foo an alias for $bar, but doesn't make @foo an alias for @bar, or %foo an alias for %bar. All these affect global (package) variables only; lexicals cannot be accessed through symbol table entries. Aliasing global variables like this may seem like a silly thing to want to do, but it turns out that the entire module export/import mechanism is built around this feature, since there's nothing that says the symbol you're aliasing has to be in your namespace. This:
local *Here::blue = \$There::green;
temporarily makes $Here::blue an alias for $There::green, but doesn't make @Here::blue an alias for @There::green, or %Here::blue an alias for %There::green. Fortunately, all these complicated typeglob manipulations are hidden away where you don't have to look at them. See Section 8.2.4 and Section 8.2.5 in Chapter 8, Section 10.1 in Chapter 10, and Chapter 11, for more discussion on typeglobs and importation.
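A minimal sketch of both forms of aliasing described above; the variable names follow the text but the values are illustrative:

```perl
use strict;
use warnings;

our ($bar, @bar) = ('scalar bar', 'array', 'bar');
our ($foo, @foo);

*foo = \$bar;              # alias just the scalar slot of *foo to $bar
$foo = 'changed';          # writes through the alias to $bar
print "$bar\n";            # changed
print scalar(@foo), "\n";  # 0: @foo is still its own, empty array

$There::green = 'green value';
{
    local *Here::blue = \$There::green;   # temporary alias, undone at block exit
    print "$Here::blue\n";                # green value
}
print defined $Here::blue ? "still aliased\n" : "alias gone\n";   # alias gone
```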
Input Operators
There are several input operators we'll discuss here because they parse as terms. Sometimes we call them pseudoliterals because they act like quoted strings in many ways. (Output operators like print parse as list operators and are discussed in Chapter 29.)
First of all, we have the command input operator, also known as the backtick operator, because it looks like this:
$info = `finger $user`;
A string enclosed by backticks (grave accents, technically) first undergoes variable interpolation just like a double-quoted string. The result is then interpreted as a command line by the system, and the output of that command becomes the value of the pseudoliteral. (This is modeled after a similar operator in Unix shells.) In scalar context, a single string consisting of all the output is returned. In list context, a list of values is returned, one for each line of output. (You can set $/ to use a different line terminator.)
The command is executed each time the pseudoliteral is evaluated. The numeric status value of the command is saved in $? (see Chapter 28 for the interpretation of $?, also known as $CHILD_ERROR). Unlike the csh version of this command, no translation is done on the return data--newlines remain newlines. Unlike in any of the shells, single quotes in Perl do not hide variable names in the command from interpretation. To pass a $ through to the shell you need to hide it with a backslash. The $user in our finger example above is interpolated by Perl, not by the shell. (Because the command undergoes shell processing, see Chapter 23 for security concerns.)
The generalized form of backticks is qx// (for "quoted execution"), but the operator works exactly the same way as ordinary backticks. You just get to pick your quote characters. As with similar quoting pseudofunctions, if you happen to choose a single quote as your delimiter, the command string doesn't undergo double-quote interpolation:
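A sketch of backticks in scalar and list context and of reading the exit status from $?. To avoid depending on any particular shell utility, it runs the current perl interpreter ($^X) as the command; it assumes the path in $^X contains no spaces.

```perl
use strict;
use warnings;

my $out   = `$^X -le "print for 1..3"`;   # scalar context: all output in one string
my @lines = `$^X -le "print for 1..3"`;   # list context: one element per line

print length($out), " chars, ", scalar(@lines), " lines\n";   # 6 chars, 3 lines

my $ignored = `$^X -e "exit 3"`;          # the command prints nothing
my $status  = $? >> 8;                    # the high byte of $? holds the exit code
print "exit status: $status\n";           # 3
```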
$perl_info  = qx(ps $$);            # that's Perl's $$
$shell_info = qx'ps $$';            # that's the shell's $$
Chapter 3: Unary and Binary Operators
In the last chapter, we talked about the various kinds of terms you might use in an expression, but to be honest, isolated terms are a bit boring. Many terms are party animals. They like to have relationships with each other. The typical young term feels strong urges to identify with and influence other terms in various ways, but there are many different kinds of social interaction and many different levels of commitment. In Perl, these relationships are expressed using operators.
Sociology has to be good for something.
From a mathematical perspective, operators are just ordinary functions with special syntax. From a linguistic perspective, operators are just irregular verbs. But as any linguist will tell you, the irregular verbs in a language tend to be the ones you use most often. And that's important from an information theory perspective because the irregular verbs tend to be shorter and more efficient in both production and recognition.
In practical terms, operators are handy.
Operators come in various flavors, depending on their arity (how many operands they take), their precedence (how hard they try to take those operands away from surrounding operators), and their associativity (whether they prefer to do things right to left or left to right when associated with operators of the same precedence).
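A short sketch of precedence and associativity with arithmetic operators:

```perl
use strict;
use warnings;

my $p = 2 + 3 * 4;      # * binds tighter than +: 14
my $l = 10 - 3 - 2;     # - is left associative: (10 - 3) - 2 = 5
my $r = 2 ** 3 ** 2;    # ** is right associative: 2 ** (3 ** 2) = 512

print "$p $l $r\n";     # 14 5 512
```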
Perl operators come in three arities: unary, binary, and trinary (or ternary, if your native tongue is Shibboleth). Unary operators are always prefix operators (except for the postincrement and postdecrement operators). The others are all infix operators--unless you count the list operators, which can prefix any number of arguments. But most people just think of list operators as normal functions that you can forget to put parentheses around. Here are some examples:
! $x                # a unary operator
$x * $y             # a binary operator
$x ? $y : $z        # a trinary operator
print $x, $y, $z    # a list operator
Terms and List Operators (Leftward)
Any term is of highest precedence in Perl. Terms include variables, quote and quotelike operators, most expressions in parentheses, or brackets or braces, and any function whose arguments are parenthesized. Actually, there aren't really any functions in this sense, just list operators and unary operators behaving as functions because you put parentheses around their arguments. Nevertheless, the name of Chapter 29 is Functions.
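A sketch of one well-known consequence: once print is followed by a left parenthesis, the parenthesized arguments are taken as its whole argument list, so an operator after the closing parenthesis applies to print's return value. The output is captured in an in-memory filehandle so it can be checked.

```perl
use strict;
use warnings;

# Send print's output to a string so it can be inspected afterward.
open(my $out, '>', \my $buffer) or die "cannot open in-memory handle: $!";
my $old = select($out);

{
    no warnings;          # silence "print (...) interpreted as function" and the void-context warning
    print (1 + 2) * 3;    # parsed as (print(1 + 2)) * 3, so it prints 3
}
print((1 + 2) * 3);       # outer parentheses hold the whole product, so it prints 9

select($old);
close($out);
print "captured: '$buffer'\n";   # captured: '39'
```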
Now listen carefully. Here are a couple of rules that are very important and simplify things greatly, but may occasionally produce counterintuitive results for the unwary. If any list operator (such as print) or any named unary operator (such as chdir) is followed by a left parenthesis