Learning Perl, Third Edition
Making Easy Things Easy and Hard Things Possible

By Randal L. Schwartz, Tom Phoenix


Table of Contents

Chapter 1: Introduction
Welcome to the Llama book!
This is the third edition of a book that has been enjoyed by half a million readers since 1993. At least, we hope they've enjoyed it. It's a sure thing that we've enjoyed writing it.
Questions and Answers
You probably have some questions about Perl, and maybe even some about this book; especially if you've already flipped through the book to see what's coming. So we'll use this chapter to answer them.
If you're anything like us, you're probably standing in a bookstore right now, wondering whether you should get this Llama book and learn Perl, or maybe that book over there and learn some language named after a snake, or a beverage, or a letter of the alphabet. You've got about two minutes before the bookstore manager comes over to tell you that this isn't a library, and you need to buy something or get out. Maybe you want to use these two minutes to see a quick Perl program, so you'll know something about how powerful Perl is and what it can do. In that case, you should check out the whirlwind tour of Perl, later in this chapter.
Why Are There So Many Footnotes?
Thank you for noticing. There are a lot of footnotes in this book. Ignore them. They're needed because Perl is chock-full of exceptions to its rules. This is a good thing, as real life is chock-full of exceptions to rules.
But it means that we can't honestly say, "The fizzbin operator frobnicates the hoozistatic variables" without a footnote giving the exceptions. We're pretty honest, so we have to write the footnotes. But you can be honest without reading them. (It's funny how that works out.)
Many of the exceptions have to do with portability. Perl began on Unix systems, and it still has deep roots in Unix. But wherever possible, we've tried to show when something may behave unexpectedly, whether that's because it's running on a non-Unix system, or for another reason. We hope that readers who know nothing about Unix will nevertheless find this book a good introduction to Perl. (And they'll learn a little about Unix along the way, at no extra charge.)
And many of the other exceptions have to do with the old "80/20" rule. By that we mean that 80% of the behavior of Perl can be described in 20% of the documentation, and the other 20% of the behavior takes up the other 80% of the documentation. So to keep this book small, we'll talk about the most common, easy-to-talk-about behavior in the main text, and hint in the direction of the other stuff in the footnotes (which are in a smaller font, so we can say more in the same space).
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
What Does "Perl" Stand For?
Perl is short for "Practical Extraction and Report Language," although it has also been called a "Pathologically Eclectic Rubbish Lister," among other expansions. There's no point in arguing which expansion is correct, because both of those are endorsed by Larry Wall, Perl's creator and chief architect, implementor, and maintainer. He created Perl in the mid-1980s when he was trying to produce some reports from a Usenet-news-like hierarchy of files for a bug-reporting system, and awk ran out of steam. Larry, being the lazy programmer that he is, decided to overkill the problem with a general-purpose tool that he could use in at least one other place. The result was Perl version zero.
There's no shortage of computer languages, is there? But, at the time, Larry didn't see anything that really met his needs. If one of the other languages of today had been available back then, perhaps Larry would have used one of those. He needed something with the quickness of coding available in shell or awk programming, and with some of the power of more advanced tools like grep, cut, sort, and sed, without having to resort to a language like C.
Perl tries to fill the gap between low-level programming (such as in C or C++ or assembly) and high-level programming (such as "shell" programming). Low-level programming is usually hard to write and ugly, but fast and unlimited; it's hard to beat the speed of a well-written low-level program on a given machine. And there's not much you can't do there. High-level programming, at the other extreme, tends to be slow, hard, ugly, and limited; there are many things you can't do at all with the shell, if there's no command on your system that provides the needed functionality. Perl is easy, nearly unlimited, mostly fast, and kind of ugly.
Let's take another look at those four claims we just made about Perl:
First, Perl is easy. As you'll see, though, this means it's easy to
How Can I Get Perl?
You probably already have it. At least, we find Perl wherever we go. It ships with many systems, and system administrators often install it on every machine at their site. But if you can't find it already on your system, you can still get it for free.
Perl is distributed under two different licenses. For most people, since you'll merely be using it, either license is as good as the other. If you'll be modifying Perl, however, you'll want to read the licenses more closely, because they put some small restrictions on distributing the modified code. For people who won't modify Perl, the licenses essentially say "it's free—have fun with it."
In fact, it's not only free, but it runs rather nicely on nearly everything that calls itself Unix and has a C compiler. You download it, type a command or two, and it starts configuring and building itself. Or, better yet, you get your system administrator to type those two commands and install it for you.
Besides Unix and Unix-like systems, people have also been addicted enough to Perl to port it to other systems, like the Macintosh, VMS, OS/2, even MS/DOS and every modern species of Windows—and probably even more by the time you read this. Many of these ports of Perl come with an installation program that's even easier to use than the process for installing Perl on Unix. Check for links in the "ports" section on CPAN.
CPAN is the Comprehensive Perl Archive Network, your one-stop shopping for Perl. It has the source code for Perl itself, ready-to-install ports of Perl to all sorts of non-Unix systems, examples, documentation, extensions to Perl, and archives of messages about Perl. In short, CPAN is comprehensive.
CPAN is replicated on hundreds of mirror machines around the world; start at http://www.cpan.org/ to find one near you. Most of the time, you can also simply visit http://COUNTRYCODE.cpan.org/ where COUNTRYCODE is your two-letter official country code (like on the end of your national domain names). Or, if you don't have access to the Net, you might find a CD-ROM or DVD-ROM with all of the useful parts of CPAN on it; check with your local technical bookstore. Look for a recently minted archive, though; since CPAN changes daily, an archive from two years ago is an antique. (Better yet, get a kind friend with Net access to burn you one with today's CPAN.)
How Do I Make a Perl Program?
It's about time you asked (even if you didn't). Perl programs are text files; you can create and edit them with your favorite text editor. (You don't need any special development environment, although there are some commercial ones available from various vendors. We've never used any of these enough to recommend them.)
You should generally use a programmers' text editor, rather than an ordinary editor. What's the difference? Well, a programmers' text editor will let you do things that programmers need, like to indent or unindent a block of code, or to find the matching closing curly brace for a given opening curly brace. On Unix systems, the two most popular programmers' editors are emacs and vi (and their variants and clones). Both of these have been ported to several non-Unix systems, and many systems today offer a graphical editor (which uses a pointing device like a mouse). In fact, there are even versions of vi and emacs that offer a graphical interface. Ask your local expert about text editors on your system.
For the simple programs you'll be writing for the exercises in this book, none of which will need to be more than about twenty or thirty lines of code, any text editor will be fine.
A few beginners try to use a word processor instead of a text editor. We recommend against this—it's inconvenient at best and impossible at worst. But we won't try to stop you. Be sure to tell the word processor to save your file as "text only"; the word processor's own format will almost certainly be unusable.
In some cases, you may need to compose the program on one machine, then transfer it to another to be run. If you do this, be sure that the transfer uses "text" or "ASCII" mode, and not "binary" mode. This step is needed because of the different text formats on different machines. Without that, you may get inconsistent results—some versions of Perl actually abort when they detect a mismatch in the line endings.
According to the oldest rule in the book, any book about a computer language that has Unix-like roots has to start with showing the "Hello, world" program. So, here it is in Perl:
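The book's own listing isn't part of this excerpt. As a stand-in, a minimal sketch of a "Hello, world" program might look like the following. The variable name $greeting is our own choice for illustration, and the path on the #! line is an assumption that may need changing on your system.

```perl
#!/usr/bin/perl
# A minimal "Hello, world" sketch; not necessarily the book's exact listing.
my $greeting = "Hello, world!\n";   # $greeting is a hypothetical name
print $greeting;                    # puts the text, then a newline, on standard output
```

Save it in a plain text file and run it with the perl command followed by the file name.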
A Whirlwind Tour of Perl
So, you want to see a real Perl program with some meat? (If you don't, just play along for now.) Here you are:
#!/usr/bin/perl
@lines = `perldoc -u -f atan2`;
foreach (@lines) {
  s/\w<([^>]+)>/\U$1/g;
  print;
}
Now, the first time you see Perl code like this, it can seem pretty strange. (In fact, every time you see Perl code like this, it can seem pretty strange.) But let's take it line by line, and see what this example does. (These explanations are very brief; this is a whirlwind tour, after all. We'll see all of this program's features in more detail during the rest of this book. You're not really supposed to understand the whole thing until later.)
The first line is the #! line, as we saw before. You might need to change that line for your system, as we discussed earlier.
The second line runs an external command, named within backquotes ("` `"). (The backquote key is often found next to the number 1 on full-sized American keyboards. Be sure not to confuse the backquote with the single quote, "'".) The command we're using is perldoc -u -f atan2; try typing that at your command line to see what its output looks like. The perldoc command is used on most systems to read and display the documentation for Perl and its associated extensions and utilities, so it should normally be available. This command tells you something about the trigonometric function atan2; we're using it here just as an example of an external command whose output we wish to process.
The output of that command in the backticks is saved in an array variable called @lines. The next line of code starts a loop that will process each one of those lines. Inside the loop, the statements are indented. Although Perl doesn't require this, good programmers do.
The first line inside the loop body is the scariest one; it says s/\w<([^>]+)>/\U$1/g;. Without going into too much detail, we'll just say that this can change any line that has a special marker made with angle brackets (
Exercises
Normally, each chapter will end with some exercises, with the answers in Appendix A. But in this chapter, the answers were already provided.
If you can't get these exercises to work on your machine, double-check your work and then consult your local expert. Remember that you may need to change each program a little, as described in the text.
  1. [7] Type in the "Hello, world" program and get it to work! (You may name it anything you wish, but a good name might be ex1-1, for simplicity, since it's exercise 1 in Chapter 1.)
  2. [5] Type the command perldoc -u -f atan2 at a command prompt and note its output. If you can't get that to work, then find out from a local administrator or the documentation for your version of Perl about how to invoke perldoc or its equivalent. (You'll need this for the next exercise anyway.)
  3. [6] Type in the second example program (from the previous section) and see what it prints. (Hint: Be careful to type those punctuation marks exactly as shown!) Do you see how it changed the output of the command?
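If perldoc isn't handy, here is a small sketch of what the loop in the second example program does, applied to one hard-coded line instead of real perldoc output. The sample text in $line is made up for illustration, and $changed is a name of our own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A made-up line with angle-bracket markers, standing in for perldoc output.
my $line = "See L<perlfunc> for the C<atan2> function.\n";

# The same substitution as in the example program: a word character
# followed by <...> is replaced by the uppercased text inside the brackets.
(my $changed = $line) =~ s/\w<([^>]+)>/\U$1/g;

print $changed;   # See PERLFUNC for the ATAN2 function.
```

Copying $line into $changed before substituting keeps the original around, so the two can be compared side by side.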
Chapter 2: Scalar Data
What Is Scalar Data?
In English, as in many other spoken languages, we're used to distinguishing between singular and plural. As a computer language designed by a human linguist, Perl is similar. As a general rule, when Perl has just one of something, that's a scalar.
A scalar is the simplest kind of data that Perl manipulates. Most scalars are either a number (like 255 or 3.25e20) or a string of characters (like hello or the Gettysburg Address). Although you may think of numbers and strings as very different things, Perl uses them nearly interchangeably.
A scalar value can be acted upon with operators (like addition or concatenate), generally yielding a scalar result. A scalar value can be stored into a scalar variable. Scalars can be read from files and devices, and can be written out as well.
Numbers
Although a scalar is most often either a number or a string, it's useful to look at numbers and strings separately for the moment. We'll cover numbers first, and then move on to strings.
As you'll see in the next few paragraphs, you can specify both integers (whole numbers, like 255 or 2001) and floating-point numbers (real numbers with decimal points, like 3.14159, or 1.35 × 10^25). But internally, Perl computes with double-precision floating-point values. This means that there are no integer values internal to Perl; an integer constant in the program is treated as the equivalent floating-point value. You probably won't notice the conversion (or care much), but you should stop looking for distinct integer operations (as opposed to floating-point operations), because there aren't any.
A literal is the way a value is represented in the source code of the Perl program. A literal is not the result of a calculation or an I/O operation; it's data written directly into the source code.
Perl's floating-point literals should look familiar to you. Numbers with and without decimal points are allowed (including an optional plus or minus prefix), as well as tacking on a power-of-10 indicator (exponential notation) with E notation. For example:
1.25
255.000
255.0
7.25e45  # 7.25 times 10 to the 45th power (a big number)
-6.5e24  # negative 6.5 times 10 to the 24th
         # (a big negative number)
-12e-24  # negative 12 times 10 to the -24th
         # (a very small negative number)
-1.2E-23 # another way to say that - the E may be uppercase
Integer literals are also straightforward, as in:
0
2001
-40
255
61298040283768
That last one is a little hard to read. Perl allows underscores for clarity within integer literals, so we can also write that number like this:
61_298_040_283_768
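As a quick check of the literals above, the following sketch compares a few spellings of the same number and shows that division doesn't round to a whole number. The variable names are our own, chosen only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $plain    = 61298040283768;
my $readable = 61_298_040_283_768;   # same number, easier to read
my $same_big = ($plain == $readable) ? "yes" : "no";

# 255, 255.0, and 255.000 are three spellings of one value.
my $same_255 = (255 == 255.0 && 255.0 == 255.000) ? "yes" : "no";

# Division of two whole numbers doesn't truncate the result.
my $quotient = 10 / 4;

print "underscored literal matches: $same_big\n";
print "255 spellings match: $same_255\n";
print "10 / 4 is $quotient\n";   # 10 / 4 is 2.5
```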
Strings
Strings are sequences of characters (like hello). Strings may contain any combination of any characters.
The shortest possible string has no characters. The longest string fills all of your available memory (although you wouldn't be able to do much with that). This is in accordance with the principle of "no built-in limits" that Perl follows at every opportunity. Typical strings are printable sequences of letters and digits and punctuation in the ASCII 32 to ASCII 126 range. However, the ability to have any character in a string means you can create, scan, and manipulate raw binary data as strings—something with which many other utilities would have great difficulty. For example, you could update a graphical image or compiled program by reading it into a Perl string, making the change, and writing the result back out.
Like numbers, strings have a literal representation, which is the way you represent the string in a Perl program. Literal strings come in two different flavors: single-quoted string literals and double-quoted string literals.
A single-quoted string literal is a sequence of characters enclosed in single quotes. The single quotes are not part of the string itself—they're just there to let Perl identify the beginning and the ending of the string. Any character other than a single quote or a backslash between the quote marks (including newline characters, if the string continues onto successive lines) stands for itself inside a string. To get a backslash, put two backslashes in a row, and to get a single quote, put a backslash followed by a single quote. In other words:
'fred'    # those four characters: f, r, e, and d
'barney'  # those six characters
''        # the null string (no characters)
'Don\'t let an apostrophe end this string prematurely!'
'the last character of this string is a backslash: \\'
'hello\n' # hello followed by backslash followed by n
'hello
there'    # hello, newline, there (11 characters total)
'\'\\'    # single quote followed by backslash
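One way to convince yourself of those character counts is to ask Perl for the length of each string. This is only a sketch; the variable names are hypothetical, and it assumes the file is saved with ordinary one-character line endings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $backslash_n = 'hello\n';   # backslash and n stay as two characters
my $two_lines   = 'hello
there';                        # a real newline inside the quotes
my $pair        = '\'\\';      # one single quote, then one backslash

my $len_backslash_n = length $backslash_n;
my $len_two_lines   = length $two_lines;
my $len_pair        = length $pair;

print "hello\\n in single quotes: $len_backslash_n characters\n";   # 7
print "hello, newline, there: $len_two_lines characters\n";         # 11
print "quote plus backslash: $len_pair characters\n";               # 2
```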
Perl's Built-in Warnings
Perl can be told to warn you when it sees something suspicious going on in your program. To run your program with warnings turned on, use the -w option on the command line:
$ perl -w my_program
Or, if you always want warnings, you may request them on the #! line:
#!/usr/bin/perl -w
That works even on non-Unix systems, where it's traditional to write something like this, since the path to Perl doesn't generally matter:
#!perl -w
Now, Perl will warn you if you use '12fred34' as if it were a number:
Argument "12fred34" isn't numeric
Of course, warnings are generally meant for programmers, not for end-users. If the warning won't be seen by a programmer, it probably won't do any good. And warnings won't change the behavior of your program, except that now it will emit gripes once in a while. If you get a warning message you don't understand, look for its explanation in the perldiag manpage.
Warnings change from one version of Perl to the next. This may mean that your well-tuned program runs silently when warnings are on today, but not when it's used with a newer (or older) version of Perl. To help with this situation, version 5.6 of Perl introduces lexical warnings. These are warnings that may be turned on or off in different sections of code, providing more detailed control than the single -w switch could. See the perllexwarn manpage for more information on these warnings.
As we run across situations in which Perl will usually be able to warn you about a mistake in your code, we'll point them out. But you shouldn't count on the text or behavior of any warning staying exactly the same in future Perl releases.
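Here is a rough sketch of lexical warnings in action, assuming Perl 5.6 or later. It uses the warnings pragma and the "numeric" warning category; the $SIG{__WARN__} handler and the @caught array are our own additions, there only so the sketch can count the warnings instead of letting them scroll by. As noted above, the exact wording of the message may differ between Perl versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;     # lexical warnings, available from Perl 5.6 on

# Collect warnings in an array instead of sending them to the terminal.
# (@caught is a name we made up for this sketch.)
my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };

my $text = '12fred34';
my $sum  = $text + 0;          # warns: the string isn't entirely numeric

{
    no warnings 'numeric';     # turn that one category off inside this block
    my $quiet = $text + 0;     # same operation, no warning here
}

my $warning_count = @caught;
print "warnings caught: $warning_count\n";   # warnings caught: 1
print "first warning: $caught[0]";
```

Only the addition outside the inner block produces a warning, which is the finer control that the single -w switch can't give you.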
Scalar Variables
A variable is a name for a container that holds one or more values. The name of the variable stays the same throughout the program, but the value or values contained in that variable typically change over and over again throughout the execution of the program.
A scalar variable holds a single scalar value, as you'd expect. Scalar variable names begin with a dollar sign followed by what we'll call a Perl identifier: a letter or underscore, and then possibly more letters, or digits, or underscores. Another way to think of it is that it's made up of alphanumerics and underscores, but can't start with a digit. Uppercase and lowercase letters are distinct: the variable $Fred is a different variable from $fred. And all of the letters, digits, and underscores are significant, so:
$a_very_long_variable_that_ends_in_1
is different from:
$a_very_long_variable_that_ends_in_2
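To see that these really are separate variables, here is a tiny sketch. The values stored are arbitrary, picked only so the output shows which variable is which.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $fred = "lowercase fred";
my $Fred = "capitalized Fred";     # a different variable from $fred

my $a_very_long_variable_that_ends_in_1 = "first";
my $a_very_long_variable_that_ends_in_2 = "second";   # differs only in the last character

print "$fred / $Fred\n";
print "$a_very_long_variable_that_ends_in_1 / $a_very_long_variable_that_ends_in_2\n";
```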
Scalar variables in Perl are always referenced with the leading $. In the shell, you use $ to get the value, but leave the $ off to assign a new value. In awk or C, you leave the $ off entirely. If you bounce back and forth a lot, you'll find yourself typing the wrong things occasionally. This is expected. (Most Perl programmers would recommend that you stop writing shell, awk, and C programs, but that may not work for you.)
You should generally select variable names that mean something regarding the purpose of the variable. For example, $r is probably not very descriptive but $line_length is. A variable used for only two or three lines close together may be called something simple, like $n, but a variable used throughout a program should probably have a more descriptive name.
Similarly, properly placed underscores can make a name easier to read and understand, especially if your maintenance programmer has a different spoken language background than you have. For example, $super_bowl is a better name than $superbowl, since that last one might look like $superb_owl
Output with print
It's generally a good idea to have your program produce some output; otherwise, someone may think it didn't do anything. The print( ) operator makes this possible. It takes a scalar argument and puts it out without any embellishment onto standard output. Unless you've done something odd, this will be your terminal display. For example:
print "hello world\n"; # say hello world, followed by a newline

print "The answer is ";
print 6 * 7;
print ".\n";
You can actually give print a series of values, separated by commas.
print "The answer is ", 6 * 7, ".\n";
This is actually a list, but we haven't talked about lists yet, so we'll put that off for later.
When a string literal is double-quoted, it is subject to variable interpolation (besides being checked for backslash escapes). This means that any scalar variable name in the string is replaced with its current value. For example:
$meal = "brontosaurus steak";
$barney = "fred ate a $meal";    # $barney is now "fred ate a brontosaurus steak"
$barney = 'fred ate a ' . $meal; # another way to write that
As you see on the last line above, you can get the same results without the double quotes. But the double-quoted string is often the more convenient way to write it.
If the scalar variable has never been given a value, the empty string is used instead:
$barney = "fred ate a $meat"; # $barney is now "fred ate a "
Don't bother with interpolating if you have just the one lone variable:
print "$fred"; # unneeded quote marks
print $fred;   # better style
There's nothing really wrong with putting quote marks around a lone variable, but the other programmers will laugh at you behind your back.
Variable interpolation is also known as double-quote interpolation, because it happens when double-quote marks (but not single quotes) are used. It happens for some other strings in Perl, which we'll mention as we get to them.
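To make the difference between the two kinds of quotes concrete, here is a short sketch that builds the same sentence three ways. The names $double, $single, and $joined are ours, used only for this comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $meal = "brontosaurus steak";

my $double = "fred ate a $meal";      # interpolated
my $single = 'fred ate a $meal';      # not interpolated: a literal $meal
my $joined = 'fred ate a ' . $meal;   # built with the concatenation operator

print "$double\n";   # fred ate a brontosaurus steak
print "$single\n";   # fred ate a $meal
print "double-quoted and concatenated forms match\n" if $double eq $joined;
```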
The if Control Structure
Once you can compare two values, you'll probably want your program to make decisions based upon that comparison. Like all similar languages, Perl has an if control structure:
if ($name gt 'fred') {
  print "'$name' comes after 'fred' in sorted order.\n";
}
If you need an alternative choice, the else keyword provides that as well:
if ($name gt 'fred') {
  print "'$name' comes after 'fred' in sorted order.\n";
} else {
  print "'$name' does not come after 'fred'.\n";
  print "Maybe it's the same string, in fact.\n";
}
Unlike in C, those block curly braces are required around the conditional code. It's a good idea to indent the contents of the blocks of code as we show here; that makes it easier to see what's going on. If you're using a programmers' text editor (as discussed in Chapter 1), it'll do most of the work for you.
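The following sketch wraps that if/else test in a small subroutine so it can be tried on several names. Subroutines haven't been introduced yet at this point, and position_vs_fred is a hypothetical name of our own; the part to look at is the if/else inside it. The sketch assumes the default comparison, with no locale settings in effect.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report whether a name sorts after 'fred' using the string comparison gt.
sub position_vs_fred {
    my ($name) = @_;
    if ($name gt 'fred') {
        return "after";
    } else {
        return "not after";
    }
}

my $wilma   = position_vs_fred('wilma');   # after
my $same    = position_vs_fred('fred');    # not after: it's the same string
my $capital = position_vs_fred('Fred');    # not after: 'F' sorts before 'f' in ASCII

print "wilma: $wilma\n";
print "fred: $same\n";
print "Fred: $capital\n";
```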
You may actually use any scalar value as the conditional of the if control structure. That's handy if you want to store a true or false value into a variable, like this:
$is_bigger = $name gt 'fred';
if ($is_bigger) { ... }
But how does Perl decide whether a given value is true or false? Perl doesn't have a separate Boolean data type, like some languages have. Instead, it uses a few simple rules:
  1. The special value undef is false. (We'll see this a little later in this section.)
  2. Zero is false; all other numbers are true.
  3. The empty string ('') is false; all other strings are normally true.
  4. The one exception: since numbers and strings are equivalent, the string form of zero, '0', has the same value as its numeric form: false.
So, if your scalar value is undef, 0, '', or '0', it's false. All other scalars are true—including all of the types of scalars that we haven't told you about yet.
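Those rules can be tried out with a sketch like this one. The helper named truth and the list of sample values are our own inventions; the list and foreach constructs used here are covered later in the book.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the word "true" or "false" for any scalar, using Perl's own rules.
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

# The first four are the false values; the rest only look like they might be.
my @samples = (undef, 0, '', '0', 1, -1, '00', '0.0', ' ', 'false');

foreach my $value (@samples) {
    my $shown = defined $value ? "'$value'" : 'undef';
    print "$shown is ", truth($value), "\n";
}
```

Note that '00' and '0.0' come out true: as strings, they are neither empty nor exactly '0'.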
Getting User Input
At this point, you're probably wondering how to get a value from the keyboard into a Perl program. Here's the simplest way: use the line-input operator, <STDIN>. Each time you use <STDIN> in a place where a scalar value is expected, Perl reads the next complete text line from standard input (up to the first newline), and uses that string as the value of <STDIN>. Standard input can mean many things, but unless you do something uncommon, it means the keyboard of the user who invoked your program (probably you). If there's nothing waiting to be read (typically the case, unless you type ahead a complete line), the Perl program will stop and wait for you to enter some characters followed by a newline (return).
The string value of <STDIN> typically has a newline character on the end of it. So you could do something like this:
$line = <STDIN>;
if ($line eq "\n") {
  print "That was just a blank line!\n";
} else {
  print "That line of input was: $line";
}
But in practice, you don't often want to keep the newline, so you need the chomp operator.
The chomp Operator
The first time you read about the chomp operator, it seems terribly overspecialized. It works on a variable. The variable has to hold a string. And if the string ends in a newline character, chomp can get rid of the newline. That's (nearly) all it does. For example:
$text = "a line of text\n"; # Or the same thing from <STDIN>
chomp($text);               # Gets rid of the newline character
But it turns out to be so useful, you'll put it into nearly every program you write. As you see, it's the best way to remove a trailing newline from a string in a variable. In fact, there's an easier way to use chomp, because of a simple rule: any time that you need a variable in Perl, you can use an assignment instead. First, Perl does the assignment. Then it uses the variable in whatever way you requested. So the most common use of chomp looks like this:
chomp($text = <STDIN>); # Read the text, without the newline character

$text = <STDIN>;        # Do the same thing...
chomp($text);           # ...but in two steps
At first glance, the combined chomp may not seem to be the easy way, especially if it seems more complex! If you think of it as two operations—read a line, then chomp it—then it's more natural to write it as two statements. But if you think of it as one operation—read just the text, not the newline—it's more natural to write the one statement. And since most other Perl programmers are going to write it that way, you may as well get used to it now.
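As a minimal sketch of the combined form, the following reads one line and chomps it in a single statement. So that it runs without a keyboard, it reads from an in-memory filehandle; `$fake_stdin`, `$typed`, and `$name` are hypothetical names, and the `my` declarations and `use strict` line are additions of this sketch rather than anything the text has introduced.

```perl
use strict;
use warnings;

# A string that plays the role of a line typed at the keyboard.
my $typed = "barney\n";

# Open that string as an in-memory filehandle (a stand-in for STDIN).
open(my $fake_stdin, '<', \$typed) or die "Cannot open in-memory handle: $!";

# Read the line and remove its trailing newline in one statement.
chomp(my $name = <$fake_stdin>);

print "Hello, $name!\n";  # prints "Hello, barney!"
```

With a real keyboard, the same statement would read `chomp(my $name = <STDIN>);`.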
chomp is actually a function. As a function, it has a return value, which is the number of characters removed. This number is hardly ever useful:
$food = <STDIN>;
$betty = chomp $food; # gets the value 1 - but we knew that!
As you see, you may write chomp with or without the parentheses. This is another general rule in Perl: except in cases where it changes the meaning to remove them, parentheses are always optional.
If a line ends with two or more newlines, chomp removes only one. If there's no newline, it does nothing, and returns zero.
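The three cases just described can be checked with a short sketch. The variable names are hypothetical, and the `my` declarations are an addition of this example.

```perl
use strict;
use warnings;

my $one  = "fred\n";    # ends in one newline
my $two  = "fred\n\n";  # ends in two newlines
my $none = "fred";      # no newline at all

# chomp returns the number of characters it removed.
my $removed_one  = chomp($one);   # 1; $one is now "fred"
my $removed_two  = chomp($two);   # 1; $two is now "fred\n"
my $removed_none = chomp($none);  # 0; $none is unchanged

print "Removed: $removed_one, $removed_two, $removed_none\n";  # Removed: 1, 1, 0
print "The second string still ends in a newline.\n" if $two eq "fred\n";
```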
The while Control Structure
Like most algorithmic programming languages, Perl has a number of looping structures. The while loop repeats a block of code as long as a condition is true:
$count = 0;
while ($count < 10) {
  $count += 1;
  print "count is now $count\n"; # Gives values from 1 to 10
}
As always in Perl, the truth value here works like the truth value in the if test. Also like the if control structure, the block curly braces are required. The conditional expression is evaluated before the first iteration, so the loop may be skipped completely, if the condition is initially false.
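To illustrate the last point, here is a small sketch in which the condition is false from the start, so the body never runs. The counter `$passes` is a hypothetical name added for the illustration, as are the `my` declarations.

```perl
use strict;
use warnings;

my $count  = 10;  # already at the limit
my $passes = 0;   # how many times the loop body has run

# The condition is tested before the first iteration.
while ($count < 10) {
  $passes += 1;
  $count  += 1;
}

print "The loop body ran $passes times.\n";  # The loop body ran 0 times.
```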
The undef Value
What happens if you use a scalar variable before you give it a value? Nothing serious, and definitely nothing fatal. Variables have the special undef value before they are first assigned, which is just Perl's way of saying "nothing here to look at—move along, move along." If you try to use this "nothing" as a "numeric something," it acts like 0. If you try to use it as a "string something," it acts like the empty string. But undef is neither a number nor a string; it's an entirely separate kind of scalar value.
Because undef automatically acts like zero when used as a number, it's easy to make a numeric accumulator that starts out empty:
# Add up some odd numbers
$n = 1;
while ($n < 10) {
  $sum += $n;
  $n += 2; # On to the next odd number
}
print "The total was $sum.\n";
This works properly when $sum was undef before the loop started. The first time through the loop, $n is one, so the first line inside the loop adds one to $sum. That's like adding one to a variable that already holds zero (because we're using undef as if it were a number). So now it has the value 1. After that, since it's been initialized, adding works in the traditional way.
Similarly, you could have a string accumulator that starts out empty:
$string .= "more text\n";
If $string is undef, this will act as if it already held the empty string, putting "more text\n" into that variable. But if it already holds a string, the new text is simply appended.
Perl programmers frequently use a new variable in this way, letting it act as either zero or the empty string as needed.
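Both kinds of accumulator can be combined in one loop, as in this sketch. The names `$sum` and `$seen` are illustrative, and `my` (which declares a variable and leaves it undef) is an addition of this example.

```perl
use strict;
use warnings;

my $n = 1;
my $sum;   # starts out undef; acts like 0 when first added to
my $seen;  # starts out undef; acts like "" when first appended to

while ($n < 10) {
  $sum  += $n;     # numeric accumulator
  $seen .= "$n ";  # string accumulator
  $n    += 2;      # on to the next odd number
}

print "The total was $sum.\n";      # The total was 25.
print "The numbers were: $seen\n";  # The numbers were: 1 3 5 7 9
```

Perl exempts `+=` and `.=` from its "uninitialized value" warning, which is why this runs quietly even with warnings turned on.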
Many operators return undef when the arguments are out of range or don't make sense. If you don't do anything special, you'll get a zero or a null string without major consequences. In practice, this is hardly a problem. In fact, most programmers will rely upon this behavior. But you should know that when warnings are turned on, Perl will typically warn about unusual uses of the undefined value, since that may indicate a bug. For example, simply copying
The defined Function
One operator that can return undef is the line-input operator, <STDIN>. Normally, it will return a line of text. But if there is no more input, such as at end-of-file, it returns undef to signal this. To tell whether a value is undef and not the empty string, use the defined function, which returns false for undef, and true for everything else:
$madonna = <STDIN>;
if ( defined($madonna) ) {
  print "The input was $madonna";
} else {
  print "No input available!\n";
}
If you'd like to make your own undef values, you can use the obscurely named undef operator:
$madonna = undef; # As if it had never been touched
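A common use of defined is to keep reading until the input runs out. The sketch below assumes an in-memory filehandle (`$in`, a hypothetical stand-in for STDIN) so that it can run unattended; the variable names and `my` declarations are likewise assumptions of the example.

```perl
use strict;
use warnings;

# Two lines of pretend input, read through an in-memory filehandle.
my $data = "first\nsecond\n";
open(my $in, '<', \$data) or die "Cannot open in-memory handle: $!";

my $count = 0;
# The line-input operator returns undef at end-of-file, which ends the loop.
while (defined(my $line = <$in>)) {
  chomp($line);
  $count += 1;
  print "Line $count: $line\n";
}

# Reading again after end-of-file still gives undef.
my $extra = <$in>;
print "No more input after $count lines.\n" unless defined($extra);
```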
Exercises
See Section A.1 for answers to the following exercises:
  1. [5] Write a program that computes the circumference of a circle with a radius of 12.5. Circumference is 2π times the radius (approximately 2 times 3.141592654). The answer you get should be about 78.5.
  2. [4] Modify the program from the previous exercise to prompt for and accept a radius from the person running the program. So, if the user enters 12.5 for the radius, she should get the same number as in the previous exercise.
  3. [4] Modify the program from the previous exercise so that, if the user enters a number less than zero, the reported circumference will be zero, rather than negative.
  4. [8] Write a program that prompts for and reads two numbers (on separate lines of input) and prints out the product of the two numbers multiplied together.
  5. [8] Write a program that prompts for and reads a string and a number (on separate lines of input) and prints out the string the number of times indicated by the number on separate lines. (Hint: Use the "x" operator.) If the user enters "fred" and "3," the output should be three lines, each saying "fred". If the user enters "fred" and "299792," there may be a lot of output.
Chapter 3: Lists and Arrays
If a scalar was the "singular" in Perl, as we described it at the beginning of Chapter 2, the "plural" in Perl is represented by lists and arrays.
A list is an ordered collection of scalars. An array is a variable that contains a list. In Perl, the two terms are often used as if they're interchangeable. But, to be accurate, the list is the data, and the array is the variable. You can have a list value that isn't in an array, but every array variable holds a list (although that list may be empty). Figure 3-1 represents a list, whether it's stored in an array or not.
Figure 3-1: A list with five elements
Each element of an array or list is a separate scalar variable with an independent scalar value. These values are ordered—that is, they have a particular sequence from the first to the last element. The elements of an array or list are indexed by small integers starting at zero and counting by ones, so the first element of any array or list is always element zero.
Since each element is an independent scalar value, a list or array may hold numbers, strings, undef values, or any mixture of different scalar values. Nevertheless, it's most common to have all elements of the same type, such as a list of book titles (all strings) or a list of cosines (all numbers).
Arrays and lists can have any number of elements. The smallest one has no elements, while the largest can fill all of available memory. Once again, this is in keeping with Perl's philosophy of "no unnecessary limits."
Accessing Elements of an Array
If you've used arrays in another language, you won't be surprised to find that Perl provides a way to subscript an array in order to refer to an element by a numeric index.
The array elements are numbered using sequential integers, beginning at zero and increasing by one for each element, like this:
$fred[0] = "yabba";
$fred[1] = "dabba";
$fred[2] = "doo";
The array name itself (in this case, "fred") comes from a completely separate namespace from the one scalars use; you could have a scalar variable named $fred in the same program, and Perl would treat the two as different things and wouldn't be confused. (Your maintenance programmer might be confused, though, so don't capriciously make all of your variable names the same!)
You can use an array element like $fred[2] in every place where you could use any other scalar variable like $fred. For example, you can get the value from an array element or change that value by the same sorts of expressions we used in the previous chapter:
print $fred[0];
$fred[2] = "diddley";
$fred[1] .= "whatsis";
Of course, the subscript may be any expression that gives a numeric value. If it's not an integer already, it'll automatically be truncated to the next lower integer:
$number = 2.71828;
print $fred[$number - 1]; # Same as printing $fred[1]
If the subscript indicates an element that would be beyond the end of the array, the corresponding value will be undef. This is just as with ordinary scalars; if you've never stored a value into the variable, it's undef.
$blank = $fred[ 142_857 ]; # unused array element gives undef
$blanc = $mel;             # unused scalar $mel also gives undef
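The subscript rules above can be tried out in a short sketch. The names `$picked` and `$missing` are hypothetical, and the `my` declarations are an addition of this example.

```perl
use strict;
use warnings;

my @fred = ("yabba", "dabba", "doo");

# A non-integer subscript is truncated: 2.71828 - 1 is 1.71828, used as 1.
my $number = 2.71828;
my $picked = $fred[$number - 1];

# Reading far beyond the end gives undef and does not extend the array.
my $missing = $fred[142_857];

print "Picked: $picked\n";                                  # Picked: dabba
print "The missing element is undef\n" unless defined($missing);
print "The last index is still $#fred\n";                   # The last index is still 2
```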
Special Array Indices
If you store into an array element that is beyond the end of the array, the array is automatically extended as needed—there's no limit on its length, as long as there's available memory for Perl to use. If intervening elements need to be created, they'll be created as undef values.
$rocks[0] = 'bedrock';      # One element...
$rocks[1] = 'slate';        # another...
$rocks[2] = 'lava';         # and another...
$rocks[3] = 'crushed rock'; # and another...
$rocks[99] = 'schist';      # now there are 95 undef elements
Sometimes, you need to find out the last element index in an array. For the array of rocks that we've just been using, the last element index is $#rocks. That's not the same as the number of elements, though, because there's an element number zero. As seen in the code snippet below, it's actually possible to assign to this value to change the size of the array, although this is rare in practice.
$end = $#rocks;                  # 99, which is the last element's index
$number_of_rocks = $end + 1;     # okay, but we'll see a better way later
$#rocks = 2;                     # Forget all rocks after 'lava'
$#rocks = 99;                    # add 97 undef elements (the forgotten rocks are
                                 # gone forever)
$rocks[ $#rocks ] = 'hard rock'; # the last rock
Using the $#name value as an index, like that last example, happens often enough that Larry has provided a shortcut: negative array indices count from the end of the array. But don't get the idea that these indices "wrap around." If you've got three elements in the array, the valid negative indices are -1 (the last element), -2 (the middle element), and -3 (the first element). In the real world, nobody seems to use any of these except -1, though.
$rocks[ -1 ] = 'hard rock'; # easier way to do that last example above
$dead_rock = $rocks[-100];  # gets 'bedrock'
$rocks[ -200 ] = 'crystal'; # fatal error!
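Here is a smaller version of the same ideas that is easy to check by hand. It is only a sketch; the six-element array and the names `$last_index`, `$count`, `$last`, and `$first` are assumptions of the example, as are the `my` declarations.

```perl
use strict;
use warnings;

my @rocks = ('bedrock', 'slate', 'lava');
$rocks[5] = 'schist';  # elements 3 and 4 are created as undef

my $last_index = $#rocks;          # 5
my $count      = $last_index + 1;  # 6 elements in all

my $last  = $rocks[-1];  # same element as $rocks[$#rocks]
my $first = $rocks[-6];  # counting back six from the end reaches element 0

print "Last index: $last_index, elements: $count\n";  # Last index: 5, elements: 6
print "Last rock: $last, first rock: $first\n";       # Last rock: schist, first rock: bedrock
print "Element 3 is undef\n" unless defined($rocks[3]);
```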
List Literals
A list literal (the way you represent a list value within your program) is a list of comma-separated values enclosed in parentheses. These values form the elements of the list. For example:
(1, 2, 3)      # list of three values 1, 2, and 3
(1, 2, 3,)     # the same three values (the trailing comma is ignored)
("fred", 4.5)  # two values, "fred" and 4.5
( )            # empty list - zero elements
(1..100)       # list of 100 integers
That last one uses the .. range operator, which is seen here for the first time. That operator creates a list of values by counting from the left scalar up to the right scalar by ones. For example:
(1..5)            # same as (1, 2, 3, 4, 5)
(1.7..5.7)        # same thing - both values are truncated
(5..1)            # empty list - .. only counts "uphill"
(0, 2..6, 10, 12) # same as (0, 2, 3, 4, 5, 6, 10, 12)
($a..$b)          # range determined by current values of $a and $b
(0..$#rocks)      # the indices of the rocks array from the previous section
As you can see from those last two items, the elements of a list literal are not necessarily constants; they can be expressions that will be newly evaluated each time the literal is used. For example:
($a, 17)       # two values: the current value of $a, and 17
($b+$c, $d+$e) # two values
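To see what these literals actually produce, the sketch below stores several of them in arrays and prints the results. The array names and the variables `$low` and `$high` are hypothetical (they stand in for the `$a` and `$b` used above), and the `my` declarations are an addition of this example.

```perl
use strict;
use warnings;

my @up        = (1..5);
my @truncated = (1.7..5.7);          # both ends are truncated, so 1..5 again
my @downhill  = (5..1);              # empty: the range only counts upward
my @mixed     = (0, 2..6, 10, 12);

my ($low, $high) = (3, 6);
my @from_vars = ($low..$high);               # evaluated using the current values
my @computed  = ($low + $high, $high - $low);

print "up:        @up\n";         # up:        1 2 3 4 5
print "truncated: @truncated\n";  # truncated: 1 2 3 4 5
print "mixed:     @mixed\n";      # mixed:     0 2 3 4 5 6 10 12
print "from_vars: @from_vars\n";  # from_vars: 3 4 5 6
print "computed:  @computed\n";   # computed:  9 3
print "downhill has ", $#downhill + 1, " elements\n";  # downhill has 0 elements
```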
Of course, a list may have any scalar values, like this typical list of strings:
("fred", "barney", "betty", "wilma", "dino")
It turns out that lists of simple words (like the previous example) are frequently needed in Perl programs. The qw shortcut makes it easy to generate them without typing a lot of extra quote marks:
qw/ fred barney betty wilma dino / # same as above, but less typing
qw stands for "quoted words" or "quoted by whitespace," depending upon whom you ask. Either way, Perl treats it like a single-quoted string (so, you can't use \n or $fred inside a qw list as you would in a double-quoted string). The whitespace (characters like spaces, tabs, and newlines) will be discarded, and whatever is left becomes the list of items. Since whitespace is discarded, here's another (but unusual) way to write that same list:
List Assignment
In much the same way as scalar values may be assigned to variables, list values may also be assigned to variables:
($fred, $barney, $dino) = ("flintstone", "rubble", undef);
All three variables in the list on the left get new values, just as if we did three separate assignments. Since the list is built up before the assignment starts, this makes it easy to swap two variables' values in Perl:
($fred, $barney) = ($barney, $fred); # swap those values
($betty[0], $betty[1]) = ($betty[1], $betty[0]);
But what happens if the number of variables (on the left side of the equals sign) isn't the same as the number of values (from the right side)? In a list assignment, extra values are silently ignored—Perl figures that if you wanted those values stored somewhere, you would have told it where to store them. Alternatively, if you have too many variables, the extras get the value undef.
($fred, $barney) = qw< flintstone rubble slate granite >; # two ignored items
($wilma, $dino) = qw[flintstone];                         # $dino gets undef
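The swap and the two mismatch cases can be verified with a sketch like this one. The names `$first` and `$second` are hypothetical, and the `my` declarations are an addition of this example.

```perl
use strict;
use warnings;

my ($fred, $barney) = ("flintstone", "rubble");
($fred, $barney) = ($barney, $fred);  # swap: the right side is built first

# Too many values: the extras are silently ignored.
my ($first, $second) = qw< flintstone rubble slate granite >;

# Too many variables: the extras get undef.
my ($wilma, $dino) = qw[flintstone];

print "fred=$fred barney=$barney\n";    # fred=rubble barney=flintstone
print "first=$first second=$second\n";  # first=flintstone second=rubble
print "dino is undef\n" unless defined($dino);
```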
Now that we can assign lists, you could build up an array of strings with a line of code like this:
($rocks[0], $rocks[1], $rocks[2], $rocks[3]) = qw/talc mica feldspar quartz/;
But when you wish to refer to an entire array, Perl has a simpler notation. Just use the at-sign (@) before the name of the array (and no index brackets after it) to refer to the entire array at once. You can read this as "all of the," so @rocks is "all of the rocks." This works on either side of the assignment operator:
@rocks = qw/ bedrock slate lava /;
@tiny = ( );                      # the empty list
@giant = 1..1e5;                  # a list with 100,000 elements
@stuff = (@giant, undef, @giant); # a list with 200,001 elements
$dino = "granite";
@quarry = (@rocks, "crushed rock", @tiny, $dino);
That last assignment gives @quarry the five-element list (bedrock, slate, lava, crushed rock, granite), because the empty array @tiny contributes no elements to the list.
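A short sketch confirms how arrays flatten when they appear inside a list. The name `$count` is hypothetical, and the `my` declarations are an addition of this example.

```perl
use strict;
use warnings;

my @rocks = qw/ bedrock slate lava /;
my @tiny  = ( );
my $dino  = "granite";

# Each array is replaced by its elements; the empty one adds nothing.
my @quarry = (@rocks, "crushed rock", @tiny, $dino);

my $count = $#quarry + 1;  # last index plus one
print "The quarry holds $count items.\n";    # The quarry holds 5 items.
print "The fourth item is $quarry[3].\n";    # The fourth item is crushed rock.
```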
Interpolating Arrays into Strings
Like scalars, array values may be interpolated into a double-quoted string. Elements of an array are automatically separated by spaces upon interpolation:
@rocks = qw{ flintstone slate rubble };
print "quartz @rocks limestone\n";  # prints five rocks separated by spaces
There are no extra spaces added before or after an interpolated array; if you want those, you'll have to put them in yourself:
print "Three rocks are: @rocks.\n";
print "There's nothing in the parens (@empty) here.\n";
If you forget that arrays interpolate like this, you'll be surprised when you put an email address into a double-quoted string. For historical reasons, this is a fatal error at compile time:
$email = "fred@bedrock.edu";  # WRONG! Tries to interpolate @bedrock
$email = "fred\@bedrock.edu"; # Correct
$email = 'fred@bedrock.edu';  # Another way to do that
A single element of an array will be replaced by its value, just as you'd expect:
@fred = qw(hello dolly);
$y = 2;
$x = "This is $fred[1]'s place";    # "This is dolly's place"
$x = "This is $fred[$y-1]'s place"; # same thing
Note that the index expression is evaluated as an ordinary expression, as if it were outside a string. It is not variable-interpolated first. In other words, if $y contains the string "2*4", we're still talking about element 1, not element 7, because "2*4" as a number (the value of $y used in a numeric expression) is just plain 2.
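That case can be tried directly, as in the sketch below. Warnings are deliberately left off here, since with them on Perl would complain that "2*4" isn't numeric; the `my` declarations are an addition of this example.

```perl
use strict;

my @fred = qw(hello dolly);
my $y = "2*4";  # a string, not the number 8

# In the subscript, "2*4" is used as a number, which gives 2; 2 - 1 is 1.
my $x = "This is $fred[$y-1]'s place";

print "$x\n";  # This is dolly's place
```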
If you want to follow a simple scalar variable with a left square bracket, you need to delimit the square bracket so that it isn't considered part of an array reference, as follows:
@fred = qw(eating rocks is wrong);
$fred = "right";               # we are trying to say "this is right[3]"
print "this is $fred[3]\n";    # prints "wrong" using $fred[3]
print "this is ${fred}[3]\n";  # prints "right" (protected by braces)
print "this is $fred"."[3]\n"; # right again (different string)
print "this is $fred\[3]\n";   # right again (backslash hides it)
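The following sketch stores a few of these strings in variables so the results can be compared. The variable names are hypothetical, and the `my` declarations are an addition of this example.

```perl
use strict;
use warnings;

my @fred = qw(eating rocks is wrong);
my $fred = "right";

my $element   = "this is $fred[3]";    # element 3 of @fred
my $braces    = "this is ${fred}[3]";  # scalar $fred, then a literal [3]
my $backslash = "this is $fred\[3]";   # the same, using a backslash

# Two safe ways to write an email address.
my $escaped = "fred\@bedrock.edu";
my $single  = 'fred@bedrock.edu';

print "$element\n";    # this is wrong
print "$braces\n";     # this is right[3]
print "$backslash\n";  # this is right[3]
print "The two addresses match.\n" if $escaped eq $single;
```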
The foreach Control Structure
It's handy to be able to process an entire array or list, so Perl provides a control structure to do just that. The foreach loop steps through a list of values, executing one iteration (time through the loop) for each value:
foreach $rock (qw/ bedrock slate lava /) {
  print "One rock is $rock.\n";  # Prints names of three rocks
}
The control variable ($rock in that example) takes on a new value from the list for each iteration. The first time through the loop, it's "bedrock"; the third time, it's "lava".
The control variable is not a copy of the list element—it actually is the list element. That is, if you modify the control variable inside the loop, you'll be modifying the element itself, as shown in the following code snippet. This is useful, and supported, but it would surprise you if you weren't expecting it.
@rocks = qw/ bedrock slate lava /;
foreach $rock (@rocks) {
  $rock = "\t$rock";              # put a tab in front of each element of @rocks
  $rock .= "\n";                  # put a newline on the end of each
}
print "The rocks are:\n", @rocks; # Each one is indented, on its own line
What is the value of the control variable after the loop has finished? It's the same as it was before the loop started. The value of the control variable of a foreach loop is automatically saved and restored by Perl. While the loop is running, there's no way to access or alter that saved value. So after the loop is done, the variable has the value it had before the loop, or undef if it hadn't had a value. That means that if you want to name your loop control variable "$rock", you don't have to worry that maybe you've already used that name for another variable.
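Both behaviors, modifying elements through the control variable and getting the old value back afterward, show up in this sketch. The starting value "granite" is an arbitrary choice, and the `my` declarations are an addition of this example.

```perl
use strict;
use warnings;

my $rock  = "granite";  # a value set before the loop
my @rocks = qw/ bedrock slate lava /;

foreach $rock (@rocks) {
  $rock .= "!";  # changes the array element itself
}

print "The rocks are now: @rocks\n";   # The rocks are now: bedrock! slate! lava!
print "After the loop: $rock\n";       # After the loop: granite
```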
Perl's Favorite Default: $_
If you omit the control variable from the beginning of the foreach loop, Perl uses its favorite default variable, $_. This is (mostly) just like any other scalar variable, except for its unusual name. For example:
foreach (1..10) {  # Uses $_ by default
  print "I can count to $_!\n";
}
Although this isn't Perl's only default by a long shot, it's Perl's most common default. We'll see many other cases in which Perl will automatically use $_ when you don't tell it to use some other variable or value, thereby saving the programmer from the heavy labor of having to think up and type a new variable name. So as not to keep you in suspense, one of those cases is print, which will print $_ if given no other argument:
$_ = "Yabba dabba doo\n";
print;  # prints $_ by default
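Inside the loop, $_ can be used in expressions like any other scalar. A minimal sketch, in which the accumulator `$total` is a hypothetical name and the `my` declaration is an addition of this example:

```perl
use strict;
use warnings;

my $total = 0;
foreach (1..10) {  # no control variable, so each value lands in $_
  $total += $_;
}
print "The total is $total.\n";  # The total is 55.

$_ = "Yabba dabba doo\n";
print;  # no argument, so this prints $_
```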
The reverse Operator
The reverse operator takes a list of values (which may come from an array) and returns the list in the opposite order. So if you were disappointed that the range operator, .., only counts upwards, this is the way to fix it:
@fred = 6..10;
@barney = reverse(@fred); # gets 10, 9, 8, 7, 6
@wilma = reverse 6..10;   # gets the same thing, without the other array
@fred = reverse @fred;    # puts the result back into the original array
The last line is noteworthy because it uses @fred twice. Perl always calculates the value being assigned (on the right) before it begins the actual assignment.
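Printing the arrays makes the results visible. This is only a sketch; `@countdown` is a hypothetical name, and the `my` declarations are an addition of this example.

```perl
use strict;
use warnings;

my @fred      = 6..10;
my @barney    = reverse(@fred);  # @fred itself is untouched here
my @countdown = reverse 1..5;    # a "downhill" range

print "barney:    @barney\n";     # barney:    10 9 8 7 6
print "fred:      @fred\n";       # fred:      6 7 8 9 10
print "countdown: @countdown\n";  # countdown: 5 4 3 2 1

@fred = reverse @fred;  # the right side is computed before the assignment
print "fred now:  @fred\n";       # fred now:  10 9 8 7 6
```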
Remember that reverse returns the reversed list; it doesn't affect its ar