Learning Perl, Fourth Edition

By Randal L. Schwartz, Tom Phoenix, brian d foy


Table of Contents

Chapter 1: Introduction
Welcome to the Llama book!
This is the fourth edition of a book that has been enjoyed by half a million readers since 1993. At least, we hope they've enjoyed it. It's a sure thing that we've enjoyed writing it.
Questions and Answers
You probably have some questions about Perl, and maybe some about this book, especially if you've already flipped through the book to see what's coming. So, we'll use this chapter to answer them.
If you're anything like us, you're probably standing in a bookstore right now, wondering whether you should get this Llama book and learn Perl or maybe that book over there and learn some language named after a snake, or a beverage, or a letter of the alphabet. You've got about two minutes before the bookstore manager comes over to tell you that this isn't a library, and you need to buy something or get out. Maybe you want to use these two minutes to see a quick Perl program, so you'll know something about how powerful Perl is and what it can do. In that case, you should check out the whirlwind tour of Perl later in this chapter.
Thank you for noticing. There are a lot of footnotes in this book. Ignore them. They're needed because Perl is full of exceptions to its rules. This is a good thing, as real life is full of exceptions to rules.
But it means we can't honestly write, "The fizzbin operator frobnicates the hoozistatic variables" without a footnote giving the exceptions. We're pretty honest, so we have to write the footnotes. But you can be honest without reading them. (It's funny how that works out.)
Many of the exceptions have to do with portability. Perl began on Unix systems, and it still has deep roots in Unix. But wherever possible, we've tried to show when something may behave unexpectedly, whether the cause is running on a non-Unix system or some other reason. We hope that readers who know nothing about Unix will find this book a good introduction to Perl. (And they'll learn a little about Unix along the way at no extra charge.)
And many of the other exceptions have to do with the old "80/20" rule. By that, we mean that 80% of the behavior of Perl can be described in 20% of the documentation, and the other 20% of the behavior takes up the other 80% of the documentation. To keep this book small, we'll talk about the most common, easy-to-talk-about behavior in the main text and hint in the direction of the other stuff in the footnotes (which are in a smaller font, so we can say more in the same space). Once you've read the book all the way through without reading the footnotes, you'll probably want to look back at some sections for reference. At that point, or if you become unbearably curious along the way, go ahead and read the notes. A lot of them are just computer jokes anyway.
What Does "Perl" Stand For?
Perl is sometimes called the "Practical Extraction and Report Language," though it has also been called a "Pathologically Eclectic Rubbish Lister," among other expansions. It's a retronym, not an acronym, since Larry Wall, Perl's creator, came up with the name first and the expansion later. That's why "Perl" isn't in all caps. There's no point in arguing which expansion is correct; Larry endorses both.
You may also see "perl" with a lowercase p in some writing. In general, "Perl" with a capital P refers to the language and "perl" with a lowercase p refers to the interpreter that compiles and runs your programs.
Larry created Perl in the mid-1980s when he wanted to produce some reports from a Usenet news-like hierarchy of files for a bug-reporting system, and awk ran out of steam. Larry, being the lazy programmer that he is, decided to overkill the problem with a general-purpose tool that he could use in at least one other place. The result was Perl Version zero.
There's no shortage of computer languages, is there? But, at the time, Larry didn't see anything that met his needs. If one of the other languages of today had been available back then, perhaps Larry would have used one of those. He needed something with the quickness of coding available in shell or awk programming and with some of the power of more advanced tools like grep, cut, sort, and sed, without having to resort to a language like C.
Perl fills the gap between low-level programming (such as in C or C++ or assembly) and high-level programming (such as "shell" programming). Low-level programming is usually hard to write and is ugly but fast and unlimited; it's hard to beat the speed of a well-written low-level program on a given machine. There, you can do almost anything. High-level programming, at the other extreme, tends to be slow, hard, ugly, and limited; there are many things you can't do with the shell or batch programming if there's no command on your system that provides the needed functionality. Perl is easy, nearly unlimited, mostly fast, and kind of ugly.
Let's take another look at those four claims we made about Perl:
How Can I Get Perl?
You probably already have it. At least, we find Perl wherever we go. It ships with many systems, and system administrators often install it on every machine at their site. If you can't find it on your system, you can get it free.
Perl is distributed under two different licenses. For most people who use Perl, either license is adequate. If you'll be modifying Perl, however, you'll want to read the licenses more closely because of the small restrictions on distributing the modified code. For people who won't modify Perl, the licenses say, "It's free—have fun with it."
So, it's free and runs rather nicely on nearly everything that calls itself Unix and has a C compiler. You download it, type a command or two, and it starts configuring and building itself. Better yet, get your system administrator to type those two commands and install it for you. Besides Unix and Unix-like systems, people have become addicted enough to Perl to port it to other systems, like the Macintosh, VMS, OS/2, MS-DOS, every modern species of Windows, and probably more by the time you read this. Many of these ports of Perl come with an installation program that's easier to use than the process for installing Perl on Unix. Check for links in the "ports" section on CPAN.
CPAN is the Comprehensive Perl Archive Network, your one-stop shopping for Perl. It has the source code for Perl itself, ready-to-install ports of Perl to all sorts of non-Unix systems, examples, documentation, extensions to Perl, and archives of messages about Perl. In short, CPAN is comprehensive.
CPAN is replicated on hundreds of mirror machines around the world. Start at http://search.cpan.org/ or http://kobesearch.cpan.org/ to browse or search the archive. If you don't have access to the Net, you might find a CD-ROM or DVD-ROM with all of the useful parts of CPAN on it. Check with your local technical bookstore. Look for a recently minted archive, though, since CPAN changes daily. An archive from two years ago is an antique. Better yet, get a kind friend with Net access to burn you one with today's CPAN.
Well, you get the complete source, so you get to fix the bugs yourself.
How Do I Make a Perl Program?
It's about time you asked (even if you didn't). Perl programs are text files; you can create and edit them with your favorite text editor. (You don't need any special development environment, though some commercial ones are available from various vendors. We've never used any of these enough to recommend them.)
You should generally use a programmers' text editor, rather than an ordinary editor. What's the difference? Well, a programmers' text editor will let you do things that programmers need, like indenting or unindenting a block of code, or finding the matching closing curly brace for a given opening curly brace. On Unix systems, the two most popular programmers' editors are emacs and vi (and their variants and clones). BBEdit and Alpha are good editors for Mac OS X, and a lot of people have said nice things about UltraEdit and Programmer's Favorite Editor (PFE) on Windows. The perlfaq2 manpage lists several other editors, too. Ask your local expert about text editors on your system.
For the simple programs you'll write for the exercises in this book, none of which should be more than about twenty or thirty lines of code, any text editor will be fine.
A few beginners try to use a word processor instead of a text editor. We recommend against this because it's inconvenient at best and impossible at worst. But we won't try to stop you. Be sure to tell the word processor to save your file as "text only"; the word processor's own format will almost certainly be unusable. Most word processors will probably tell you that your Perl program is spelled incorrectly and should use fewer semicolons.
In some cases, you may need to compose the program on one machine and transfer it to another to run it. If you do this, be sure that the transfer uses "text" or "ASCII" mode and not "binary" mode. This step is needed because of the different text formats on different machines. Without that, you may get inconsistent results. Some versions of Perl abort when they detect a mismatch in the line endings.
According to the oldest rule in the book, any book about a computer language that has Unix-like roots has to start with showing the "Hello, world" program. So, here it is in Perl:
A Whirlwind Tour of Perl
So, you want to see a real Perl program with some meat? (If you don't, just play along for now.) Here you are:
    #!/usr/bin/perl
    @lines = `perldoc -u -f atan2`;
    foreach (@lines) {
      s/\w<([^>]+)>/\U$1/g;
      print;
    }
Now, the first time you see Perl code like this, it can seem strange. (In fact, every time you see Perl code like this, it can seem strange.) But let's take it line by line, and see what this example does. (These explanations are brief; this is a whirlwind tour, after all. We'll see all of this program's features in more detail during the rest of this book. You're not really supposed to understand the whole thing until later.)
The first line is the #! line, as you saw before. You might need to change that line for your system, as we discussed earlier.
The second line runs an external command, named within backquotes ("` `"). (The backquote key is often found next to the number 1 on full-sized American keyboards. Be sure not to confuse the backquote with the single quote, "'".) The command we used is perldoc -u -f atan2; type that at your command line to see what its output looks like. The perldoc command is used on most systems to read and display the documentation for Perl and its associated extensions and utilities, so it should normally be available. This command tells you something about the trigonometric function atan2; we're using it here as an example of an external command whose output we wish to process.
The output of that command in the backticks is saved in an array variable called @lines. The next line of code starts a loop that processes each one of those lines. Inside the loop, the statements are indented. Though Perl doesn't require this, good programmers do.
The first line inside the loop body is the scariest one; it says s/\w<([^>]+)>/\U$1/g;. Without going into too much detail, we'll just say that this can change any line that has a special marker made with angle brackets (< >), and there should be at least one of those in the output of the perldoc command.
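To see that substitution at work without running perldoc, here is a minimal sketch that applies it to one sample line. The sample text is made up for illustration (it is not real perldoc output), and the use strict, use warnings, and my declarations are additions that the tour's program doesn't use.

```perl
use strict;
use warnings;

# A made-up line in the style of "perldoc -u" output (an assumption,
# not actual perldoc text).
my $line = "See C<atan2> and L<perlfunc> for details.\n";

# The same substitution as in the tour: one word character followed by
# <...> is replaced by the text between the brackets, in uppercase.
(my $changed = $line) =~ s/\w<([^>]+)>/\U$1/g;

print $line;     # See C<atan2> and L<perlfunc> for details.
print $changed;  # See ATAN2 and PERLFUNC for details.
```

The copy-then-substitute form leaves $line untouched, so you can compare the line before and after.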
The next line, in a surprise move, prints out each (possibly modified) line. The resulting output should be similar to what
Exercises
Normally, each chapter will end with some exercises, with the answers in Appendix A. But you don't need to write the programs needed to complete this section as they are supplied within the chapter text.
If you can't get these exercises to work on your machine, check your work and then consult your local expert. Remember that you may need to tweak each program a little, as described in the text.
  1. [7] Type in the "Hello, world" program and get it to work. (You may name it anything you wish, but a good name might be ex1-1, for simplicity, since it's exercise 1 in Chapter 1.)
  2. [5] Type the command perldoc -u -f atan2 at a command prompt and note its output. If you can't get that to work, then find out from a local administrator or the documentation for your version of Perl about how to invoke perldoc or its equivalent. (You'll need this for the next exercise anyway.)
  3. [6] Type in the second example program (from the previous section) and see what it prints. (Hint: Be careful to type those punctuation marks exactly as shown.) Do you see how it changed the output of the command?
Chapter 2: Scalar Data
In English, as in many other spoken languages, you're used to distinguishing between singular and plural. As a computer language designed by a human linguist, Perl is similar. As a general rule, when Perl has just one of something, that's a scalar. A scalar is the simplest kind of data that Perl manipulates. Most scalars are a number (like 255 or 3.25e20) or a string of characters (like hello or the Gettysburg Address). Though you may think of numbers and strings as different things, Perl uses them nearly interchangeably.
A scalar value can be acted on with operators (such as addition or concatenation), generally yielding a scalar result. A scalar value can be stored into a scalar variable. Scalars can be read from files and devices, and can be written out as well.
Numbers
Though a scalar is most often either a number or a string, it's useful to look at numbers and strings separately for the moment. We'll cover numbers first and then move on to strings.
As you'll see in the next few paragraphs, you can specify integers (whole numbers, like 255 or 2001) and floating-point numbers (real numbers with decimal points, like 3.14159, or 1.35 × 10^25). But internally, Perl computes with double-precision floating-point values. This means that there are no integer values internal to Perl. An integer constant in the program is treated as the equivalent floating-point value. You probably won't notice the conversion (or care much), but you should stop looking for distinct integer operations (as opposed to floating-point operations) because they don't exist.
A literal is the way a value is represented in the source code of the Perl program. A literal is not the result of a calculation or an I/O operation; it's data written directly into the source code.
Perl's floating-point literals should look familiar to you. Numbers with and without decimal points are allowed (including an optional plus or minus prefix), as well as attaching a power-of-10 indicator (exponential notation) with E notation.
    1.25
    255.000
    255.0
    7.25e45  # 7.25 times 10 to the 45th power (a big number)
    -6.5e24  # negative 6.5 times 10 to the 24th
             # (a big negative number)
    -12e-24  # negative 12 times 10 to the -24th
             # (a very small negative number)
    -1.2E-23 # another way to say that - the E may be uppercase
Integer literals are straightforward:
    0
    2001
    -40
    255
    61298040283768
That last one is a little hard to read. Perl allows underscores for clarity within integer literals, so you can also write that number like this:
    61_298_040_283_768
It's the same value but looks different to us human beings. You might have thought that commas should be used for this purpose, but commas are used for a more important purpose in Perl (as you'll see in the next chapter).
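If you'd like to convince yourself that these literals mean what we say, a short sketch like the following should do it. The variable names are ours, and the use strict, use warnings, and my declarations are additions not covered at this point in the text.

```perl
use strict;
use warnings;

my $plain      = 61298040283768;
my $underscore = 61_298_040_283_768;   # underscores are only for human eyes

print "same integer\n" if $plain == $underscore;
print "same number\n"  if 255.000 == 255;   # trailing zeros don't matter
print 1.5e3, "\n";                          # prints 1500
```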
Strings
Strings are sequences of characters (like hello). Strings may contain any combination of any characters. The shortest possible string has no characters. The longest string fills all of your available memory, though you wouldn't be able to do much with that. This is in accordance with the principle of "no built-in limits" that Perl follows at every opportunity. Typical strings are printable sequences of letters, digits, and punctuation in the ASCII 32 to ASCII 126 range. However, the ability to have any character in a string means you can create, scan, and manipulate raw binary data as strings and that is something with which many other utilities would have great difficulty. For example, you could update a graphical image or compiled program by reading it into a Perl string, making the change, and writing the result back out.
Like numbers, strings have a literal representation, which is the way you represent the string in a Perl program. Literal strings come in two different flavors: single-quoted string literals and double-quoted string literals.
A single-quoted string literal is a sequence of characters enclosed in single quotes. The single quotes are not part of the string itself but are there to let Perl identify the beginning and the ending of the string. Any character other than a single quote or a backslash between the quote marks (including newline characters, if the string continues onto successive lines) stands for itself inside a string. To get a backslash, put two backslashes in a row; to get a single quote, put a backslash followed by a single quote:
    'fred'    # those four characters: f, r, e, and d
    'barney'  # those six characters
    ''        # the null string (no characters)
    'Don\'t let an apostrophe end this string prematurely!'
    'the last character of this string is a backslash: \\'
    'hello\n' # hello followed by backslash followed by n
    'hello
    there'    # hello, newline, there (11 characters total)
    '\'\\'    # single quote followed by backslash
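One way to check how Perl reads these single-quoted literals is to ask for their lengths. This is just a sketch; the variable names are invented for the illustration, and it assumes the source file uses plain newline line endings.

```perl
use strict;
use warnings;

my $two_chars = '\'\\';      # single quote, then one backslash
my $seven     = 'hello\n';   # backslash and n stay as two characters
my $eleven    = 'hello
there';                      # a real newline between the quotes

print length($two_chars), "\n";   # 2
print length($seven), "\n";       # 7
print length($eleven), "\n";      # 11
```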
The \n within a single-quoted string is not interpreted as a newline but as the two characters backslash and
Perl's Built-in Warnings
Perl can be told to warn you when it sees something suspicious going on in your program. To run your program with warnings turned on, use the -w option on the command line:
    $ perl -w my_program
Or, if you always want warnings, you may request them on the #! line:
    #!/usr/bin/perl -w
That works even on non-Unix systems where it's traditional to write something like this, since the path to Perl doesn't generally matter:
    #!perl -w
With Perl 5.6 and later, you can turn on warnings with a pragma. (Be careful, because it won't work for people with earlier versions of Perl.)
    #!/usr/bin/perl
    use warnings;
Now, Perl will warn you if you use '12fred34' as if it were a number:
    Argument "12fred34" isn't numeric
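Here's a small sketch of that warning being triggered. To make the message easy to inspect, it installs a handler in $SIG{__WARN__} that collects warnings in an array instead of letting Perl print them; that handler and the variable names are our additions for the illustration, not something the text has introduced.

```perl
use strict;
use warnings;

my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };   # collect warnings instead of printing them

my $text = '12fred34';
my $sum  = $text + 1;    # only the leading digits count, so this is 12 + 1

print "sum: $sum\n";           # sum: 13
print "caught: $caught[0]";    # the "isn't numeric" message, with file and line
```

Notice that the program still runs to completion and still computes a result; the warning is only a gripe.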
Of course, warnings are generally meant for programmers and not for end-users. If a programmer doesn't see the warning, it probably won't do any good. And warnings won't change the behavior of your program except that now it will emit gripes once in a while. If you get a warning message you don't understand, you can get a longer description of the problem with the diagnostics pragma. The perldiag manpage has the short warning and the longer diagnostic description.
    #!/usr/bin/perl
    use diagnostics;
When you add the use diagnostics pragma to your program, it may seem to you that your program now pauses for a moment whenever you launch it. That's because your program has to do a lot of work (and gobble a chunk of memory) in case you want to read the documentation as soon as Perl notices your mistakes, if any. This leads to a nifty optimization that can speed up your program's launch (and shrink its memory footprint) with no adverse impact on users: once you no longer need to read the documentation about the warning messages produced by your program, remove the use diagnostics pragma. (It's even better if you fix your program to avoid causing the warnings. But it's sufficient merely to finish reading the output.)
A further optimization can be had by using one of Perl's command-line options, -M, to load the pragma only when needed instead of editing the source code each time to enable and disable
Scalar Variables
A variable is a name for a container that holds one or more values. The name of the variable stays the same throughout the program, but the value or values contained in that variable typically change repeatedly throughout the execution of the program.
A scalar variable holds a single scalar value as you'd expect. Scalar variable names begin with a dollar sign followed by what we'll call a Perl identifier: a letter or underscore, and then possibly more letters, or digits, or underscores. Another way to think of it is that it's made up of alphanumerics and underscores but can't start with a digit. Uppercase and lowercase letters are distinct: the variable $Fred is a different variable from $fred. And all of the letters, digits, and underscores are significant:
    $a_very_long_variable_that_ends_in_1
The preceding line is different from the following line:
    $a_very_long_variable_that_ends_in_2
Scalar variables in Perl are always referenced with the leading $. In the shell, you use $ to get the value, but leave the $ off to assign a new value. In awk or C, you leave the $ off entirely. If you bounce back and forth a lot, you'll find yourself typing the wrong things occasionally. This is expected. (Most Perl programmers would recommend that you stop writing shell, awk, and C programs, but that may not work for you.)
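A tiny sketch may make both points concrete: names are case-sensitive, and the $ stays on the name whether you're storing or fetching. The values are arbitrary, and the use strict, use warnings, and my declarations are our additions.

```perl
use strict;
use warnings;

my $fred = 'lowercase';
my $Fred = 'capitalized';   # a separate variable from $fred

$fred = 'changed';          # the $ stays on the name when assigning...
print "$fred $Fred\n";      # ...and when reading: changed capitalized
```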
You should generally select variable names that mean something regarding the purpose of the variable. For example, $r is probably not descriptive but $line_length is. A variable used for only two or three lines close together may be called something like $n, but a variable used throughout a program should probably have a more descriptive name.
Similarly, properly placed underscores can make a name easier to read and understand, especially if your maintenance programmer has a different spoken language background than you have. For example, $super_bowl is a better name than $superbowl since that last one might look like $superb_owl. Does $stopid mean $sto_pid (storing a process-ID of some kind?), $s_to_pid (converting something to a process-ID?), or
Output with print
It's generally a good idea to have your program produce some output; otherwise, someone may think it didn't do anything. The print() operator makes this possible. It takes a scalar argument and puts it out without any embellishment onto standard output. Unless you've done something odd, this will be your terminal display:
    print "hello world\n"; # say hello world, followed by a newline
     
    print "The answer is ";
    print 6 * 7;
    print ".\n";
You can give print a series of values, separated by commas:
    print "The answer is ", 6 * 7, ".\n";
This is a list, but we haven't talked about lists yet, so we'll put that off for later.
When a string literal is double-quoted, it is subject to variable interpolation besides being checked for backslash escapes. This means that any scalar variable name in the string is replaced with its current value:
    $meal   = "brontosaurus steak";
    $barney = "fred ate a $meal";    # $barney is now "fred ate a brontosaurus steak"
    $barney = 'fred ate a ' . $meal; # another way to write that
As you see on the last line above, you can get the same results without the double quotes. But the double-quoted string is often the more convenient way to write it.
If the scalar variable has never been given a value, the empty string is used instead:
    $barney = "fred ate a $meat"; # $barney is now "fred ate a "
Don't bother with interpolating if you have the one lone variable:
    print "$fred"; # unneeded quote marks
    print $fred;   # better style
There's nothing wrong with putting quote marks around a lone variable, but the other programmers will laugh at you behind your back. Variable interpolation is also known as double-quote interpolation because it happens when double-quote marks (but not single quotes) are used. It happens for some other strings in Perl, which we'll mention as we get to them.
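To see the difference the quote marks make, here is a sketch that builds the same text three ways. The variable names $double, $single, and $joined are invented for the illustration, and the my declarations are our addition.

```perl
use strict;
use warnings;

my $meal   = 'brontosaurus steak';
my $double = "fred ate a $meal";      # interpolated
my $single = 'fred ate a $meal';      # not interpolated: a literal $meal
my $joined = 'fred ate a ' . $meal;   # concatenation, same result as $double

print "$double\n";   # fred ate a brontosaurus steak
print "$single\n";   # fred ate a $meal
print "same\n" if $double eq $joined;
```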
To put a real dollar sign into a double-quoted string, precede the dollar sign with a backslash, which turns off the dollar sign's special significance:
The if Control Structure
Once you can compare two values, you'll probably want your program to make decisions based upon that comparison. Like all similar languages, Perl has an if control structure:
    if ($name gt 'fred') {
      print "'$name' comes after 'fred' in sorted order.\n";
    }
If you need an alternative choice, the else keyword provides that as well:
    if ($name gt 'fred') {
      print "'$name' comes after 'fred' in sorted order.\n";
    } else {
      print "'$name' does not come after 'fred'.\n";
      print "Maybe it's the same string, in fact.\n";
    }
Those block curly braces are required around the conditional code (unlike C, whether you know C or not). It's a good idea to indent the contents of the blocks of code as we show here; that makes it easier to see what's going on. If you're using a programmers' text editor (as discussed in Chapter 1), it'll do most of the work for you.
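Here's the same if/else test tried on a few names. This is only a sketch: describe is a hypothetical helper subroutine of ours (subroutines come later in the book), and the names are arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper: reports where a name falls relative to 'fred'.
sub describe {
    my ($name) = @_;
    if ($name gt 'fred') {
        return "'$name' comes after 'fred' in sorted order.";
    } else {
        return "'$name' does not come after 'fred'.";
    }
}

print describe($_), "\n" for 'wilma', 'barney', 'fred';
```

Only 'wilma' takes the first branch; 'barney' sorts earlier, and 'fred' is the same string, so neither is "greater than" 'fred'.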
You may use any scalar value as the conditional of the if control structure. That's handy if you want to store a true or false value into a variable, like this:
    $is_bigger = $name gt 'fred';
    if ($is_bigger) { ... }
But how does Perl decide whether a given value is true or false? Perl doesn't have a separate Boolean data type as some languages have. Instead, it uses a few simple rules:
  • If the value is a number, 0 means false; all other numbers mean true.
  • If the value is a string, the empty string ('') means false; all other strings mean true.
  • If the value is another kind of scalar than a number or a string, convert it to a number or a string and try again.
There's one trick hidden in those rules. Because the string '0' is the same scalar value as the number 0, Perl has to treat them the same. That means that the string '0' is the only nonempty string that is false.
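The rules above can be tried out with a short sketch like this one. The truth subroutine is a hypothetical helper of ours, and the list of test values is just a sample chosen to poke at the edge cases.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how Perl treats a value in a condition.
sub truth {
    my ($value) = @_;
    return $value ? 'true' : 'false';
}

for my $value (0, 1, -1, '', '0', '00', '0.0', ' ') {
    print "'$value' is ", truth($value), "\n";
}
```

Only 0, the empty string, and the string '0' come out false. The strings '00' and '0.0' are nonempty strings other than '0', so they count as true, even though they'd be zero if you used them as numbers.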
If you need to get the opposite of any Boolean value, use the unary not operator, !. If what follows it is a true value, it returns false; if what follows is false, it returns true:
    if (! $is_bigger) {
      # Do something when $is_bigger is not true
    }
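As a rough check of the truth rules and the not operator, the sketch below runs a few sample values through an if test and records the verdict for each. The sample values and variable names are our own illustration, and use strict, use warnings, and my are conveniences that this chapter has not introduced yet.

```perl
use strict;
use warnings;

# Illustrative sample values; the names here are not from the book.
# Build a small report showing which values Perl treats as true.
my $report = '';
foreach my $value (0, 1, -5, '', '0', '0.0', '00', ' ') {
    my $verdict;
    if ($value) {
        $verdict = 'true';
    } else {
        $verdict = 'false';
    }
    $report .= "[$value] is $verdict\n";
}

# The unary not operator flips whatever truth value it is given.
my $not_zero_string = ! '0';    # true, because the string '0' is false
my $not_other       = ! '0.0';  # false, because '0.0' is a true string

print $report;
```

Only 0, the empty string, and '0' come out false here. The strings '0.0' and '00' are true because they are nonempty and are not exactly '0', even though they would act like zero in arithmetic.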
Getting User Input
At this point, you're probably wondering how to get a value from the keyboard into a Perl program. Here's the simplest way: use the line-input operator, <STDIN>.
Each time you use <STDIN> in a place where a scalar value is expected, Perl reads the next complete text line from standard input (up to the first newline) and uses that string as the value of <STDIN>. Standard input can mean many things; unless you do something uncommon, it means the keyboard of the user who invoked your program (probably you). If there's nothing waiting for <STDIN> to read (typically the case unless you type ahead a complete line), the Perl program will stop and wait for you to enter some characters followed by a newline (return).
The string value of <STDIN> typically has a newline character on the end of it. So, you could do something like this:
    $line = <STDIN>;
    if ($line eq "\n") {
      print "That was just a blank line!\n";
    } else {
      print "That line of input was: $line";
    }
In practice, you don't often want to keep the newline, so you need the chomp operator.
The chomp Operator
The first time you read about the chomp operator, it seems overspecialized. It works on a variable, and the variable has to hold a string. If the string ends in a newline character, chomp can get rid of the newline. That's (nearly) all it does, as in this example:
    $text = "a line of text\n"; # Or the same thing from <STDIN>
    chomp($text);               # Gets rid of the newline character
It turns out to be so useful, you'll put it into nearly every program you write. As you see, it's the best way to remove a trailing newline from a string in a variable. In fact, there's an easier way to use chomp because of a simple rule: whenever you need a variable in Perl, you can use an assignment instead. Perl does the assignment and then it uses the variable in whatever way you requested. The most common use of chomp looks like this:
    chomp($text = <STDIN>); # Read the text, without the newline character
     
    $text = <STDIN>;        # Do the same thing...
    chomp($text);           # ...but in two steps
At first glance, the combined chomp may not seem to be the easy way, especially if it seems more complex. If you think of it as two operations, read a line and chomp it, then it's more natural to write it as two statements. If you think of it as one operation, read just the text and not the newline, it's more natural to write the one statement. Since most other Perl programmers are going to write it that way, you may as well get used to it now.
chomp is a function. As a function, it has a return value, which is the number of characters removed. This number is hardly ever useful:
    $food = <STDIN>;
    $betty = chomp $food; # gets the value 1 - but you knew that!
As you see, you may write chomp with or without the parentheses. This is another general rule in Perl: except in cases where it changes the meaning to remove them, parentheses are always optional.
If a line ends with two or more newlines, chomp removes only one. If there's no newline, it does nothing and returns zero.
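Here is a minimal sketch of those return values, assuming the default input record separator (a single newline). The variable names are ours, and use strict, use warnings, and my are additions not yet covered in the text.

```perl
use strict;
use warnings;

# Illustrative strings; the names are not from the book.
my $one_newline  = "fred\n";
my $two_newlines = "fred\n\n";
my $no_newline   = "fred";

# chomp returns the number of characters it removed.
my $removed_from_one  = chomp($one_newline);   # removes the only newline
my $removed_from_two  = chomp($two_newlines);  # removes just one of the two
my $removed_from_none = chomp($no_newline);    # nothing to remove

print "removed: $removed_from_one, $removed_from_two, $removed_from_none\n";
```

This prints "removed: 1, 1, 0". After the calls, $two_newlines still ends in one newline, and $no_newline is unchanged.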
The while Control Structure
Like most algorithmic programming languages, Perl has a number of looping structures. The while loop repeats a block of code as long as a condition is true:
    $count = 0;
    while ($count < 10) {
      $count += 2;
      print "count is now $count\n"; # Gives values 2 4 6 8 10
    }
As always in Perl, the truth value here works like the truth value in the if test. Like the if control structure, the block curly braces are required. The conditional expression is evaluated before the first iteration, so the loop may be skipped completely if the condition is initially false.
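To see the "skipped completely" case, here is a small sketch in which the condition is already false before the first pass. The counter $passes is a hypothetical helper added only to show that the body never runs; my and the two pragmas are our additions.

```perl
use strict;
use warnings;

my $count  = 10;   # the condition below is false from the start
my $passes = 0;    # illustrative counter: how many times the body runs

while ($count < 10) {
    $count  += 2;
    $passes += 1;
}

print "The body ran $passes times; count is still $count.\n";
```

Because the test happens before the first iteration, this prints "The body ran 0 times; count is still 10."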
The undef Value
What happens if you use a scalar variable before you give it a value? Nothing serious and definitely nothing fatal. Variables have the special undef value before they are first assigned, which is Perl's way of saying "nothing here to look at—move along, move along." If you use this "nothing" as a "numeric something," it will act like zero. If you use it as a "string something," it will act like the empty string. But undef is neither a number nor a string; it's an entirely separate kind of scalar value.
Because undef automatically acts like zero when used as a number, it's easy to make a numeric accumulator that starts out empty:
    # Add up some odd numbers
    $n = 1;
    while ($n < 10) {
      $sum += $n;
      $n += 2; # On to the next odd number
    }
    print "The total was $sum.\n";
This works properly when $sum was undef before the loop started. The first time through the loop, $n is one, so the first line inside the loop adds one to $sum. That's like adding 1 to a variable that already holds zero because you're using undef as if it were a number. Now it has the value 1. After that, since it's been initialized, adding works in the traditional way.
Similarly, you could have a string accumulator that starts out empty:
    $string .= "more text\n";
If $string is undef, this will act as if it already held the empty string, putting "more text\n" into that variable. But if it holds a string, the new text is appended.
Perl programmers frequently use a new variable in this way, letting it act as zero or the empty string as needed.
Many operators return undef when the arguments are out of range or don't make sense. If you don't do anything special, you'll get a zero or a null string without major consequences. In practice, this is hardly a problem. In fact, most programmers rely upon this behavior. But you should know that when warnings are turned on, Perl will typically warn about unusual uses of the undefined value since that may indicate a bug. For example, copying undef from one variable into another isn't a problem, but trying to print it would generally cause a warning.
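The sketch below gives one way to observe which uses of undef draw a warning. It assumes warnings are enabled with use warnings, and it installs a __WARN__ handler that collects messages in an array. That handler is a demonstration device of ours, well beyond what this chapter covers, as are my and use strict.

```perl
use strict;
use warnings;

# Collect warnings in an array instead of printing them to standard error.
# (This handler exists only for the demonstration; it is not from the book.)
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $empty;                   # never assigned, so it holds undef
my $copy = $empty;           # copying undef is silent

my $sum;
$sum += 5;                   # accumulating into undef is silent too

my $text = "value: $empty";  # using undef as a string triggers a warning

print "warnings seen: ", scalar(@warnings), "\n";
print "first warning: $warnings[0]";
```

Only the string interpolation is reported. The copy and the += accumulator stay quiet, which is why the accumulator idiom shown earlier works cleanly even with warnings turned on.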
The defined Function
One operator that can return undef is the line-input operator, <STDIN>. Normally, it returns a line of text. But if there is no more input, such as at end-of-file, it will return undef to signal this. To tell if a value is undef and not the empty string, use the defined function, which returns false for undef and true for everything else:
    $madonna = <STDIN>;
    if ( defined($madonna) ) {
      print "The input was $madonna";
    } else {
      print "No input available!\n";
    }
If you'd like to make your own undef values, you can use the obscurely named undef operator:
    $madonna = undef; # As if it had never been touched
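As a short sketch of the difference between "false" and "undefined," the following compares an undef value with two defined values that are nevertheless false. The variable names and the report format are illustrative assumptions, and my with the two pragmas is our addition.

```perl
use strict;
use warnings;

# Illustrative variables; only $madonna echoes the book's example.
my $madonna      = "some input\n";
my $empty_string = '';   # false, but defined
my $zero         = 0;    # false, but defined

my $before = defined($madonna) ? 'defined' : 'undef';
$madonna = undef;        # as if it had never been touched
my $after  = defined($madonna) ? 'defined' : 'undef';

my $empty_status = defined($empty_string) ? 'defined' : 'undef';
my $zero_status  = defined($zero)         ? 'defined' : 'undef';

print "before: $before, after: $after\n";
print "empty string: $empty_status, zero: $zero_status\n";
```

The output is "before: defined, after: undef" followed by "empty string: defined, zero: defined", so defined separates undef from values that are merely false.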
Exercises
See Appendix A for answers to the following exercises:
  1. [5] Write a program that computes the circumference of a circle with a radius of 12.5. Circumference is 2π times the radius (approximately 2 times 3.141592654). The answer you get should be about 78.5.
  2. [4] Modify the program from the previous exercise to prompt for and accept a radius from the person running the program. So, if users enter 12.5 for the radius, they should get the same number as in the previous exercise.
  3. [4] Modify the program from the previous exercise so, if the user enters a number less than zero, the reported circumference will be zero, rather than negative.
  4. [8] Write a program that prompts for and reads two numbers (on separate lines of input) and prints out the product of the two numbers multiplied together.
  5. [8] Write a program that prompts for and reads a string and a number (on separate lines of input) and prints out the string the number of times indicated by the number on separate lines. (Hint: Use the "x" operator.) If the user enters "fred" and "3," the output should be three lines, each saying "fred". If the user enters "fred" and "299792," there may be a lot of output.
Chapter 3: Lists and Arrays
If a scalar is the "singular" in Perl, as we described it at the beginning of Chapter 2, the "plural" in Perl is represented by lists and arrays.
A list is an ordered collection of scalars. An array is a variable that contains a list. In Perl, the two terms are often used as if they're interchangeable. But, to be accurate, the list is the data, and the array is the variable. You can have a list value that isn't in an array, but every array variable holds a list, though that list may be empty. Figure 3-1 represents a list, whether it's stored in an array or not.
Figure 3-1: A list with five elements
Each element of an array or list is a separate scalar variable with an independent scalar value. These values are ordered, that is, they have a particular sequence from the first to the last element. The elements of an array or list are indexed by small integers starting at zero and counting by ones, so the first element of any array or list is always element zero.
Since each element is an independent scalar value, a list or array may hold numbers, strings, undef values, or any mixture of different scalar values. Nevertheless, it's most common to have all elements of the same type, such as a list of book titles (all strings) or a list of cosines (all numbers).
Arrays and lists can have any number of elements. The smallest one has no elements, and the largest can fill all of the available memory. Once again, this is in keeping with Perl's philosophy of "no unnecessary limits."
Accessing Elements of an Array
If you've used arrays in another language, you won't be surprised to find Perl provides a way to subscript an array to refer to an element by a numeric index.
The array elements are numbered using sequential integers, beginning at zero and increasing by one for each element, like this:
    $fred[0] = "yabba";
    $fred[1] = "dabba";
    $fred[2] = "doo";
The array name (in this case, "fred") is from a completely separate namespace than scalars use. You could have a scalar variable named $fred in the same program. Perl treats them as different things and doesn't get confused. (Your maintenance programmer might be confused though, so don't capriciously make all of your variable names the same.)
You can use an array element like $fred[2] in every place where you could use any other scalar variable like $fred. For example, you can get the value from an array element or change that value by the same sorts of expressions we used in the previous chapter:
    print $fred[0];
    $fred[2]  = "diddley";
    $fred[1] .= "whatsis";
Of course, the subscript may be any expression that gives a numeric value. If it's not an integer, it'll automatically be truncated to the next lower integer:
    $number = 2.71828;
    print $fred[$number - 1]; # Same as printing $fred[1]
If the subscript indicates an element that would be beyond the end of the array, the corresponding value will be undef. This is the same as ordinary scalars; if you've never stored a value into the variable, it's undef.
    $blank = $fred[ 142_857 ]; # unused array element gives undef
    $blanc = $mel;             # unused scalar $mel also gives undef
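The following sketch pulls the subscript rules together: a fractional index, a read far past the end, and a check that the read did not grow the array. It uses scalar(@fred) to count elements, which relies on scalar context (a topic from later in this chapter). The variable names other than @fred are ours, as are my and the pragmas.

```perl
use strict;
use warnings;

my @fred = ("yabba", "dabba", "doo");

my $number = 2.71828;
my $picked = $fred[$number - 1];  # index 1.71828 is truncated to 1

my $missing = $fred[142_857];     # far past the end, so undef
my $size    = scalar(@fred);      # reading past the end does not grow the array

print "picked: $picked\n";
print "missing element is undef\n" if !defined($missing);
print "array still has $size elements\n";
```

This prints "picked: dabba", then "missing element is undef", then "array still has 3 elements".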
Special Array Indices
If you store in an array an element that is beyond the end of the array, the array is automatically extended as needed. There's no limit on its length as long as there's available memory for Perl to use. If Perl needs to create the intervening elements, it creates them as undef values.
    $rocks[0]  = 'bedrock';      # One element...
    $rocks[1]  = 'slate';        # another...
    $rocks[2]  = 'lava';         # and another...
    $rocks[3]  = 'crushed rock'; # and another...
    $rocks[99] = 'schist';       # now there are 95 undef elements
Sometimes you need to find out the last element index in an array. For the array of rocks that we've been using, the last element index is $#rocks. That's not the same as the number of elements because there's an element number zero.
    $end = $#rocks;                  # 99, which is the last element's index
    $number_of_rocks = $end + 1;     # okay, but you'll see a better way later
    $rocks[ $#rocks ] = 'hard rock'; # the last rock
Using the $#name value as an index, like that last example, happens often enough that Larry has provided a shortcut: negative array indices count from the end of the array. But don't get the idea that these indices "wrap around." If you've got three elements in the array, the valid negative indices are -1 (the last element), -2 (the middle element), and -3 (the first element). In the real world, nobody seems to use any of these except -1, though.
    $rocks[ -1 ]   = 'hard rock';   # easier way to do that last example
    $dead_rock     = $rocks[-100];  # gets 'bedrock'
    $rocks[ -200 ] = 'crystal';     # fatal error!
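Here is a hedged sketch that checks the numbers quoted in this section. It counts the undef elements created by the jump to index 99, reads through negative indices, and uses an eval block (a feature not yet introduced, used here only to keep the fatal error from ending the program). All helper names such as $undef_count and $too_far are our own.

```perl
use strict;
use warnings;

my @rocks = ('bedrock', 'slate', 'lava', 'crushed rock');
$rocks[99] = 'schist';             # extends the array to 100 elements

my $last_index  = $#rocks;         # index of the last element
my $undef_count = 0;               # illustrative counter
foreach my $index (0..$last_index) {
    $undef_count += 1 if !defined($rocks[$index]);
}

my $last_rock  = $rocks[-1];       # same element as $rocks[$#rocks]
my $first_rock = $rocks[-100];     # counts back to element zero

my $too_far  = -200;               # further back than the array reaches
my $read     = $rocks[$too_far];   # reading is harmless and gives undef
my $write_ok = eval { $rocks[$too_far] = 'crystal'; 1 };  # assigning dies
my $error    = $@;

print "last index: $last_index, undef elements: $undef_count\n";
print "last: $last_rock, first: $first_rock\n";
print "assignment failed: $error" if !$write_ok;
```

The first two lines of output are "last index: 99, undef elements: 95" and "last: schist, first: bedrock". The final line reports the error Perl raised for the out-of-range assignment.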
List Literals
A list literal (the way you represent a list value within your program) is a list of comma-separated values enclosed in parentheses. These values form the elements of the list:
    (1, 2, 3)      # list of three values 1, 2, and 3
    (1, 2, 3,)     # the same three values (the trailing comma is ignored)
    ("fred", 4.5)  # two values, "fred" and 4.5

    ()             # empty list - zero elements
    (1..100)       # list of 100 integers
That last one uses the .. range operator, seen here for the first time, which creates a list of values by counting from the left scalar up to the right scalar by ones:
    (1..5)            # same as (1, 2, 3, 4, 5)
    (1.7..5.7)        # same thing - both values are truncated
    (5..1)            # empty list - .. only counts "uphill"
    (0, 2..6, 10, 12) # same as (0, 2, 3, 4, 5, 6, 10, 12)
    ($m..$n)          # range determined by current values of $m and $n
    (0..$#rocks)      # the indices of the rocks array from the previous section
As you can see from those last two items, the elements of a list literal are not necessarily constants—they can be expressions that will be newly evaluated each time the literal is used:
    ($m, 17)       # two values: the current value of $m, and 17
    ($m+$o, $p+$q) # two values
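To make the range rules concrete, this sketch stores several of the literals above in arrays and prints them. Storing a list in an array with @ and interpolating it into a string are covered later in this chapter. The array names and the values of $m and $n are illustrative assumptions, and my with the pragmas is our addition.

```perl
use strict;
use warnings;

# Illustrative arrays holding the list literals discussed above.
my @counting  = (1..5);
my @truncated = (1.7..5.7);         # endpoints become 1 and 5
my @downhill  = (5..1);             # empty: .. only counts upward
my @mixed     = (0, 2..6, 10, 12);

my ($m, $n) = (3, 6);               # hypothetical values
my @computed = ($m..$n, $m + $n);   # expressions are evaluated when the list is built

print "counting:  @counting\n";
print "truncated: @truncated\n";
print "downhill has ", scalar(@downhill), " elements\n";
print "mixed:     @mixed\n";
print "computed:  @computed\n";
```

The counting and truncated lines both show 1 2 3 4 5, the downhill list has 0 elements, mixed is 0 2 3 4 5 6 10 12, and computed is 3 4 5 6 9.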
Of course, a list may have any scalar values, like this typical list of strings:
    ("fred", "barney", "betty", "wilma", "dino")
It turns out that lists of simple words (like the previous example) are frequently needed in Perl programs. The qw shortcut makes it easy to generate them without typing a lot of extra quote marks:
    qw( fred barney betty wilma dino ) # same as above, but less typing
qw stands for "quoted words" or "quoted by whitespace," depending upon whom you ask. Either way, Perl treats it like a single-quoted string, so you can't use \n or $fred inside a qw list as you would in a double-quoted string. The whitespace (characters like spaces, tabs, and newlines) will be discarded, and whatever remains becomes the list of items. Since whitespace is discarded, here's another (but unusual) way to write that same list:
Additional content appearing in this section has been removed.
List Assignment
In much the same way as scalar values, list values may be assigned to variables:
    ($fred, $barney, $dino) = ("flintstone", "rubble", undef);
All three variables in the list on the left get new values, as if you did three separate assignments. Since the list is built up before the assignment starts, this makes it easy to swap two variables' values in Perl:
    ($fred, $barney) = ($barney, $fred); # swap those values
    ($betty[0], $betty[1]) = ($betty[1], $betty[0]);
But what happens if the number of variables (on the left side of the equals sign) isn't the same as the number of values (from the right side)? In a list assignment, extra values are silently ignored. Perl figures that if you wanted those values stored somewhere, you would have told it where to store them. Alternatively, if you have too many variables, the extras get the value undef.
    ($fred, $barney) = qw< flintstone rubble slate granite >; # two ignored items
    ($wilma, $dino)  = qw[flintstone];                        # $dino gets undef
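Because the right-hand list is built before any assignment happens, the swap idiom extends to more than two variables. The sketch below rotates three values and then repeats the mismatched-count cases. The variables $first, $second, $third, $kept_one, and $kept_two are hypothetical names of ours, and my with the pragmas is our addition.

```perl
use strict;
use warnings;

# Hypothetical variables for the rotation.
my ($first, $second, $third) = qw( fred barney dino );

# The right-hand list is built from the old values before anything is
# assigned, so this rotates the three values without a temporary variable.
($first, $second, $third) = ($second, $third, $first);

my ($kept_one, $kept_two) = qw( flintstone rubble slate granite );  # extras dropped
my ($wilma, $dino)        = qw( flintstone );                       # $dino is undef

print "rotated: $first $second $third\n";
print "kept: $kept_one $kept_two\n";
print "dino got undef\n" if !defined($dino);
```

This prints "rotated: barney dino fred", "kept: flintstone rubble", and "dino got undef".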
Now that you can assign lists, you could build up an array of strings with a line of code like this:
    ($rocks[0], $rocks[1], $rocks[2], $rocks[3]) = qw/talc mica feldspar quartz/;
But when you wish to refer to an entire array, Perl has a simpler notation. Just use the at sign (@) before the name of the array (and no index brackets after it) to refer to the entire array at once. You can read this as "all of the," so @rocks is "all of the rocks." This works on either side of the assignment operator:
    @rocks  = qw/ bedrock slate lava /;
    @tiny   = ();                       # the empty list
    @giant  = 1..1e5;                    # a list with 100,000 elements
    @stuff  = (@giant, undef, @giant);   # a list with 200,001 elements
    $dino   = "granite";
    @quarry = (@rocks, "crushed rock", @tiny, $dino);
That last assignment gives @quarry the five-element list (bedrock, slate, lava, crushed rock, granite) since @tiny contributes zero elements to the list. (In particular, it doesn't put an undef item into the list, but you could do that explicitly as we did with
Additional content appearing in this section has been removed.
Interpolating Arrays into Strings
Like scalars, array values may be interpolated into a double-quoted string. Elements of an array are automatically separated by spaces upon interpolation:
    @rocks = qw{ flintstone slate rubble };
    print "quartz @rocks limestone\n";  # prints five rocks separated by spaces
There are no extra spaces added before or after an interpolated array; if you want those, you'll have to put them in yourself:
    print "Three rocks are: @rocks.\n";
    print "There's nothing in the parens (@empty) here.\n";
If you forget that arrays interpolate like this, you'll be surprised when you put an email address into a double-quoted string. For historical reasons, this is a fatal error at compile time:
    $email = "fred@bedrock.edu";  # WRONG! Tries to interpolate @bedrock
    $email = "fred\@bedrock.edu"; # Correct
    $email = 'fred@bedrock.edu';  # Another way to do that
However, in versions of Perl 5 soon to be released as we write this, the behavior of an unseen array variable will become similar to an unseen scalar variable, i.e., replaced with an empty string with a warning if warnings are enabled. The Perl developers apparently figure that 10 years of fatality are enough warning.
A single element of an array will be replaced by its value as you'd expect:
    @fred = qw(hello dolly);
    $y = 2;
    $x = "This is $fred[1]'s place";    # "This is dolly's place"
    $x = "This is $fred[$y-1]'s place"; # same thing
The index expression is evaluated as an ordinary expression, as if it were outside a string. It is not variable interpolated first. In other words, if $y contains the string "2*4", we're still talking about element 1, not element 7, because "2*4" as a number (the value of $y used in a numeric expression) is just plain 2. If you want to follow a simple scalar variable with a left square bracket, you need to delimit the square bracket so it isn't considered part of an array reference:
    @fred = qw(eating rocks is wrong);
    $fred = "right";               # we are trying to say "this is right[3]"
    print "this is $fred[3]\n";    # prints "wrong" using $fred[3]
    print "this is ${fred}[3]\n";  # prints "right" (protected by braces)
    print "this is $fred"."[3]\n"; # right again (different string)
    print "this is $fred\[3]\n";   # right again (backslash hides it)
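One way to think about array interpolation is that, with Perl's default settings, it behaves like joining the elements with single spaces. The sketch below compares the two using the built-in join function, which this chapter has not introduced, and rechecks the brace trick. Names such as $interpolated and $joined are our own, as are my and the pragmas.

```perl
use strict;
use warnings;

my @rocks = qw{ flintstone slate rubble };
my @empty;

# Illustrative comparison of interpolation with an explicit join.
my $interpolated = "quartz @rocks limestone";
my $joined       = "quartz " . join(' ', @rocks) . " limestone";
my $in_parens    = "(@empty)";        # an empty array contributes nothing

my @fred = qw(eating rocks is wrong);
my $fred = "right";
my $element = "this is $fred[3]";     # element 3 of @fred
my $scalar  = "this is ${fred}[3]";   # $fred followed by a literal "[3]"

print "$interpolated\n";
print "same as join\n" if $interpolated eq $joined;
print "$in_parens\n";
print "$element\n$scalar\n";
```

The first line printed is "quartz flintstone slate rubble limestone", the parentheses come out as "()", and the last two lines are "this is wrong" and "this is right[3]".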
The foreach Control Structure
It's handy to be able to process an entire array or list, so Perl provides a control structure to do that. The foreach loop steps through a list of values, executing one iteration (time through the loop) for each value:
    foreach $rock (qw/ bedrock slate lava /) {
      print "One rock is $rock.\n";  # Prints names of three rocks
    }
The control variable ($rock in that example) takes on a new value from the list for each iteration. The first time through the loop, it's "bedrock"; the third time, it's "lava".
The control variable is not a copy of the list element—it actually is the list element. That is, if you modify the control variable inside the loop, you'll be modifying the element in the original list, as shown in the following code snippet. This is useful and supported, but it would surprise you if you weren't expecting it.
    @rocks = qw/ bedrock slate lava /;
    foreach $rock (@rocks) {
      $rock = "\t$rock";       # put a tab in front of each element of @rocks
      $rock .= "\n";           # put a newline on the end of each
    }
    print "The rocks are:\n", @rocks; # Each one is indented, on its own line
What is the value of $rock after the loop has finished? It's the same as it was before the loop started. The value of the control variable of a foreach loop is automatically saved and restored by Perl. While the loop is running, there's no way to access or alter that saved value. So after the loop is done, the variable has the value it had before the loop or undef if it didn't have a value. That means that if you want to name your loop control variable "$rock", you don't have to worry that maybe you've used that name for another variable.
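Both behaviors (the loop changing the original elements, and the control variable getting its old value back) can be seen in one short sketch. It assumes $rock already holds 'granite' before the loop and uses the built-in uc function to uppercase each element. The starting value is our invention, and my with the pragmas is our addition.

```perl
use strict;
use warnings;

my $rock  = 'granite';              # hypothetical value held before the loop
my @rocks = qw/ bedrock slate lava /;

foreach $rock (@rocks) {
    $rock = uc($rock);              # changes the element in @rocks itself
}

print "rocks: @rocks\n";            # the array was modified in place
print "control variable: $rock\n";  # back to its pre-loop value
```

This prints "rocks: BEDROCK SLATE LAVA" and then "control variable: granite".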
If you omit the control variable from the beginning of the foreach loop, Perl uses its favorite default variable, $_. This is (mostly) like any other scalar variable, except for its unusual name, as in this example:
    foreach (1..10) {  # Uses $_ by default
      print "I can count to $_!\n";
    }
Though this isn't Perl's only default by a long shot, it's Perl's most common default. You'll see many other cases in which Perl automatically uses
Additional content appearing in this section has been removed.
Scalar and List Context
This is the most important section in this chapter. In fact, it's the most important section in the entire book. It wouldn't be an exaggeration to say that your entire career in using Perl will depend upon understanding this section. If you've gotten away with skimming the text up to this point, this is where you should pay attention.
That's not to say that this section is in any way difficult to understand. It's a simple idea: a given expression may mean different things depending upon where it appears. This is nothing new; it happens all the time in natural languages. For example, in English, suppose someone asked you what the word "read" means. It has different meanings depending on how it's used. You can't identify the meaning until you know the context.
The context refers to where an expression is found. As Perl is parsing your expressions, it always expects a scalar or list value. What Perl expects is called the context of the expression.
    42 + something # The something must be a scalar
    sort something # The something must be a list
If something is the exact same sequence of characters, in one case it may give a single, scalar value, and in another, it may give a list. Expressions in Perl always return the appropriate value for their context. For example, consider the "name" of an array. In a list context, it gives the list of elements. But in a scalar context, it returns the number of elements in the array:
    @people = qw( fred barney betty );
    @sorted = sort @people; # list context: barney, betty, fred
    $number = 42 + @people;  # scalar context: 42 + 3 gives 45
Even ordinary assignment (to a scalar or a list) causes different contexts:
    @list = @people; # a list of three people
    $n = @people;    # the number 3
Don't jump to the conclusion that scalar context always gives the number of elements that would have been returned in list context. Most list-producing expressions return something more interesting than that.
There are many expressions that would typically be used to produce a list. If you use one in a scalar context, what do you get? See what the author of that operation says about it. Usually, that person is Larry, and usually the documentation gives the whole story. A big part of learning Perl is learning how Larry thinks. Therefore, once you can think like Larry does, you know what Perl should do. But while you're learning, you'll probably need to look into the documentation.
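As one illustration of an expression whose scalar-context value is not simply a count, the sketch below uses the built-in reverse operator alongside the array examples from this section. In list context reverse returns the elements in the opposite order. In scalar context it returns the concatenated characters reversed. The variable names are ours, and my with the pragmas is our addition.

```perl
use strict;
use warnings;

my @people = qw( fred barney betty );

my @copy  = @people;       # list context: all three names
my $count = @people;       # scalar context: the number of elements
my $total = 42 + @people;  # scalar context again

# Illustrative use of reverse in both contexts.
my @words_reversed = reverse qw( yabba dabba doo );  # list context
my $chars_reversed = reverse qw( yabba dabba doo );  # scalar context

print "count: $count, total: $total\n";
print "list context:   @words_reversed\n";
print "scalar context: $chars_reversed\n";
```

This prints "count: 3, total: 45", then "list context:   doo dabba yabba", then "scalar context: oodabbadabbay". The same reverse expression produces a three-element list or a single string, depending only on what the assignment expects.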
Additional content appearing in this section has been removed.
Purchase this book now or