Filehandles

Unless you’re using artificial intelligence to model a solipsistic philosopher, your program needs some way to communicate with the outside world. In lines 4 and 8 of our Average Example you’ll see the word GRADES, which exemplifies another of Perl’s data types, the filehandle. A filehandle is just a name you give to a file, device, socket, or pipe to help you remember which one you’re talking about, and to hide some of the complexities of buffering and such. (Internally, filehandles are similar to streams from a language like C++ or I/O channels from BASIC.)

Filehandles make it easier for you to get input from and send output to many different places. Part of what makes Perl a good glue language is that it can talk to many files and processes at once. Having nice symbolic names for various external objects is just part of being a good glue language.[22]

You create a filehandle and attach it to a file by using open. The open function takes at least two parameters: the filehandle and the filename you want to associate with it. Perl also gives you some predefined (and preopened) filehandles. STDIN is your program’s normal input channel, while STDOUT is your program’s normal output channel. And STDERR is an additional output channel that allows your program to make snide remarks off to the side while it transforms (or attempts to transform) your input into your output.[23] In lines 4 and 5 of our program, we also tell our new GRADES filehandle and the existing STDOUT filehandle to assume that text is encoded in UTF-8, a common representation of Unicode text.

Since you can use the open function to create filehandles for various purposes (input, output, piping), you need to be able to specify which behavior you want. As you might do on the command line, you can simply add characters to the filename:

open(SESAME,    "filename")            # read from existing file
open(SESAME, "<  filename")            #   (same thing, explicitly)
open(SESAME, ">  filename")            # create file and write to it
open(SESAME, ">> filename")            # append to existing file
open(SESAME, "| output-pipe-command")  # set up an output filter
open(SESAME, "input-pipe-command |")   # set up an input filter

However, the recommended three-argument form of open allows you to specify the open mode in an argument separate from the filename itself. This is useful when you’re dealing with filenames that aren’t literals and so might already contain characters that look like open modes or significant whitespace.

open(SESAME, "<",  $somefile)          # read from existing file
open(SESAME, ">",  $somefile)          # create file and write to it
open(SESAME, ">>", $somefile)          # append to existing file
open(SESAME, "|-", "output-pipe-command")  # set up an output filter
open(SESAME, "-|", "input-pipe-command")   # set up an input filter
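To see why the separate mode argument matters, here is a minimal sketch that assumes a Unix-like filesystem where a filename may end in a space. The temporary directory and the name notes.txt are made up for illustration. The two-argument form strips trailing whitespace from the name, so it goes looking for a different file.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);
my $somefile = "$dir/notes.txt ";   # hypothetical name; note the trailing space

# Three-argument form: the filename is used exactly as given.
open(my $out, ">", $somefile) || die "can't create '$somefile': $!";
print $out "hello\n";
close($out)                   || die "can't close '$somefile': $!";

# Two-argument form: trailing whitespace is stripped from the name,
# so Perl looks for "notes.txt" (no space), which does not exist.
my $two_arg_ok = open(my $in2, $somefile) ? 1 : 0;

# Three-argument form finds the file we just wrote.
my $three_arg_ok = open(my $in3, "<", $somefile) ? 1 : 0;
close($in3) if $three_arg_ok;

print "two-argument open succeeded?   $two_arg_ok\n";
print "three-argument open succeeded? $three_arg_ok\n";
```

The same reasoning applies to names that begin with characters such as > or |, which the two-argument form would treat as an open mode.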

As we did in our program, you can also use this form of open to specify the character encoding of the file, along with other I/O layers such as :crlf.

open(SESAME, "< :encoding(UTF-8)",     $somefile)
open(SESAME, "> :crlf",                $somefile)
open(SESAME, ">> :encoding(MacRoman)", $somefile)
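As a rough illustration of what an encoding layer does, the following sketch writes a four-character word through a UTF-8 layer and then reads it back twice: once as raw bytes and once as decoded characters. The filename menu.txt and the sample word are assumptions chosen for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);
my $somefile = "$dir/menu.txt";     # hypothetical filename
my $word     = "caf\x{E9}";         # "cafe" with an acute accent: 4 characters

# Write the word, encoding characters as UTF-8 on the way out.
open(my $out, "> :encoding(UTF-8)", $somefile) || die "can't create $somefile: $!";
print $out $word;
close($out)                                    || die "can't close $somefile: $!";

# Read it back as undecoded bytes.
open(my $raw, "< :raw", $somefile) || die "can't open $somefile: $!";
my $bytes = <$raw>;
close($raw);

# Read it back again, decoding UTF-8 into characters.
open(my $in, "< :encoding(UTF-8)", $somefile) || die "can't open $somefile: $!";
my $chars = <$in>;
close($in);

my $byte_count = length($bytes);    # the accented letter takes two bytes
my $char_count = length($chars);    # but it is still one character
print "bytes: $byte_count, characters: $char_count\n";
```

Without the layer on input, the program would see five bytes instead of four characters, which is the kind of mismatch the encoding declaration is meant to prevent.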

As you can see, the name you pick for the filehandle is arbitrary. Once opened, the filehandle SESAME can be used to access the file or pipe until it is explicitly closed (with, you guessed it, close(SESAME)), or until the filehandle is attached to another file by a subsequent open on the same filehandle. Opening an already opened filehandle implicitly closes the first file, making it inaccessible to the filehandle, and opens a different file. You must be careful that this is what you really want to do. Sometimes it happens accidentally, like when you say open($handle,$file), and $handle happens to contain a constant string. Be sure to set $handle to something unique, or you’ll just open a new file on the same filehandle.

A much better idea is to leave $handle undefined and let Perl fill it in for you. This is also handy when you get tired of choosing your own names for filehandles: if you pass open an undefined variable (such as one freshly declared with my), Perl will create the filehandle for you and store it in that variable automatically:

open(my $handle, "< :crlf :encoding(cp1252)", $somefile)
  || die "can't open $somefile: $!";

If the open succeeds, the $handle variable is now defined, and you can use it wherever a filehandle is expected.
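Because each my variable gets its own fresh filehandle, two files can be open at once without one open clobbering the other. The sketch below assumes two small made-up files (names.txt and grades.txt) and a hypothetical helper, write_file, that exists only to create them. It reads with the line-reading operator on an indirect handle, <$handle>.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper that creates a small file for the demonstration.
sub write_file {
    my ($name, $text) = @_;
    open(my $out, ">", $name) || die "can't create $name: $!";
    print $out $text;
    close($out)               || die "can't close $name: $!";
}

my $dir         = tempdir(CLEANUP => 1);
my $names_file  = "$dir/names.txt";     # made-up filenames and data
my $grades_file = "$dir/grades.txt";
write_file($names_file,  "Noel\nBen\n");
write_file($grades_file, "25\n76\n");

# Each open fills in a different, independent handle.
open(my $names,  "<", $names_file)  || die "can't open $names_file: $!";
open(my $grades, "<", $grades_file) || die "can't open $grades_file: $!";

my $first_name  = <$names>;     # first line of names.txt
my $first_grade = <$grades>;    # first line of grades.txt
my $second_name = <$names>;     # $names kept its place in its own file

close($names);
close($grades);

print $first_name, $first_grade, $second_name;
```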

Once you’ve opened a filehandle for input, you can read a line using the line reading operator, <>. This is also known as the angle operator because it’s made of angle brackets. The angle operator encloses the filehandle (<SESAME> if a literal handle, and <$handle> for an indirect one) you want to read lines from. The empty angle operator, <>, will read lines from all the files specified on the command line, or STDIN if no arguments were specified. (This is standard behavior for many filter programs.) An example using the STDIN filehandle to read an answer supplied by the user would look something like this:

print STDOUT "Enter a number: ";      # ask for a number
$number = <STDIN>;                    # input the number
say STDOUT "The number is $number.";  # print the number

Did you see what we just slipped by you? What’s that STDOUT doing there in those print and say statements? Well, that’s just one of the ways you can use an output filehandle. A filehandle may be supplied between the command and its argument list, and if present, tells the output where to go. In this case, the filehandle is redundant because the output would have gone to STDOUT anyway. Much as STDIN is the default for input, STDOUT is the default for output. (In line 22 of our Average Example, we left it out to avoid confusing you until now.)

If you try the previous example, you may notice that you get an extra blank line. This happens because the line-reading operation does not automatically remove the newline from your input line (your input would be, for example, "9\n"). For those times when you do want to remove the newline, Perl provides the chop and chomp functions. chop will indiscriminately remove (and return) the last character of the string, while chomp will only remove the end-of-record marker (generally, "\n") and return the number of characters so removed. You’ll often see this idiom for inputting a single line:

chomp($number = <STDIN>);    # input a number, then remove its newline

which means the same thing as:

$number = <STDIN>;           # input a number
chomp($number);              # remove trailing newline
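The difference between chop and chomp shows up when the string does not end in a newline. Here is a small sketch using literal strings in place of real input; the variable names are invented for the example.

```perl
use strict;
use warnings;

my $with_newline    = "9\n";    # what <STDIN> would typically hand you
my $without_newline = "9";      # for example, the last line of a file

# chomp removes only a trailing newline and reports how many characters it removed.
my $chomped_a = $with_newline;
my $removed_a = chomp($chomped_a);      # removes "\n"
my $chomped_b = $without_newline;
my $removed_b = chomp($chomped_b);      # nothing to remove

# chop removes the last character, whatever it is, and returns it.
my $chopped_a = $with_newline;
my $char_a    = chop($chopped_a);       # takes the "\n"
my $chopped_b = $without_newline;
my $char_b    = chop($chopped_b);       # takes the "9", leaving an empty string

print "chomp removed $removed_a and $removed_b character(s)\n";
print "chop left '$chopped_a' and '$chopped_b'\n";
```

This is why chomp is usually the safer choice for cleaning up input lines: it leaves the data alone when there is no newline to strip.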

One last thing: just because we called our variable $number doesn’t mean it holds a number. Any string will do. Perl only cares whether something is a number if you try to operate on that string as though it were a number, and down that road lie operators, our next topic.
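As a small preview, this sketch treats the same string first as a number and then as text. The values are made up, and the numeric warning is switched off in one spot only so that the non-numeric case runs quietly.

```perl
use strict;
use warnings;

my $number = "9";               # a string that happens to look like a number
my $sum    = $number + 1;       # numeric context: treated as the number 9
my $joined = $number . "1";     # string context: treated as the text "9"

my $word = "nine";              # a string that does not look like a number
my $odd;
{
    no warnings 'numeric';      # otherwise Perl warns that "nine" isn't numeric
    $odd = $word + 1;           # a non-numeric string counts as 0
}

print "sum: $sum, joined: $joined, odd: $odd\n";
```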



[22] Some of the other things that make Perl a good glue language are: it handles non-ASCII data, it’s embeddable, and you can embed other things in it via extension modules. It’s concise, and it “networks” easily. It’s environmentally conscious, so to speak. You can invoke it in many different ways (as we saw earlier). But most of all, the language itself is not so rigidly structured that you can’t get it to “flow” around your problem. It comes back to that TMTOWTDI thing again.

[23] These filehandles are typically attached to your terminal, so you can type to your program and see its output, but they may also be attached to files (and such). Perl can give you these predefined handles because your operating system already provides them, one way or another. Under Unix, processes inherit standard input, output, and error from their parent process, typically a shell. One of the duties of a shell is to set up these I/O streams so that the child process doesn’t need to worry about them.
