Perl in a Nutshell
Perl in a Nutshell, Second Edition By Stephen Spainhour, Ellen Siever, Nathan Patwardhan
June 2002
Pages: 760



Table of Contents

Chapter 1: Introduction to Perl
Computer languages differ not so much in what they make possible, but in what they make easy. Perl is designed to make the easy jobs easy, without making the hard jobs impossible. Perl makes it easy to manipulate numbers, text, files, directories, computers, networks, and programs. It also makes it easy to develop, modify, and debug your own programs portably, on any modern operating system.
Perl is especially popular with systems programmers and web developers, but it also appeals to a much broader audience. Originally designed for text processing, it has grown into a sophisticated, general-purpose programming language with a rich software development environment complete with debuggers, profilers, cross-referencers, compilers, interpreters, libraries, syntax-directed editors, and all the rest of the trappings of a "real" programming language.
There are many reasons for Perl's success. For starters, Perl is freely available and freely redistributable. But that's not enough to explain the Perl phenomenon, since many other freeware packages fail to thrive. Perl is not just free; it's also fun. People feel like they can be creative in Perl, because they have freedom of expression.
Perl is both a very simple language and a very rich language. It's a simple language in that the types and structures are simple to use and understand, and it borrows heavily from other languages you may already be familiar with. You don't have to know everything there is to know about Perl before you can write useful programs.
However, Perl is also a rich language, and there is much to learn about it. That's the price of making hard things possible. Although it will take some time for you to absorb all that Perl can do, somewhere down the line you will be glad that you have access to the extensive capabilities of Perl.
What's Perl Good For?
Perl has the advantage of being easy to learn if you just want to write simple scripts—thus its appeal to the ever-impatient system administrator and the deadline-driven CGI developer. However, as you become more ambitious, Perl lets you act on those ambitions. Chapter 2 covers how to get and install Perl, and Chapter 3 through Chapter 6 cover the basics of the Perl language, its functions, and how to use the Perl debugger.
On top of the Perl language itself, however, are the Perl modules. You can think of modules as add-ons to the Perl language that allow you to streamline tasks by providing a consistent API. Perl itself is fun to use, but the modules lend Perl even more flexibility and enormous power. Furthermore, anyone can write and distribute a Perl module. Some modules are deemed important enough or popular enough to be distributed with Perl itself, but very few are actually written by the core Perl developers themselves. Chapter 7 introduces you to Perl modules, and Chapter 8 covers the standard modules that are distributed with Perl itself.
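As a small taste of what a module adds, the sketch below uses List::Util, which ships with Perl 5.8. The list of sizes is made-up sample data. Loading the module with `use` makes its functions available, and each one does a job you would otherwise hand-code as a loop.

```perl
use strict;
use warnings;
use List::Util qw(sum max first);   # bundled with Perl 5.8 and later

my @sizes = (120, 4096, 512, 64);   # hypothetical sample data

my $total     = sum(@sizes);                 # add up every element
my $largest   = max(@sizes);                 # numerically largest element
my $first_big = first { $_ > 500 } @sizes;   # first element passing the test

print "total=$total largest=$largest first>500=$first_big\n";
# prints "total=4792 largest=4096 first>500=4096"
```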
The most popular Perl module is CGI.pm, which gives a simple interface to developing common gateway interface (CGI) applications in Perl. While Perl itself is indispensable for many different tasks, its text-manipulation features make it perfect for CGI development on the Web. In fact, the resurgence of Perl over the past few years must be credited to its popularity as a CGI language. Chapter 10 and Chapter 11 talk about using Perl for CGI, including mod_perl, which merges Perl into the Apache web server.
Database interconnectivity is one of the most important functions of any programming language today, and Perl is no exception. DBI is a suite of modules that provide a consistent database-independent interface for Perl. Chapter 12 covers both DBI and DBM (the more primitive but surprisingly effective database interface built directly into Perl).
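To show how little code DBM access needs, here is a minimal sketch that ties a hash to SDBM_File. SDBM_File is the DBM implementation built with every perl. The helper name `dbm_roundtrip` and the file path are hypothetical. The sketch writes some pairs, unties the hash, then reopens the file read-only to prove the data was saved.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT O_RDONLY);   # flags for opening the DBM file
use SDBM_File;                           # DBM implementation bundled with Perl

# Store the given key/value pairs in a DBM file, then reopen it and
# return a plain hash reference holding everything that was read back.
sub dbm_roundtrip {
    my ($path, %pairs) = @_;

    tie my %db, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $path: $!";
    @db{keys %pairs} = values %pairs;    # each assignment goes to the file
    untie %db;                           # flush and close

    tie my %again, 'SDBM_File', $path, O_RDONLY, 0666
        or die "Cannot reopen $path: $!";
    my %copy = %again;                   # iterate over the file's contents
    untie %again;
    return \%copy;
}
```

A call might look like `dbm_roundtrip('/tmp/colors', red => 'ff0000')`. SDBM typically creates a `.pag` and a `.dir` file next to the given path.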
The eXtensible Markup Language (XML) is quickly becoming the de facto way to store electronic information of any kind. Chapter 13 covers the modules designed for Perl and XML processing, and Chapter 14 covers using Perl for managing web services with the XML-based protocol SOAP.
Perl Development
Software doesn't grow on trees. Perl is free because of the donated efforts of several generous people who have devoted large chunks of their spare time to the development, maintenance, and evangelism of Perl.
Perl itself was created by Larry Wall, in an effort to produce reports for a bug-reporting system. Larry designed a new scripting language for this purpose and then released it to the Internet, thinking that someone else might find it useful. In the spirit of freeware, other people suggested improvements and even ways to implement them, and Perl transformed from a cute scripting language into a robust programming language.
Today, Larry does little actual development himself, but helps to guide several other selfless individuals in the continued development and design of the language. Currently, a team of porters are working (with Larry's guidance) on a complete rewrite of the language from the ground up, which is expected to yield a 21st-century version of the language in form and features. This rewrite will be released as Perl Version 6. At the time of this writing, progress on Perl 6 is steady but still a long way from completion.
This second edition of Perl in a Nutshell covers Perl 5.8.
Which Platforms Support Perl?
While Perl was developed on Unix and is closely entwined with Unix culture, it also has a strong following on the Windows and Macintosh platforms. Perl gives Windows 95, Windows NT, Macintosh, and even VMS users the opportunity to take advantage of the scripting power that Unix users take for granted.
Most Unix machines will have Perl already installed, since it's one of the first things a Unix system administrator will build for a new machine (and is in fact distributed with the operating system on some versions of Unix, such as Linux and FreeBSD). For Windows NT, Windows 95, and Macintosh, there are binary distributions of Perl that you can download for free. See Chapter 2 for information on installing Perl.
Although there is some history of other platforms not being treated seriously by the Perl community, Perl is becoming increasingly friendly to non-Unix platforms. The Win32 ports of Perl are quite stable, and as of Perl 5.8, are integrated wholly with core Perl.
Perl Resources
Paradoxically, the way in which Perl helps you the most has almost nothing to do with Perl itself, and everything to do with the people who use Perl. While people start using Perl because they need it, they continue using Perl because they love it.
The result is that the Perl community is one of the most helpful in the world, with CPAN—the Comprehensive Perl Archive Network—as one example. When Perl programmers aren't writing their own programs, they spend their time helping others write theirs. They discuss common problems and help devise solutions. They develop utilities and modules for Perl and give them away to the world at large.
The central meeting place for Perl aficionados is Usenet. If you're not familiar with Usenet, it's a collection of special-interest groups (called newsgroups) on the Internet. For almost anyone using a modern browser, Usenet access is as simple as selecting a menu option in the browser. Perl programmers should consider subscribing to the following newsgroups:
comp.lang.perl.announce
A moderated newsgroup with announcements about new utilities or products related to Perl
comp.lang.perl.misc
The general-purpose newsgroup devoted to non-CGI-related Perl programming questions
comp.lang.perl.moderated
A moderated newsgroup intended to be a forum for more controlled, restrained discussions about Perl
comp.lang.perl.modules
A newsgroup devoted to using and developing Perl modules
comp.lang.perl.tk
A newsgroup concentrating on Perl/Tk, the graphical extension to Perl
comp.infosystems.www.authoring.cgi
A newsgroup for CGI questions in general, but mostly for Perl-related questions
At some point, it seems like every Perl programmer subscribes to comp.lang.perl.misc. You may eventually abandon it if the discussion becomes too detailed, too belligerent, or too bizarre for your taste. But you'll likely find yourself coming back from time to time, either to ask a question or just to check out the latest buzz.
Chapter 2: Installing Perl
Some of the best things in life are free. So is Perl. Although Perl is often available on CD-ROM, and is sometimes installed as a core part of your operating system, most people download it from an online archive. CPAN, the Comprehensive Perl Archive Network, is the main distribution point for all things Perl. Whether you are looking for Perl itself, for a module, or for documentation about Perl, CPAN (http://www.cpan.org/) is the place to go. The ongoing development and enhancement of Perl is very much a cooperative effort, and CPAN is the place where the work of many individuals comes together.
The CPAN Architecture
CPAN represents the development interests of a cross-section of the Perl community. It contains Perl utilities, modules, documentation, and (of course) the Perl distribution itself. CPAN was created by Jarkko Hietaniemi and Andreas König.
The home for CPAN is http://www.cpan.org/, but CPAN is also mirrored on many other sites around the globe. This ensures that anyone with an Internet connection can have reliable access to CPAN's contents at any time. Since the structure of all CPAN sites is the same, a user searching for the current version of Perl can be sure that the stable.tar.gz file is the same on every site.
If you want to use anonymous FTP, the following machines should have the Perl source code plus a copy of the CPAN mirror list:
ftp.perl.com
ftp.cs.colorado.edu
ftp.cise.ufl.edu
ftp.funet.fi
ftp.cs.ruu.nl
The location of the top directory of the CPAN mirror differs on these machines, so look around once you get there. It's often something like /pub/perl/CPAN.
How Is CPAN Organized?
CPAN materials are grouped into categories, including Perl modules, distributions, documentation, announcements, ports, scripts, and contributing authors. Each category is linked to related categories. For example, links to a graphing module written by an author appear in both the module and the author areas.
Since CPAN provides the same offerings worldwide, the directory structure has been standardized; files are located in the same place in the directory hierarchy at all CPAN sites. All CPAN sites use CPAN as the root directory, from which the user can select a specific Perl item. From the CPAN directory, you have the following choices:
Item              Description
CPAN.html         CPAN info page; some general information about CPAN
ENDINGS           Description of the file extensions, such as .tar, .gz, and .zip
MIRRORED.BY       A list of sites mirroring CPAN
MIRRORING.FROM    A list of sites mirrored by CPAN
README            A brief description of what you'll find on CPAN
README.html       An HTML-formatted version of the README file
RECENT            Recent additions to the CPAN site
RECENT.html       An HTML-formatted list of recent additions
ROADMAP
Installing Perl
Most likely, your system administrator is responsible for installing and upgrading Perl. But if you are the system administrator, or you want to install Perl on your own system, sooner or later you will find yourself installing a new version of Perl.
If you run Perl and plan on upgrading to the latest distribution, be aware that pre-5.005 Perl extensions are not compatible with 5.6 and later. This means that you must rebuild and reinstall any dynamically loaded extensions you built under Perl distributions earlier than 5.005. If you're building under a Unix variant that's running Perl 5.005, choose the Configure option for 5.005 compatibility.
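Before you rebuild extensions, you may want to confirm which perl a script is actually running under. The sketch below relies on `$]`, which holds the version as a plain number, and on `$^V`, which holds it as a version string. The helper name `perl_at_least` is hypothetical.

```perl
use strict;
use warnings;

# $] is the running perl's version as a plain number,
# e.g. 5.006 for 5.6.0 and 5.008 for 5.8.0.
sub perl_at_least {
    my ($minimum) = @_;
    return $] >= $minimum;
}

printf "This is perl %vd (\$] is %s)\n", $^V, $];
if (perl_at_least(5.006)) {
    print "Extensions compiled for an older perl may need to be rebuilt.\n";
}
```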
Specific installation instructions come in the README and INSTALL files of the Perl distribution kit. If you don't already have the Perl distribution, you can download it from CPAN—the latest Unix distribution is in stable.tar.gz. The information in this section is an overview of the installation process. The gory details are in the INSTALL file, which you should look at before starting, especially if you haven't done an installation before. Note that operating systems other than Unix may have special instructions; if so, follow those instructions instead of what's in this section or in INSTALL. Look for a file named README.xxx, in which xxx represents your operating-system type.
In addition to Perl itself, the standard distribution includes a set of core modules that are automatically installed with Perl. See Section 2.4 later in this chapter to learn how to install modules that are not bundled with Perl; Chapter 8 describes the standard modules in some detail.
Typically, the Perl kit will be packed as either a tar file or a set of shar (shell archive) scripts; in either case, the file will be in a compressed format. If you got your version of Perl directly from CPAN, it is probably in "tar-gzipped" format; tar and gzip are popular Unix data-archiving formats. In any case, once you've downloaded the distribution, you need to uncompress and unpack it. The filename indicates the kind of compression that was used. A
Getting and Installing Modules
As you'll see when you look at the lists of modules and their authors on CPAN, many users have made their modules freely available. If you find an interesting problem and are thinking of writing a module to solve it, check the modules directory on CPAN first to see if there is a module there that you can use. The chances are good that there is a module that does what you need, or perhaps one that you can extend, rather than starting from scratch.
Before you download a module, you might also check your system to see if it's already installed. The following command searches the libraries in the @INC array and prints the names of all modules it finds:
find `perl -e 'print "@INC"'` -name '*.pm' -print
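The `find` command lists every module file on the @INC path. To check for one particular module from Perl itself, a common idiom is to attempt a `require` inside `eval` and see whether it succeeds. This is a minimal sketch. The helper `module_installed` and the module name `No::Such::Module` are hypothetical. A successful check also loads the module as a side effect.

```perl
use strict;
use warnings;

# Returns 1 if the named module can be loaded from a directory in @INC, else 0.
sub module_installed {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # MIME::Base64 -> MIME/Base64.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $name (qw(MIME::Base64 No::Such::Module)) {
    printf "%-18s %s\n", $name, module_installed($name) ? 'installed' : 'not found';
}
```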
If you start from the modules directory on CPAN, you'll see that the modules are categorized into three subdirectories:
by-authors       Modules by author's registered CPAN name
by-category      Modules by subject matter (see below)
by-module        Modules by namespace (e.g., MIME)
If you know what module you want, you can go directly to it by clicking on the by-module entry. If you are looking for a module in a particular category, you can find it in the by-category subdirectory. If you know the author, click on by-authors. However, if you aren't familiar with the categories and want to find a module that performs a certain task, you might want to get the file 00modlist.long.html, also in the modules directory. This file is the "Perl 5 Modules List." It contains a list of all the modules, by category, with a brief description of the purpose of each module and a link to the author's CPAN directory for downloading.
Here is a list of the Perl Module categories, plus two for modules that don't fit anywhere else:
02_Perl_Core_Modules
03_Development_Support
04_Operating_System_Interfaces
05_Networking_Devices_IPC
06_Data_Type_Utilities
07_Database_Interface
08_User_Interfaces
09_Language_Interfaces
10_File_Names_Systems_Locking
11_String_Lang_Text_Proc
12_Opt_Arg_Param_Proc
13_Internationalization_Locale
14_Security_and_Encryption
15_World_Wide_Web_HTML_HTTP_CGI
16_Server_and_Daemon_Utilities
17_Archiving_and_Compression
18_Images_Pixmaps_Bitmaps
19_Mail_and_Usenet_News
20_Control_Flow_Utilities
21_File_Handle_Input_Output
22_Microsoft_Windows_Modules
23_Miscellaneous_Modules
24_Commercial_Software_Interfaces
99_Not_In_Modulelist
99_Not_Yet_In_Modulelist
Documentation
Perl documentation is written in a language known as pod (plain old documentation). Pod is a set of simple tags that can be processed to produce documentation in the style of Unix manpages. There are also several utility programs available that process pod text and generate output in different formats. Pod tags can be intermixed with Perl commands or can be saved in a separate file, which usually has a .pod extension. The pod tags and the utility programs that are included in the Perl distribution are described in Chapter 4.
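To illustrate how pod and code share one file, here is a small program that carries its own documentation. The program name `greet.pl` and the `greet` sub are hypothetical. The compiler skips everything from a pod command such as `=head1` through the matching `=cut`, so only the Perl code is compiled.

```perl
use strict;
use warnings;

=head1 NAME

greet - a tiny program with its documentation written in pod

=head1 SYNOPSIS

    perl greet.pl

=cut

# Everything between =head1 and =cut above is ignored by the compiler;
# compilation resumes here.
sub greet {
    my ($name) = @_;
    return "Hello, $name";
}

print greet('world'), "\n";
```

Saved as greet.pl, running `perldoc greet.pl` should show the NAME and SYNOPSIS sections and none of the code.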
On Unix, the standard Perl installation procedure generates manpages for the Perl documentation from their pod format, although your system administrator might also choose to install the documentation as HTML files. You can also use this procedure to generate manpages for CPAN modules when you install them. You might need to modify your MANPATH environment variable to include the path to the Perl manpages, but then you should be able to read the documentation with the man command. In addition, Perl comes with its own command, perldoc, which formats the pod documentation and displays it. perldoc is particularly useful for reading module documentation, which might not be installed as manpages; you can also use it for reading the core Perl documentation.
The ActiveState Win32 port comes with documentation in HTML format; you can find it in the /docs subdirectory of the distribution. Documentation specific to ActiveState's Perl for Win32 is installed in the /docs/Perl-Win32 subdirectory.
Perl comes with lots of online documentation. To make life easier, the manpages are divided into separate sections so you don't have to wade through hundreds of pages of text to find what you are looking for. You can read them with either the man command or perldoc. Run man perl or perldoc perl to read the top-level page. This page in turn directs you to more specific pages. Or, if you know which page you want, you can go directly there by using:
% man perlvar
Chapter 3: The Perl Executable
The perl executable is normally installed in /usr/bin or /usr/local/bin on your machine. People often refer to perl as the Perl interpreter, but this isn't strictly correct, as you'll learn shortly.
Every Perl program must be passed through the Perl executable to be executed. The first line in many Perl programs is something like:
#!/usr/bin/perl
For Unix systems, this #! (hash-bang or shebang) line tells the system to run the /usr/bin/perl program and hand it the file for execution. Sometimes you'll see different pathnames to the Perl executable, such as /usr/local/bin/perl. You might see perl5 or perl6 instead of perl on sites that still depend on older versions of Perl.
Often, you'll see command-line options tacked on the end of perl, such as the notorious -w switch, which produces warning messages. But almost all Perl programs on Unix start with some variation of #!/usr/bin/perl.
If you get a mysterious "Command not found" error on a Perl program, it's often because the path to the Perl executable is wrong. When you download Perl programs off the Internet, copy them from one machine to another, or copy them out of a book (like this one!), the first thing you should do is make sure that the #! line points to the location of the Perl executable on your system. If you're on a Win32 platform, where the shebang path is used only to check for Perl switches, make sure you run pl2bat.bat on the program so you can run it directly from the command line.
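If you aren't sure what the #! line should say on a given machine, perl can tell you. `$^X` holds the path of the perl binary that is running the current program, and `$Config{perlpath}` records where that perl was installed. The helper `suggested_shebang` is hypothetical. On some older systems `$^X` may be a relative name rather than a full path, so treat its output as a starting point.

```perl
use strict;
use warnings;
use Config;

# Build a #! line from the perl binary that is running this program.
sub suggested_shebang {
    return "#!$^X";
}

print suggested_shebang(), "\n";
print "installed perl: $Config{perlpath}\n";
```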
So what does the Perl executable do? It compiles the program internally into a parse tree and executes it immediately. Because the program is not compiled and executed in separate steps, Perl is commonly known as an interpreted language, but this is not quite true.
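One way to observe the separate compile phase is a BEGIN block. Perl runs a BEGIN block as soon as it has compiled that block, before the main program starts executing. In the sketch below the BEGIN block sits after the ordinary statement but still runs first.

```perl
use strict;
use warnings;

our @order;    # package variable, so it exists during compilation too

push @order, 'run time';               # runs only after compilation finishes
BEGIN { push @order, 'compile time' }  # runs while perl is still compiling

print "$_\n" for @order;               # prints "compile time", then "run time"
```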
So do you call something a Perl "script" or a Perl "program"? Typically, the word "program" is used to describe something that needs to be compiled into assembler or bytecode before executing, as in the C language. The word "script" is used to describe something that runs through an executable on your system, such as the Bourne shell. For Perl, you can use either phrase and only offend those Perl programmers who care about semantics more than you do.
Command Processing
In addition to specifying a #! line, you can specify a short script directly on the command line. Here are some of the possible ways to run Perl:
  • Issue the perl command, writing your script line by line via -e switches on the command line:
    perl -e 'print "Hello, world\n"'    # Unix
    perl -e "print \"Hello, world\n\""  # Win32 or Unix
    perl -e "print qq[Hello, world\n]"  # Also Win32
  • Issue the perl command, passing Perl the name of your script as the first parameter (after any switches):
    perl testpgm
  • On Unix systems that support the #! notation, specify the Perl command on the #! line, make your script executable, and invoke it from the shell (as described above).
  • Pass your script to Perl via standard input. For example, under Unix:
    echo "print 'Hello, world'" | perl -
    % perl
    print "Hello, world\n";
    ^D
  • On Win32 systems, you can associate an extension (e.g., .plx) with a file type and double-click on the icon for a Perl script with that file type. Or, as mentioned earlier, do this:
    (open a "DOS" window)
    C:\> (edit your Perl program in your favorite editor)
    C:\> pl2bat yourprog.plx
    C:\> .\yourprog.bat
    (program output here)
    If you are using the ActiveState version of Win32 Perl, the installer normally prompts you to create the association.
  • On Win32 systems, if you double-click on the icon for the Perl executable, you'll find yourself in a command-prompt window with a blinking cursor. You can enter your Perl commands, indicating the end of your input with Ctrl-Z, and Perl will compile and execute your script.
Perl parses the input file from the beginning, unless you've specified the -x switch (see Section 3.2 later in this chapter). If there is a #! line, it is always examined for switches as the line is being parsed. Thus, switches behave consistently regardless of how Perl was invoked.
After locating your script, Perl compiles the entire script into an internal form. If there are any compilation errors, execution of the script is not attempted. If the script is syntactically correct, it is executed. If the script runs off the end without hitting an
Command-Line Options
Perl expects any command-line options, also known as switches or flags, to come first on the command line. The next item is usually the name of the script, followed by any additional arguments (often filenames) to be passed into the script. Some of these additional arguments may be switches, but if so, they must be processed by the script, since Perl gives up parsing switches as soon as it sees either a non-switch item or the special -- switch that terminates switch processing.
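Switches meant for the script itself arrive in @ARGV, and the script has to parse them. The sketch below uses Getopt::Std, which is bundled with Perl, to accept a hypothetical `-v` flag and `-o FILE` option. The helper `parse_args` is also hypothetical. Like perl, `getopts` stops at `--`, so anything after it is left alone even if it starts with a minus.

```perl
use strict;
use warnings;
use Getopt::Std;   # bundled with Perl

# Parse -v and -o FILE from the given argument list; return the options
# and whatever is left over after switch processing stops.
sub parse_args {
    my @args = @_;
    local @ARGV = @args;            # getopts works on @ARGV
    my %opts;
    getopts('vo:', \%opts)
        or die "usage: script [-v] [-o file] [--] args...\n";
    return (\%opts, [@ARGV]);       # copy before @ARGV is restored
}

my ($opts, $rest) = parse_args('-v', '-o', 'out.txt', '--', '-not-a-switch');
```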
A single-character switch with no argument may be combined (bundled) with the switch that follows it, if any. For example:
#!/usr/bin/perl -spi.bak
is the same as:
#!/usr/bin/perl -s -p -i.bak
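To make the bundled -p and -i.bak switches less magical, here is a rough longhand version of `perl -pi.bak -e 's/colour/color/g' FILE`. It relies on the `$^I` variable, which holds the -i extension and enables in-place editing for the `<>` loop while it is defined. The sub name `edit_in_place` and the colour-to-color substitution are hypothetical. The sketch leaves out -s and the fact that -p actually prints from a `continue` block.

```perl
use strict;
use warnings;

# Roughly what "perl -pi.bak -e 's/colour/color/g' FILE..." does.
sub edit_in_place {
    my ($backup_ext, @files) = @_;
    local $^I   = $backup_ext;   # same effect as -i.bak: keep FILE.bak
    local @ARGV = @files;        # <> reads these files one after another
    while (<>) {                 # -p wraps your code in a loop like this
        s/colour/color/g;        # ...your -e code goes here...
        print;                   # ...and $_ is written to the new FILE
    }
}
```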
Perl recognizes the switches listed in Table 3-1.
Table 3-1: Perl switches
Switch
Function
--
Terminates switch processing, even if the next argument starts with a minus. It has no other effect.
-0[octnum]
Specifies the record separator ($/) as an octal number. If octnum is not present, the null character is the separator. Other switches may precede or follow the octal number.
-a
Turns on autosplit mode when used with -n or -p. An implicit split of $_ into the @F array is inserted as the first command inside the implicit while loop produced by -n or -p. The default field delimiter is whitespace; a different field delimiter may be specified using -F.
-c
Environment Variables
Environment variables are used to set user preferences. Individual Perl modules or programs are always free to define their own environment variables, and there is also a set of special environment variables used in the CGI environment (see Chapter 9).
Perl uses the following environment variables:
HOME
Used if chdir has no argument.
LOGDIR
Used if chdir has no argument and HOME is not set.
PATH
Used in executing subprocesses and in finding the script if -S is used.
PATHEXT
On Win32 systems, if you want to avoid typing the extension every time you execute a Perl script, you can set the PATHEXT environment variable so that it includes Perl scripts. For example:
C:\> set PATHEXT=%PATHEXT%;.PLX
This setting lets you type:
C:\> myscript
without including the file extension. Be careful when setting PATHEXT permanently—it also includes executable file types such as .com, .exe, .bat, and .cmd. If you inadvertently lose those extensions, you'll have difficulty invoking applications and script files.
PERL5LIB
A colon-separated list of directories in which to look for Perl library files before looking in the standard library and the current directory. If PERL5LIB is not defined, PERLLIB is used. When running taint checks, neither variable is used. The script should instead say:
use lib "/my/directory";
PERL5OPT
Command-line options (switches). Switches in this variable are taken as if they were on every Perl command line. Only the -[DIMUdmw] switches are allowed. When running taint checks, this variable is ignored.
PERLLIB
A colon-separated list of directories in which to look for Perl library files before looking in the standard library and the current directory. If PERL5LIB is defined, PERLLIB is not used.
PERL5DB
The command used to load the debugger code. The default is:
BEGIN { require 'perl5db.pl' }
PERL5SHELL
On Win32 systems, may be set to an alternative shell for Perl to use internally to execute "backtick" commands or the system function.
PERL_DEBUG_MSTATS
Relevant only if your Perl executable was built with -DDEBUGGING_MSTATS. If set, causes memory statistics to be dumped after execution. If set to an integer greater than 1, it also causes memory statistics to be dumped after compilation.
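The PERL5LIB entry above suggests `use lib` as the taint-safe alternative. The sketch below shows its effect on @INC and how a script might read PERL5LIB itself. The directory `/my/directory` is the placeholder from that entry. The split uses `$Config{path_sep}` on the assumption that the list is separated by `:` on Unix and `;` on Win32.

```perl
use strict;
use warnings;
use Config;
use lib '/my/directory';   # placeholder path; put at the front of @INC at compile time

# Directories listed in PERL5LIB, split on the platform's path separator.
my $perl5lib = defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '';
my @from_env = grep { length } split /\Q$Config{path_sep}\E/, $perl5lib;

print "PERL5LIB entries: @from_env\n";
print "\@INC search order:\n";
print "  $_\n" for @INC;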
The Perl Compiler
Starting with Perl 5.005, the Perl compiler became part of the standard Perl distribution. You'll find that with Perl 5.6 and later, the Perl compiler has become far more stable. The compiler allows you to distribute Perl programs in binary form, which enables easy packaging of Perl-based programs without relying on the source machine to have the correct version of Perl and the correct modules installed. After the initial compilation, running a compiled program should be faster because it doesn't have to be recompiled each time it's run. However, you shouldn't expect that the compiled code itself will run faster than the original Perl source or that the executable will be smaller—in reality, the executable file is likely to be significantly bigger.
This initial release of the compiler is still considered to be a beta version. It's distributed as an extension module, B, that comes with the following backends:
Bytecode
Translates a script into platform-independent Perl bytecode.
C
Translates a Perl script into C code.
CC
Translates a Perl script into optimized C code.
Deparse
Regenerates Perl source code from a compiled program.
Lint
Extends the Perl -w option. Named after the Unix lint program checker.
Showlex
Shows lexical variables used in functions or files.
Xref
Creates a cross-reference listing for a program.
Once you've generated the C code with either the C or the CC backend, you run the cc_harness program to compile it into an executable. There is also a byteperl interpreter that lets you run the code you've generated with the Bytecode backend.
Here's an example that takes a simple "Hello world" program and uses the CC backend to generate C code:
% perl -MO=CC,-ohi.c hi.pl
hi.pl syntax OK
% perl cc_harness -O2 -ohi hi.c  
# You may have to provide the full path of where cc_harness lives
gcc -B/usr/ccs/bin/ -D_REENTRANT -DDEBUGGING -I/usr/local/include 
-I/usr/local/lib/perl5/sun4-solaris-thread/5.00466/CORE -O2 -ohi hi.c 
-L/usr/local/lib /usr/local/lib/perl5/sun4-solaris-thread/5.00466/CORE/libperl.a 
-lsocket -lnsl -lgdbm -ldl -lm -lposix4 -lpthread -lc -lcrypt
% hi
Hi there, world!
Threads
Perl 5.6 and later also include native multithreading capability, which is distributed with Perl as a set of modules. The threads modules have improved since the 5.005 release, but should still be considered an experimental feature and aren't automatically compiled in with Perl.
Pay close attention to Configure when you build Perl so that you don't include threads support if you don't want it.
You might want to build a separate version of Perl with threads enabled if you'd like to test the threads feature on your platform.
Chapter 8 describes the individual thread modules. For information on using threads, refer to the perlthrtut manpage (included with more recent distributions of Perl).
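Because thread support is a Configure-time choice, a program can check whether the perl running it was built with threads before it tries to load a thread module. A minimal sketch using the standard Config module; the helper name has_threads is hypothetical:

```perl
use strict;
use warnings;
use Config;

# %Config records the options this perl binary was built with.
# 'usethreads' is 'define' when thread support was enabled at
# Configure time; otherwise it is undefined or empty.
# has_threads() is a hypothetical helper used only for illustration.
sub has_threads {
    return ( defined $Config{usethreads} && $Config{usethreads} eq 'define' ) ? 1 : 0;
}

if ( has_threads() ) {
    print "This perl was built with thread support\n";
}
else {
    print "This perl was built without thread support\n";
}
```

The same information is available from the shell with perl -V:usethreads.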
Chapter 4: The Perl Language
This chapter is a quick and merciless guide to the Perl language itself. If you're trying to learn Perl from scratch and would prefer to be taught rather than to have things thrown at you, then you might be better off with Learning Perl, 3rd Edition by Randal L. Schwartz and Tom Phoenix. However, if you already know some other programming languages and just want to learn the particulars of Perl, this chapter is for you. Sit tight, and forgive us for being terse—we have a lot of ground to cover.
If you want a more complete discussion of the Perl language and its idiosyncrasies (and we mean complete), see Programming Perl, 3rd Edition by Larry Wall, Tom Christiansen, and Jon Orwant.
Program Structure
Perl is a particularly forgiving language, as far as program layout goes. There are no rules about indentation, newlines, etc. Most lines end with semicolons, but not everything has to. Most things don't have to be declared, except for a couple of things that do. Here are the bare essentials:
Whitespace
Whitespace is required only between items that would otherwise be confused as a single term. All types of whitespace (spaces, tabs, newlines, etc.) are equivalent in this context, and a comment counts as whitespace. Different types of whitespace are distinguishable within quoted strings, formats, and certain line-oriented forms of quoting. For example, in a quoted string, a newline, a space, and a tab are interpreted as distinct characters.
Semicolons
Every simple statement must end with a semicolon. Compound statements contain brace-delimited blocks of other statements and do not require terminating semicolons after the ending brace. A final simple statement in a block also does not require a semicolon.
Declarations
Only subroutines and report formats need to be explicitly declared. All other user-created objects come into existence with an undefined value, which is treated as the empty string or 0 depending on context, unless they are defined by some explicit operation such as assignment. The -w command-line switch will warn you about using undefined values.
You may force yourself to declare your variables by including the use strict pragma in your programs (see Chapter 8 for more information on pragmas and strict in particular). This causes an error if you do not explicitly declare your variables.
Comments and documentation
Comments within a program are indicated by a pound sign (#). Everything following a pound sign to the end of the line is interpreted as a comment.
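The layout rules above can be seen together in one short program. This is a sketch assuming Perl 5.6 or later, which provides the warnings pragma (the lexically scoped form of -w); the variable names are illustrative:

```perl
use strict;     # every variable must now be declared, e.g. with my
use warnings;   # lexically scoped form of -w (Perl 5.6 and later)

# Whitespace, including newlines and comments, may separate terms freely.
my @nums = ( 3,
             9,   # a comment counts as whitespace
             4 );

my $max = $nums[0];
for my $n (@nums) {            # compound statement: no ';' after the '}'
    $max = $n if $n > $max     # final statement in a block: ';' optional
}

my $tag = "#42";   # a '#' inside a quoted string is an ordinary character

my $unset;         # declared but never assigned, so it holds undef
my $total;
{
    no warnings 'uninitialized';   # -w would otherwise warn here
    $total = $unset + $max;        # undef acts as 0 in numeric context
}
print "max=$max tag=$tag total=$total\n";   # max=9 tag=#42 total=9
```

Deleting the line my $unset; makes the program fail to compile under use strict, because $unset would then be an undeclared global.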
Data Types and Variables
Perl has three basic data types: scalars, arrays, and hashes.
Scalars are essentially simple variables. They are preceded by a dollar sign ($). A scalar is either a number, a string, or a reference. (A reference is a scalar that points to another piece of data. References are discussed later in this chapter.) If you provide a string in which a number is expected or vice versa, Perl automatically converts the operand using fairly intuitive rules.
Arrays are ordered lists of scalars accessed with a numeric subscript (subscripts start at 0). They are preceded by an "at" sign (@).
Hashes are unordered sets of key/value pairs accessed with the keys as subscripts. They are preceded by a percent sign (%).
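A brief sketch of the three data types, a reference, and automatic string/number conversion; the data values are arbitrary examples:

```perl
use strict;
use warnings;

my $name   = 'Perl';                   # scalar holding a string
my $count  = 3;                        # scalar holding a number
my @tools  = ('sed', 'awk', 'grep');   # array: ordered list of scalars
my %color  = (apple => 'red', banana => 'yellow');   # hash: key/value pairs

my $first  = $tools[0];        # a single element is a scalar, so it takes $
my $fruit  = $color{banana};   # hash lookup uses the key as the subscript
my $ref    = \@tools;          # a reference is itself a scalar
my $last   = $ref->[-1];       # follow the reference: 'grep'

my $sum    = "10" + 5;         # string converted to a number: 15
my $joined = $count . "x";     # number converted to a string: "3x"
```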
Perl stores numbers internally as either signed integers or double-precision, floating-point values. Numeric literals are specified by any of the following floating-point or integer formats:
12345            Integer
-54321           Negative integer
12345.67         Floating point
6.02E23          Scientific notation
0xffff           Hexadecimal
0377             Octal
4_294_967_296    Underscore for legibility
Since Perl uses the comma as a list separator, you cannot use a comma for improving the legibility of a large number. To improve legibility, Perl allows you to use an underscore character instead. The underscore works only within literal numbers specified in your program, not in strings functioning as numbers or in data read from somewhere else. Similarly, the leading 0x for hex and 0 for octal work only for literals. The automatic conversion of a string to a number does not recognize these prefixes—you must do an explicit conversion.
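The difference between literals and strings that merely look like numbers is easy to demonstrate. A minimal sketch; the explicit conversions use the built-in hex and oct functions:

```perl
use strict;
use warnings;

my $big  = 4_294_967_296;   # underscores are accepted in numeric literals
my $mask = 0xffff;          # hexadecimal literal: 65535
my $perm = 0377;            # octal literal: 255

# Strings (for example, data read from a file) don't get this treatment,
# so convert them explicitly with the built-in hex() and oct() functions.
my $from_hex = hex('ffff');    # 65535
my $from_oct = oct('0377');    # 255
my $hex_too  = oct('0x1f');    # 31: oct() also recognizes a leading 0x

my ($naive, $naive_us);
{
    no warnings 'numeric';      # silence "isn't numeric" warnings
    $naive    = '0xff'  + 0;    # automatic conversion stops at the x: 0
    $naive_us = '1_000' + 0;    # stops at the underscore: 1
}
```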
Be aware that in Perl 5.8, there are many changes in how Perl deals with integers and floating-point numbers. Regardless of how your system handles numbers and conversion between characters and numbers, Perl 5.8 works around system deficiencies to force more accurate number handling. Furthermore, whereas prior to 5.8 Perl used floating-point numbers exclusively in math operations, Perl 5.8 now uses and stores integers in numeric conversions and in arithmetic operations.
Statements
A simple statement is an expression evaluated for its side effects. Every simple statement must end in a semicolon, unless it is the final statement in a block.
A sequence of statements that defines a scope is called a block. Generally, a block is delimited by braces, or { }. Compound statements are built out of expressions and blocks. A conditional expression is evaluated to determine whether a statement block will be executed. Compound statements are defined in terms of blocks, not statements, which means that braces are required.
Any block can be given a label. Labels are identifiers that follow the variable-naming rules (i.e., they begin with a letter or underscore and can contain alphanumerics and underscores). They are placed just before the block and are followed by a colon, such as SOMELABEL here:
SOMELABEL: {
    ...statements...
}
By convention, labels are all uppercase, so as not to conflict with reserved words. Labels are used with the loop control commands next, last, and redo to alter the flow of execution in your programs.
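A label is most useful on an outer loop, where it lets next or last in an inner loop act on the outer one. A small sketch with made-up data:

```perl
use strict;
use warnings;

my @grid = ([1, 2, 3], [4, -1, 6], [7, 8, 9]);
my @seen;

ROW: for my $row (@grid) {
    for my $cell (@$row) {
        next ROW if $cell < 0;   # abandon this row, go on to the next one
        last ROW if $cell > 7;   # leave the outer loop entirely
        push @seen, $cell;
    }
}
print "@seen\n";   # 1 2 3 4 7
```

Without the ROW label, next and last would apply only to the inner loop over $cell.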
The if and unless statements execute blocks of code depending on whether a condition is met. These statements take the following forms:
if (expression) {block} else {block}

unless (expression) {block} else {block}

if (expression1) {block}
elsif (expression2) {block}
  ...
elsif (lastexpression) {block}
else {block}
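A filled-in version of these forms; the classify subroutine is a hypothetical example. Note that the braces are required even when a block holds a single statement:

```perl
use strict;
use warnings;

# Hypothetical helper that sorts a number into one of three buckets.
sub classify {
    my ($n) = @_;
    if ($n < 0) {
        return 'negative';
    }
    elsif ($n == 0) {
        return 'zero';
    }
    else {
        return 'positive';
    }
}

my @out;
for my $n (-3, 0, 8) {
    push @out, classify($n);
}

# unless runs its block when the condition is false.
unless (@out == 0) {
    print join(', ', @out), "\n";   # negative, zero, positive
}
```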

while loops

The while statement repeatedly executes a block as long as its conditional expression is true. For example:
while (<INFILE>) {
    chomp;
    print OUTFILE "$_\n";
}
This loop reads each line from the file opened with the filehandle INFILE and prints it to the OUTFILE filehandle. Note that there is no comma between the filehandle and the list being printed. The loop ends when it reaches end-of-file.
If the word while is replaced by the word until, the sense of the test is reversed. The conditional is still tested before the first iteration, though.
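A self-contained version of the copy loop, plus an until loop. This sketch assumes Perl 5.8 or later, where open can treat a scalar as an in-memory file; that stands in for the real INFILE and OUTFILE files:

```perl
use strict;
use warnings;

# Assumes Perl 5.8+: open() on a scalar reference gives an in-memory file.
my $input  = "alpha\nbeta\ngamma\n";
my $output = '';
open(my $in,  '<', \$input)  or die "can't open input: $!";
open(my $out, '>', \$output) or die "can't open output: $!";

while (<$in>) {          # each line is assigned to $_ until end-of-file
    chomp;               # strip the trailing newline from $_
    print $out "$_\n";   # no comma between the filehandle and the list
}
close $in;
close $out;

# until reverses the test: the body runs while the condition is false,
# and the condition is still checked before the first iteration.
my $countdown = 3;
my @ticks;
until ($countdown == 0) {
    push @ticks, $countdown--;
}
```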
Special Variables
Some variables have a predefined, special meaning in Perl. They use punctuation characters after the usual variable indicator ($, @, or %), such as $_. Each also has an explicit, long-form name that you can use instead once you load the English module by including use English; at the top of your program.
The most common special variable is $_, which contains the default input and pattern-searching string. For example:
foreach ('hickory','dickory','doc') {
        print;
}
The first time the loop is executed, "hickory" is printed. The second time around, "dickory" is printed, and the third time, "doc" is printed. That's because in each iteration of the loop, the current string is placed in $_ and is used by default by print. Here are the places where Perl will assume $_, even if you don't specify it:
  • Various unary functions, including functions such as ord and int, as well as all the file tests (-f, -d), except for -t, which defaults to STDIN.
  • Various list functions such as print and unlink.
  • The pattern-matching operations m//, s///, and tr/// when used without an =~ operator.
  • The default iterator variable in a foreach loop if no other variable is supplied.
  • The implicit iterator variable in the grep and map functions.
  • The default place to put an input record when a line-input operation's result is tested by itself as the sole criterion of a while test (i.e., <filehandle>). Note that outside of a while test, this does not happen.
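Several of these defaults can appear in a few lines. A sketch with illustrative data:

```perl
use strict;
use warnings;

my @words = ("hickory\n", "dickory\n", "doc\n");
my @lengths;

for (@words) {               # foreach with no variable: each element is in $_
    chomp;                   # chomp($_); $_ is aliased, so @words changes
    push @lengths, length;   # length($_)
}

my @rhymes = grep { /ory$/ } @words;   # m// matches against $_ inside grep
my @upper  = map  { uc } @words;       # uc($_) for each element inside map
```

Because the foreach variable is an alias rather than a copy, chomp removes the newlines from @words itself, which is why the later grep and map see the bare words.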
The following is a complete listing of global special variables:
$_
$ARG
The default input and pattern-searching space.
$.
$INPUT_LINE_NUMBER
$NR
The current input line number of the last filehandle that was read. An explicit close on the filehandle resets the line number.
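A short sketch of $. in use, again assuming Perl 5.8 or later for the in-memory filehandle:

```perl
use strict;
use warnings;

# Assumes Perl 5.8+ for the in-memory filehandle.
my $text = "first\nsecond\nthird\n";
open(my $fh, '<', \$text) or die "can't open: $!";

my @numbered;
while (my $line = <$fh>) {
    chomp $line;
    # $. holds the number of the line just read from $fh
    # (with use English, also $INPUT_LINE_NUMBER or $NR).
    push @numbered, $. . ": $line";
}
close $fh;   # an explicit close resets the line number
```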
Operators
Table 4-3 lists all the Perl operators from highest to lowest precedence and indicates their associativity.
Table 4-3: Perl associativity and operators, listed by precedence
Associativity    Operators
Left             Terms and list operators (leftward)
Left             -> (method call, dereference)
Nonassociative   ++ -- (autoincrement, autodecrement)
Right            ** (exponentiation)
Right            ! ~ \ and unary + and - (logical not, bit-not, reference, unary plus, unary minus)
Left             =~ !~ (matches, doesn't match)
Left             * / % x (multiply, divide, modulus, string replicate)
Left             + - . (addition, subtraction, string concatenation)
Left             << >> (left bit-shift, right bit-shift)
Nonassociative
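A few expressions whose results follow directly from the precedence and associativity in Table 4-3:

```perl
use strict;
use warnings;

my $pow   = 2 ** 3 ** 2;     # ** is right-associative: 2 ** (3 ** 2) = 512
my $neg   = -2 ** 2;         # ** binds tighter than unary minus: -(2 ** 2) = -4
my $mixed = 1 + 2 * 3;       # * outranks +: 1 + 6 = 7
my $str   = 'ab' . 'c' x 3;  # x outranks .: 'ab' . 'ccc' = 'abccc'
my $shift = 1 + 1 << 2;      # + outranks <<: (1 + 1) << 2 = 8
```

When in doubt, explicit parentheses cost nothing and make the intent clear to the next reader.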