Chapter 11. Configuring Perl Programs

Once someone figures out that you know Perl, they’ll probably ask you to write a program for them or even change one of the programs that you have. Someone else finds out about your nifty little program and they want to use it too, but in a slightly different way.

Don’t get trapped into creating or maintaining multiple versions of your program. Make them configurable, and do it so your users don’t have to touch the code. When users touch the code, all sorts of things go wrong. Their little change breaks the program, perhaps because they forget a semicolon. Who do they come to for a fix? That’s right—they come to you. A little work making your program configurable saves you headaches later.

Things Not to Do

The easiest, and worst, way to configure my Perl program is simply to put a bunch of variables in it and tell the user to change them if they need something different. The user then has to open my program and change the values to change the behavior of my program. This gives the user the confidence to change other things, too, despite my warning to not change anything past the configuration section. Even if the user stays within the section where I intend her to edit code, she might make a syntax error. Not only that—if she has to install this program on several machines, she’ll end up with a different version for each machine. Any change or update in the program requires her to edit every version:

#!/usr/bin/perl
use strict;
use warnings;

my $Debug   = 0;
my $Verbose = 1;
my $Email   = 'alice@example.com';
my $DB      = 'DBI:mysql';

#### DON'T EDIT BEYOND THIS LINE !!! ###

I really don’t want my users to think about what the program is; they just need to know what it does and how they can interact with it. I don’t really care if they know which language I used, how it works, and so on. I want them to get work done, which really means I don’t want them to have to ask me for help. I also don’t want them to look inside code because I don’t expect them even to know Perl. They can still look at the code (we do like open source, after all), but they don’t need to if I’ve done my job well.

Now that I’ve said all that, sometimes hardcoding values really isn’t all that bad, although I wouldn’t really call this next method “configuration.” When I want to give a datum a name that I can reuse, I pull out the constant pragma, which creates a subroutine that simply returns the value. I define PI as a constant and then use it as a bareword where I need it:

use constant PI => 3.14159;

my $radius = 1;
my $circumference = 2 * PI * $radius;

This is a more readable way of defining my own subroutine to do it because it shows my intent to make a constant. I use an empty prototype so Perl doesn’t try to grab anything after the subroutine name as an argument. I can use this subroutine anywhere in the program, just as I can use any other subroutine. I can export them from modules or access them by their full package specification:

sub PI () { 3.14159 }

This can be handy to figure out some value and provide easy access to it. Although I don’t do much in this next example, I could have accessed a database, downloaded something over the network, or anything else I might need to do to compute the value:

{
my $days_per_year = $ENV{DAYS_PER_YEAR} || 365.24;
my $secs_per_year = 60 * 60 * 24 * $days_per_year;

sub SECS_PER_YEAR { $secs_per_year }
}

Curiously, the two numbers PI and SECS_PER_YEAR are almost the same, aside from a factor of 10 million. The seconds per year (ignoring partial days) is about 3.15e7, which is pretty close to Pi times 10 million if I’m doing calculations on the back of a pub napkin.

Similarly, I can use the Readonly module if I feel more comfortable with Perl variables. If I attempt to modify any of these variables, the module stops my program with an error. This module allows me to create lexical variables, too:

use Readonly;

Readonly::Scalar my $Pi        => 3.14159;
Readonly::Array  my @Fibonacci => qw( 1 1 2 3 5 8 13 21 );

Readonly::Hash   my %Natural   => ( e => 2.72, Pi => 3.14, Phi => 1.618 );

With Perl 5.8 or later, I can leave off the second-level package name and let Perl figure it out based on the values that I give it:

use 5.008;
use Readonly;

Readonly my $Pi        => 3.14159;

Readonly my @Fibonacci => qw(1 1 2 3 5 8 13 21 );

Readonly my %Natural   => ( e => 2.72, Pi => 3.14, Phi => 1.618 );

Code in a Separate File

A bit more sophisticated although still not good, that same configuration can be placed in a separate file and pulled into the main program. In config.pl I put the code I previously had at the top of my program. I can’t use lexical variables because those are scoped to their file. Nothing outside config.pl can see them, which isn’t what I want for a configuration file:

# config.pl
use vars qw( $Debug $Verbose $Email $DB );

$Debug   = 0;
$Verbose = 1;
$Email   = 'alice@example.com';
$DB      = 'DBI:mysql';

I pull in the configuration information with require, but I have to do it inside a BEGIN block so Perl sees the use vars declaration before it compiles the rest of my program. We covered this in more detail in Intermediate Perl, Chapter 3, when we started to talk about modules:

#!/usr/bin/perl
use strict;
use warnings;

BEGIN { require "config.pl"; }

Of course, I don’t have to go through these shenanigans if I don’t mind getting rid of use strict, but I don’t want to do that. That doesn’t stop other people from doing that though, and Google[45] finds plenty of examples of config.pl.

Better Ways

Configuration is about separating from the rest of the code the information that I want the user to be able to change. These data can come from several sources, although it’s up to me to figure out which source makes sense for my application. Not every situation necessarily needs the same approach.

Environment Variables

Environment variables set values that every process within a shell can access and use. Subprocesses can see these same values, but they can’t change them for other processes above them. Most shells set some environment variables automatically, such as HOME for my home directory and PWD for the directory I’m working in. In Perl, these show up in the %ENV hash. On most machines, I write a testenv program to see how things are set up:

#!/usr/bin/perl

print "Content-type: text/plain\n\n" if $ENV{REQUEST_METHOD};

foreach my $key ( sort keys %ENV )
        {
        printf "%-20s %s\n", $key, $ENV{$key};
        }

Notice the line that uses $ENV{REQUEST_METHOD}. If I use my program as a CGI program, the web server sets several environment variables including one called REQUEST_METHOD. If my program sees that it’s a CGI program, it prints a CGI response header. Otherwise, it figures I must be at a terminal and skips that part.

I particularly like using environment variables in CGI programs because I can set the environment in an .htaccess file. This example is Apache-specific and requires mod_env, but other servers may have similar facilities:

# Apache .htaccess
SetEnv DB_NAME mousedb
SetEnv DB_USER buster
SetEnv DB_PASS pitrpat

Any variables that I set in .htaccess show up in my program and are available to all programs affected by that file. If I change the password, I only have to change it in one place. Beware, though, since the web server user can read this file, other users may be able to get this information. Almost any way you slice it, though, eventually the web server has to know these values, so I can’t keep them hidden forever.
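
Here is a minimal sketch of how a program might pick up those values, assuming the variable names from the .htaccess example. The db_settings subroutine is hypothetical, and I pass the environment in as a hash reference only so I can try the code without a web server:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect the database settings from an
# environment hash (normally %ENV), complaining about missing ones.
sub db_settings {
    my( $env ) = @_;

    my %settings;
    foreach my $key ( qw( DB_NAME DB_USER DB_PASS ) ) {
        die "Missing environment variable $key\n"
            unless defined $env->{$key};
        $settings{$key} = $env->{$key};
    }

    return \%settings;
}

# Stand-in for what the web server would provide through SetEnv
my %fake_env = (
    DB_NAME => 'mousedb',
    DB_USER => 'buster',
    DB_PASS => 'pitrpat',
);

my $settings = db_settings( \%fake_env );
print "Connecting to $settings->{DB_NAME} as $settings->{DB_USER}\n";
```

In a real CGI program I would call db_settings( \%ENV ) instead. Failing early on a missing variable is a design choice here; a program with sensible defaults might prefer to fall back to them.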

Special Environment Variables

Perl uses several environment variables to do its work. The PERL5OPT environment variable acts as if I had given its switches to perl on the command line, and the PERL5LIB environment variable adds directories to the module search path. That way, I can change how Perl acts without changing the program.

To add more options just as if I had specified them on the command line or the shebang line, I add them to PERL5OPT. This can be especially handy if I always want to run with warnings, for instance:

% export PERL5OPT=w

The PERL5LIB value stands in place of the use lib directives in the code. I often have to use this when I want to run the same programs on different computers. As much as I’d like all of the world to have the same filesystem layout and to store modules, home directories, and other files in the same place, I haven’t had much luck convincing anyone to do it. Instead of editing the program to change the path to the local modules, I set it externally. Once set in a login program or Makefile, it’s there and I don’t have to think about it. I don’t have to edit all of my programs to have them find my new Perl library directory:

% export PERL5LIB=/Users/brian/lib/perl5

Turning on Extra Output

While developing, I usually add a lot of extra print statements so I can inspect the state of the program as I’m tracking down some bug. As I get closer to a working program, I leave these statements in there, but I don’t need them to execute every time I run the program; I just want them to run when I have a problem.

Similarly, in some instances I want my program to show me normal output as it goes about its work when I’m at the terminal but be quiet when run from cron, a shell program, and so on.

In either case, I could define an environment variable to switch on, or switch off, the behavior. With an environment variable, I don’t have to edit the use of the program in other programs. My changes can last for as little as a single use by setting the environment variable when I run the program:

$ DEBUG=1 ./program.pl

or for the rest of the session when I set the environment variable for the entire session:

$ export DEBUG=1
$ ./program.pl

Now I can use these variables to configure my program. Instead of coding the value directly in the program, I get it from the environment variables:

#!/usr/bin/perl
use strict;
use warnings;

my $Debug   = $ENV{DEBUG};
my $Verbose = $ENV{VERBOSE};

...

print "Starting processing\n" if $Verbose;

...

warn "Stopping program unexpectedly" if $Debug;

I can set environment variables directly on the command line and that variable applies only to that process. I can use my testenv program to verify the value. Sometimes I make odd shell mistakes with quoting and special character interpolation so testenv comes in handy when I need to figure out why the value isn’t what I think it is:

% DEBUG=1 testenv

I can also set environment variables for all processes in a session. Each shell has slightly different syntax for this:

$ export DEBUG=2   # bash
% setenv DEBUG 2   # csh
C:> set DEBUG=2    # Windows

If I don’t set some of the environment variables I use in the program, their values are undefined. The if statement modifiers in the last program only test for truth, so they stay quiet, but as soon as I print or compare one of those values, Perl complains about an uninitialized value since I have warnings on. To get around that, I set some defaults. The || short circuit operator is handy here:

my $Debug   = $ENV{DEBUG}   || 0;
my $Verbose = $ENV{VERBOSE} || 1;

Sometimes 0 is a valid value even though it’s false so I don’t want to continue with the short circuit if the value is defined. In these cases, the ternary operator along with defined comes in handy:

my $Debug   = defined $ENV{DEBUG} ? $ENV{DEBUG} : 0;
my $Verbose = defined $ENV{VERBOSE} ? $ENV{VERBOSE} : 1;

Perl 5.10 has the defined-or (//) operator. It evaluates the argument on its left and returns it if it is defined, even if it is false. Otherwise, it continues on to the next value:

my $Verbose = $ENV{VERBOSE} // 1;  # defined-or, new in Perl 5.10

The // started out as new syntax for Perl 6 but is so cool that it made it into Perl 5.10. As with other new features, I need to weigh its benefit against the loss of backward compatibility.

Some values may even affect others. I might want a true value for $Debug to imply a true value for $Verbose, which would otherwise be false:

my $Debug   = $ENV{DEBUG}   || 0;
my $Verbose = $ENV{VERBOSE} || $ENV{DEBUG} || 0;

Before I consider heavy reliance on environment variables, I should consider my target audience and which platform it uses. If those platforms don’t support environment variables, I should come up with an alternative way to configure my program.
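
The defaulting logic from this section can be gathered into one place. This is only a sketch: the env_setting subroutine is a hypothetical helper, not something Perl provides, and it takes the environment hash as an argument so I can try it without touching %ENV. It uses the ternary form so it also runs on versions of Perl before 5.10:

```perl
use strict;
use warnings;

# Hypothetical helper: return the named value if it is defined
# (even when it is 0 or the empty string), otherwise the default.
sub env_setting {
    my( $env, $name, $default ) = @_;
    return defined $env->{$name} ? $env->{$name} : $default;
}

# Pretend the user ran: DEBUG=2 VERBOSE=0 ./program.pl
my %env = ( DEBUG => 2, VERBOSE => 0 );

my $Debug   = env_setting( \%env, 'DEBUG',   0 );
my $Verbose = env_setting( \%env, 'VERBOSE', 1 );

# A true $Debug implies a true $Verbose
$Verbose ||= $Debug;

print "Debug is [$Debug] and Verbose is [$Verbose]\n";
```

In a real program I would pass \%ENV. Note that a true DEBUG overrides an explicit VERBOSE=0 here, just as in the || version; if that isn’t what I want, I have to test for definedness before applying the implication.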

Command-Line Switches

Command-line switches are arguments to my program that usually affect the way the program behaves, although in the odd case they do nothing but add compatibility for foreign interfaces. In Advanced Perl Programming, Simon Cozens talked about the different things that Perl programmers consistently reinvent (which is different from reinventing consistently). Command-line switch processing is one of them. Indeed, when I look on CPAN to see just how many there are, I find Getopt::Std, Getopt::Long, and 87 other modules with Getopt in the name.

I can deal with command-line switches in several ways; it’s completely up to me how to handle them. They are just arguments to my Perl program, and the modules to handle them simply remove them from @ARGV and do the necessary processing to make them available to me without getting in the way of other, non-switch arguments. When I consider the many different ways people have used command-line switches in their own creations, it’s no wonder there are so many modules to handle them. Even non-Perl programs show little consistency in their use.

This list isn’t definitive, and I’ve tried to include at least two Perl modules that handle each situation. I’m not a fan of tricky argument processing, and I certainly haven’t used most of these modules beyond simple programs. Although CPAN had 89 modules matching “Getopt,” I only looked at the ones I was able to install without a problem, and even then, looked further at the ones whose documentation didn’t require too much work for me to figure out.

  1. Single-character switches each preceded by their own hyphen; I need to treat these individually (Getopt::Easy, Getopt::Std, Perl’s -s switch):

    % foo -i -t -r
    
  2. Single-character switches preceded by their own hyphen and with possible values (mandatory or optional), with possible separator characters between the switch and the value (Getopt::Easy, Getopt::Std, Getopt::Mixed, Perl’s -s switch):

    % foo -i -t -d/usr/local
    % foo -i -t -d=/usr/local
    % foo -i -t -d /usr/local
    
  3. Single-character switches grouped together, also known as bundled or clustered switches, but still meaning separate things (Getopt::Easy, Getopt::Mixed, Getopt::Std):

    % foo -itr
    
  4. Multiple-character switches with a single hyphen, possibly with values (Perl’s -s switch):

    % foo -debug -verbose=1
    
  5. Multiple-character switches with a double hyphen, along with single-character switches and a single hyphen, possibly grouped (Getopt::Attribute, Getopt::Long, Getopt::Mixed):

    % foo --debug=1 -i -t
    % foo --debug=1 -it
    
  6. The double hyphen, meaning the end of switch parsing; sometimes valid arguments begin with a hyphen, so by convention a bare double hyphen signals the end of the switches (Getopt::Long, Getopt::Mixed, and -s if I don’t care about invalid variable names such as ${-debug}):

    % foo -i -t --debug -- --this_is_an_argument
    
  7. Switches might have different forms or aliases that mean the same thing (Getopt::Lucid, Getopt::Mixed):

    % foo -d
    % foo --debug
    
  8. Completely odd things with various sigils or none at all (Getopt::Declare):

    % foo input=bar.txt --line 10-20
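
To see what it means for a module to remove the switches from @ARGV and leave the other arguments alone, here is a small sketch using Getopt::Long, which comes with Perl. The switch names and the sample command line are made up for illustration, and I localize @ARGV so the example doesn’t depend on how I run it:

```perl
use strict;
use warnings;

use Getopt::Long;

my( $debug, $verbose, @leftover );

{
# Pretend the command line was:
#   foo --debug=1 -v -- input.txt --this_is_an_argument
local @ARGV = qw( --debug=1 -v -- input.txt --this_is_an_argument );

GetOptions(
    'debug=i'   => \$debug,
    'verbose|v' => \$verbose,
    ) or die "Bad command line\n";

# Whatever is left in @ARGV was not a switch
@leftover = @ARGV;
}

print "debug is [$debug], verbose is [$verbose]\n";
print "Remaining arguments: @leftover\n";
```

The double hyphen itself disappears along with the switches, so the rest of the program only sees the two ordinary arguments.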
    

The -s Switch

I don’t need a module to process switches. Perl’s -s switch can do it as long as I don’t get too fancy. With this Perl switch, Perl turns the program switches into package variables. It can handle either single hyphen or double hyphens (which is just a single hyphen with a name starting with a hyphen). The switches can have values, or not. I can specify -s either on the command line or on the shebang line:

#!/usr/bin/perl -sw
# perl-s-abc.pl
use strict;

use vars qw( $a $abc );

print "The value of the -a switch is [$a]\n";
print "The value of the -abc switch is [$abc]\n";

Without values, Perl sets to 1 the variable for that switch. With a value that I attach to the switch name with an equal sign (and that’s the only way in this case), Perl sets the variable to that value:

% perl -s ./perl-s-abc.pl -abc=fred -a
The value of the -a switch is [1]
The value of the -abc switch is [fred]

I can use double hyphens for switches that -s will process:

% perl -s ./perl-s-debug.pl --debug=11

This causes Perl to create an illegal variable named ${'-debug'} even though that’s not strict safe. This uses a symbolic reference to get around Perl’s variable naming rules so I have to put the variable name as a string in curly braces. This also gets around the normal strict rules for declaring variables so I have to turn off the 'refs' check from strict to use the variables:

#!/usr/bin/perl -s
# perl-s-debug.pl
use strict;

{
no strict 'refs';
print "The value of the --debug switch is [${'-debug'}]\n";
print "The value of the --help switch is [${'-help'}]\n";
}

The previous command line produces this output:

The value of the --debug switch is [11]
The value of the --help switch is []

I don’t really need the double dashes. The -s switch doesn’t cluster switches so I don’t need the double dash to denote the long switch name. Creating variable names that start with an illegal character is a convenient way to segregate all of the configuration data; however, I still don’t endorse that practice.

Getopt Modules

I can’t go over all of the modules I might use or that I mentioned earlier, so I’ll stick to the two that come with Perl, Getopt::Std and Getopt::Long (both available since the beginning of Perl 5). You might want to consider if you really need more than these modules can handle. You’re pretty sure to have these available with the standard Perl distribution, and they don’t handle odd formats that could confuse your users.

Getopt::Std

The Getopt::Std module handles single-character switches that I can cluster and give values to. The module exports two functions, one without an “s,” getopt, and one with an “s,” getopts, but they behave slightly differently (and I’ve never figured out a way to keep them straight).

The getopt function expects each switch to have a value (e.g., -n 1) and won’t set any values if the switch doesn’t have an argument (e.g., -n). Its first argument is a string that denotes which switches it expects. Its second argument is a reference to a hash in which it will set the keys and values. I call getopt at the top of my program:

#!/usr/bin/perl
# getopt-std.pl
use strict;

use Getopt::Std;

getopt('dog', \ my %opts );

print <<"HERE";
The value of
   d       $opts{d}
   o       $opts{o}
   g       $opts{g}
HERE

When I call this program with a switch and a value, I see that getopt sets the switch to that value:

$ perl getopt-std.pl -d 1
The value of
   d       1
   o
   g

When I call the same program with the same switch but without a value, getopt does not set a value:

$ perl getopt-std.pl -d
The value of
   d
   o
   g

There is a one argument form of getopt that I’m ignoring because it creates global variables, which I generally try to avoid.

The getopts (the one with the s) works a bit differently. It can deal with switches that don’t take arguments and sets the value for those switches to 1. To distinguish between switches with and without arguments, I put a colon after the switches that need arguments.

In this example, the d and o switches are binary, and the g switch takes an argument:

#!/usr/bin/perl
# getopts-std.pl

use Getopt::Std;

getopts('dog:', \ my %opts );

print <<"HERE";
The value of
   d       $opts{d}
   o       $opts{o}
   g       $opts{g}
HERE

When I give this program the -g switch with the value foo and the -d switch, getopts sets the values for those switches:

$ perl getopts-std.pl -g foo -d
The value of
   d       1
   o
   g       foo

If a switch takes an argument, it grabs whatever comes after it no matter what it is. If I forget to provide the value for -g, for instance, it unintentionally grabs the next switch:

$ perl getopts-std.pl -g -d -o
The value of
   d
   o       1
   g       -d

On the other hand, if I give a value to a switch that doesn’t take a value, nothing seems to work correctly. Giving -d a value stops getopts argument processing:

$ perl getopts-std.pl  -d foo -g bar -o
The value of
   d       1
   o
   g
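
One way to soften these surprises is to check what getopts returns; it gives back false when it sees a switch it doesn’t know or a switch that is missing its argument. This sketch localizes @ARGV to simulate a command line, and the parse subroutine is my own hypothetical wrapper, not part of Getopt::Std:

```perl
use strict;
use warnings;

use Getopt::Std;

# Hypothetical wrapper: parse a list of arguments and return the
# options hash, or nothing if getopts reported a problem.
sub parse {
    local @ARGV = @_;

    # Keep the "Unknown option" message from cluttering the output
    local $SIG{__WARN__} = sub { };

    my %opts;
    return unless getopts( 'dog:', \%opts );
    return \%opts;
}

# Try a good command line, an unknown switch, and a missing value
foreach my $args ( [ qw( -d -g foo ) ], [ qw( -x ) ], [ qw( -g ) ] ) {
    my $opts = parse( @$args );
    print "@$args: ", ( $opts ? "okay" : "bad switches" ), "\n";
}
```

This doesn’t catch every mistake. A forgotten value for -g still swallows the next switch when there is one, because to getopts that looks like a perfectly good value.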

Getopt::Long

The Getopt::Long module can handle the single-character switches, bundled single-character switches, and switches that start with a double hyphen. I give its GetOptions function a list of key-value pairs where the key gives the switch name and the value is a reference to a variable where GetOptions puts the value:

#!/usr/bin/perl
# getoptions-v.pl

use Getopt::Long;

my $result = GetOptions(
        'debug|d'   => \ my $debug,
        'verbose|v' => \ my $verbose,
        );

print <<"HERE";
The value of
   debug           $debug
   verbose         $verbose
HERE

In this example I’ve also created aliases for some switches by specifying their alternative names with the vertical bar, |. I have to quote those keys since | is a Perl operator (and I cover it in Chapter 16). I can turn on extra output for that program with either -verbose or -v because they both set the variable $verbose:

$ perl getoptions-v.pl -verbose
The value of
   debug
   verbose         1

$ perl getoptions-v.pl -v
The value of
   debug
   verbose         1

$ perl getoptions-v.pl -v -d
The value of
   debug           1
   verbose         1

$ perl getoptions-v.pl -v -debug
The value of
   debug           1
   verbose         1

$ perl getoptions-v.pl -v --debug
The value of
   debug           1
   verbose         1

By just specifying the key names, the switches are boolean so I get just true or false. I can tell GetOptions a bit more about the switches to let Perl know what sort of value to expect. In GetOptions, I set options on the switches with an equal sign after the switch name. An =i indicates an integer value, an =s means a string, and nothing means it’s simply a flag, which is what I had before. There are other types, too. If I give the switch the wrong sort of value, for instance, a string where I wanted a number, GetOptions doesn’t set a value (so it doesn’t turn a string into the number 0, for instance):

#!/usr/bin/perl
# getopt-long-args.pl

use Getopt::Long;

my $result = GetOptions(
        "file=s" => \ my $file,
        "line=i" => \ my $line,
        );

print <<"HERE";
The value of
        file            $file
        line            $line
HERE

My -line switch expects an integer and works fine when I give it one. When I try to give it a real number, I get a warning and GetOptions doesn’t set a value:

$ perl getopt-long-args.pl -line=-9
The value of
        file
        line            -9
$ perl getopt-long-args.pl -line=9.9
Value "9.9" invalid for option line (number expected)
The value of
        file
        line
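
Since GetOptions returns false when it finds a problem such as that invalid value, I can react to it instead of carrying on with a missing setting. This is a sketch with a simulated command line; the usage message is something I made up, and I collect the warnings in an array only so I can show them afterward:

```perl
use strict;
use warnings;

use Getopt::Long;

my( $file, $line );
my @problems;

{
# Pretend the command line was: --file=foo.dat --line=9.9
local @ARGV = qw( --file=foo.dat --line=9.9 );

# Collect the complaints from GetOptions instead of printing them
local $SIG{__WARN__} = sub { push @problems, @_ };

my $result = GetOptions(
    "file=s" => \$file,
    "line=i" => \$line,
    );

print "Usage: $0 [--file NAME] [--line NUMBER]\n" unless $result;
}

print "GetOptions said: $_" foreach @problems;
```

A real program would probably die with the usage message rather than print it and continue.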

I can use an @ to tell GetOptions that the switch’s type will allow it to take multiple values. To get multiple values for -file, I put the @ after the =s. I also assign the values to the array @files instead of a scalar:

#!/usr/bin/perl
# getopt-long-mult.pl

use Getopt::Long;

my $result = GetOptions(
        "file=s@" => \ my @files,
        );


{
local $" = ", ";

print <<"HERE";
The value of
        file            @files
HERE
}

To use this feature, I have to specify the switch multiple times on the command line:

$ perl getopt-long-mult.pl --file foo --file bar
The value of
        file            foo, bar

Configuration Files

If I’m going to use the same values most of the time or I want to specify several values, I can put them into a file that my program can read. And, just as I can use one of many command-line option parsers, I have several configuration file parsers from which to choose.

I recommend choosing the right configuration format for your situation, then choosing an appropriate module to deal with that format.

ConfigReader::Simple

I’m a bit partial to ConfigReader::Simple because I maintain it (although I did not originally write it). It can handle multiple files (for instance, including a user configuration file that can override a global one) and has a simple line-oriented syntax:

# configreader-simple.txt
file=foo.dat
line=453
field value
field2 = value2
long_continued_field This is a long \
        line spanning two lines

The module handles all of those formats:

#!/usr/bin/perl
# configreader-simple.pl

use ConfigReader::Simple;

my $config = ConfigReader::Simple->new(
        "configreader-simple.txt" );
die "Could not read config! $ConfigReader::Simple::ERROR\n"
        unless ref $config;

print "The line number is ", $config->get( "line" ), "\n";

Config::IniFiles

Windows folks are used to INI files and there are modules to handle those, too. The basic format breaks the configuration into groups with a heading inside square brackets. Parameters under the headings apply to that heading only, and the key and value have an equals sign between them (or in some formats, a colon). Comment lines start with a semicolon. The INI format even has a line continuation feature. The Config::IniFiles module, as well as some others, can handle these. Here’s a little INI file I might use to work on this book:

[Debugging]
;ComplainNeedlessly=1
ShowPodErrors=1

[Network]
email=brian.d.foy@gmail.com

[Book]
title=Mastering Perl
publisher=O'Reilly Media
author=brian d foy

I can parse this file and get the values from the different sections:

#!/usr/bin/perl
# config-ini.pl

use Config::IniFiles;

my $file = "mastering_perl.ini";

my $ini = Config::IniFiles->new(
        -file    => $file
        ) or die "Could not open $file!";

my $email = $ini->val( 'Network', 'email' );
my $author = $ini->val( 'Book', 'author' );

print "Kindly send complaints to $author ($email)\n";

Besides just reading the file, I can use Config::IniFiles to change values, add or delete values, and rewrite the INI file.

Config::Scoped

Config::Scoped is similar to INI in that it can limit parameters to a certain section, but it’s more sophisticated. It allows nested sections, Perl code evaluation (remember what I said about that earlier, though), and multivalued keys:

book {
        author = {
                name="brian d foy";
                email="brian.d.foy@gmail.com";
                };
        title="Mastering Perl";
        publisher="O'Reilly Media";
}

The module parses the configuration and gives it back to me as a Perl data structure:

#!/usr/bin/perl
# config-scoped.pl

use Config::Scoped;

my $config = Config::Scoped->new( file => 'config-scoped.txt' )->parse;
die "Could not read config!\n" unless ref $config;

print "The author is ", $config->{book}{author}{name}, "\n";

AppConfig

Andy Wardley’s AppConfig is perhaps the most high-powered of all configuration handlers and provides a unified interface to command-line options, configuration files, environment variables, CGI parameters, and many other things. It can handle the line-oriented format of ConfigReader::Simple, the INI format of Config::IniFiles, and many other formats. Andy uses AppConfig for his Template Toolkit, the popular templating system.

Here’s the AppConfig version of my earlier INI reader, using the same INI file that I used earlier:

#!/usr/bin/perl
# appconfig-ini.pl

use AppConfig;

my $config = AppConfig->new;

$config->define( 'network_email=s'  );
$config->define( 'book_author=s'    );
$config->define( 'book_title=s'     );
$config->define( 'book_publisher=s' );

$config->file( 'config.ini' );

my $email  = $config->get( 'network_email' );
my $author = $config->get( 'book_author' );

print "Kindly send complaints to $author ($email)\n";

This program is a bit more complicated. Since AppConfig does so many different things, I have to give it some hints about what it is going to do. Once I create my $config object, I have to tell it what fields to expect and what sorts of values they’ll have. AppConfig uses the format syntax from Getopt::Long. With the INI format, AppConfig flattens the structure by taking the section names and using them as prefixes for the values. My program complains about the fields I didn’t define, and AppConfig gets a bit confused on the INI commented line ;complainneedlessly:

debugging_;complainneedlessly: no such variable at config.ini line 2
debugging_showpoderrors: no such variable at config.ini line 3
Kindly send complaints to brian d foy (brian.d.foy@gmail.com)

Now that I have my AppConfig program, I can change the configuration format without changing the program. The module will figure out my new format automatically. My previous program still works as long as I update the filename I use for the configuration file. Here’s my new configuration format:

network_email=brian.d.foy@gmail.com
book_author=brian d foy

With a small change I can let my program handle the command-line arguments, too. When I call $config->args() without an argument, AppConfig processes @ARGV using Getopt::Long:

#!/usr/bin/perl
# appconfig-args.pl

use AppConfig;

my $config = AppConfig->new;

$config->define( 'network_email=s'  );
$config->define( 'book_author=s'    );
$config->define( 'book_title=s'     );
$config->define( 'book_publisher=s' );

$config->file( 'config.ini' );

$config->args();

my $email  = $config->get( 'network_email' );
my $author = $config->get( 'book_author' );

print "Kindly send complaints to $author ($email)\n";

Now when I run my program and supply another value for network_email on the command line, its value overrides the one from the file because I use $config->args after $config->file:

$ perl appconfig-args.pl
Kindly send complaints to brian d foy (brian.d.foy@gmail.com)

$ perl appconfig-args.pl -network_email bdfoy@cpan.org
Kindly send complaints to brian d foy (bdfoy@cpan.org)

AppConfig is much more sophisticated than I’ve shown and can do quite a bit more. I’ve listed some articles on AppConfig in “Further Reading” at the end of the chapter.

Other Configuration Formats

There are many other configuration formats and each of them probably already has a Perl module to go with it. Win32::Registry gives me access to the Windows Registry, Mac::PropertyList deals with Mac OS X’s plist format, and Config::ApacheFormat parses the Apache configuration format. Go through the list of Config:: modules on CPAN to find the one that you need.

Scripts with a Different Name

My program can also figure out what to do based on the name I use for it. The name of the program shows up in the Perl special variable $0, which you might also recognize from shell programming. Normally, I only have one name for the program. However, I can create links (symbolic or hard) to the file. When I call the program using one of those names, I can set different configuration values:

if( $0 eq ... )    { ... do this init ... }
elsif( $0 eq ... ) { ... do this init ... }
...
else               { ... default init ... }
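
Filled in, that might look like the following sketch. The program names list-users and delete-users are hypothetical, as are the settings and the settings_for subroutine. I use File::Basename, which comes with Perl, because $0 usually includes the path I used to call the program:

```perl
use strict;
use warnings;

use File::Basename qw(basename);

# Hypothetical settings for each name the program might go by
my %Settings_for = (
    'list-users'   => { verbose => 1, dry_run => 1 },
    'delete-users' => { verbose => 1, dry_run => 0 },
);

# Look up the settings by program name, ignoring the directory part
sub settings_for {
    my( $program ) = @_;
    my $name = basename( $program );

    # Any other name gets the default settings
    return $Settings_for{$name} || { verbose => 0, dry_run => 1 };
}

my $settings = settings_for( $0 );
print "Running as ", basename( $0 ),
    " with dry_run = $settings->{dry_run}\n";
```

A hash lookup scales better than a chain of if-elsif tests once there are more than a couple of names, and it keeps all of the per-name settings in one spot.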

Instead of renaming the program, I can embed the program in another program that sets the environment variables and calls the program with the right command-line switches and values. In this way, I save myself a lot of typing to set values:

#!/bin/sh

# Export the settings so the program sees them in its environment
DEBUG=0
VERBOSE=0
DBI_PROFILE=2
export DEBUG VERBOSE DBI_PROFILE

./program -n some_value -m some_other_value

Interactive and Noninteractive Programs

Sometimes I want the program to figure out on its own if it should give me output or ask me for input. When I run the program from the command line, I want to see some output so I know what it’s doing. If I run it from cron (or some other job scheduler), I don’t want to see the output.

The real question isn’t necessarily whether the program is interactive but most likely if I can send output to the terminal or get input from it.

I can check STDOUT to see if the output will go to a terminal. Using the -t file test tells me if the filehandle is connected to a terminal. Normally, command-line invocations are so connected:

$ perl -le 'print "Interactive!" if -t STDOUT'
Interactive!

If I redirect STDOUT, perhaps by sending it to output.txt, it’s not connected to the terminal anymore and my test program prints no message:

$ perl -le 'print "Interactive!" if -t STDOUT' > output.txt

I might not intend that, though. Since I’m running the program from the command line I still might want the same output I would normally expect.

If I want to know whether I should prompt the user, I can check that STDIN is connected to a terminal. I should also check that my prompt will show up somewhere the user will see it, so I test STDOUT too:

$ perl -le 'print "Interactive!" if( -t STDIN and -t STDOUT )'
Interactive!
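Inside a program, I might wrap that test in a subroutine. This is only a sketch: the names can_prompt and ask are my own inventions, and the policy of silently taking the default when nobody can answer is an assumption that won’t suit every program:

```perl
use strict;
use warnings;

# Prompting makes sense only if I can read from a terminal and the
# prompt itself will land on a terminal.
sub can_prompt {
    my( $in, $out ) = @_;
    my $ok = ( -t $in and -t $out );
    return $ok ? 1 : 0;
}

# Hypothetical helper: ask a question, or take the default when
# nobody is there to answer.
sub ask {
    my( $question, $default ) = @_;
    return $default unless can_prompt( \*STDIN, \*STDOUT );

    print "$question [$default] ";
    my $answer = <STDIN>;
    return $default unless defined $answer;   # end of input

    chomp $answer;
    return length $answer ? $answer : $default;
}

print "I would prompt here\n" if can_prompt( \*STDIN, \*STDOUT );
```

From cron, or with input redirected from a file, ask returns the default right away instead of waiting for input that will never come.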

I have to watch what I mean and ensure I test the right thing. Damian Conway’s IO::Interactive might help since it handles various special situations to determine if a program is interactive:

use IO::Interactive qw(is_interactive);

my $can_talk = is_interactive();
print "Hello World!\n" if $can_talk;

Damian includes an especially useful feature, his interactive function, so I don’t have to use conditionals with all of my print statements. His interactive function returns the STDOUT filehandle if my program is interactive and a special null filehandle otherwise. That way I write a normal print statement:

use IO::Interactive qw(interactive);

print { interactive() } "Hello World!\n";

I have to use the curly braces around my call to interactive() because print accepts only a bareword or a simple scalar variable as its filehandle; anything more complicated has to go in a block. I still don’t include a comma after the braces. I get output when the program is interactive and no output when it isn’t.

There are several other ways that I could use this. I could capture the return value of interactive by assigning it to a scalar and then using that scalar for the filehandle in my print statement:

use IO::Interactive qw(interactive);

my $STDOUT = interactive();

print $STDOUT "Hello World!\n";

perl’s Config

The Config module exposes a hash containing the compilation options for my perl binary. Most of these values reflect either the capabilities that the Configure program discovered or the answers I gave to the questions it asked.

For instance, if I want to complain about the perl binary, I could check the value for cf_email. That’s supposed to be the person (or role) you contact for problems with the perl binary, but good luck getting an answer!

#!/usr/bin/perl

use Config;

print "Send complaints to $Config{cf_email}\n";

If I want to guess the hostname of the machine that built the perl binary (that is, if Configure correctly identified it, which tells me about the current machine only if I compiled perl here), I can look at myhostname and mydomain (although I can also get those in other ways). The value of mydomain already starts with a dot, so I don’t add one myself:

#!/usr/bin/perl

use Config;

print "I was compiled on $Config{myhostname}$Config{mydomain}\n";
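One of those other ways is the core Sys::Hostname module, which asks the running system rather than reporting what was recorded when perl was built. This sketch puts the two side by side; the subroutine names are my own:

```perl
use strict;
use warnings;

use Config;
use Sys::Hostname qw(hostname);

# The host I'm running on right now, which may not be the build host.
sub current_host { return hostname() }

# The host recorded at build time. mydomain carries its own leading
# dot, so I join the two values without adding one.
sub build_host {
    my $name   = defined $Config{myhostname} ? $Config{myhostname} : '';
    my $domain = defined $Config{mydomain}   ? $Config{mydomain}   : '';
    return $name . $domain;
}

printf "built on:   %s\nrunning on: %s\n", build_host(), current_host();
```

If I installed a perl that someone else packaged, the two lines will almost certainly differ.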

To see if I’m a threaded perl, I just check the compilation option for that:

#!/usr/bin/perl

use Config;

print "has thread support\n" if $Config{usethreads};
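Since %Config is an ordinary (read-only) hash, I can pull out several values at once. This sketch reports a handful of them; the four keys are standard Config entries, but the choice of which to show and the build_report name are mine:

```perl
use strict;
use warnings;

use Config;

# Format a few build settings, one per line. Options that weren't
# turned on at build time are undefined, so I label them.
sub build_report {
    my @keys = qw( version osname archname usethreads );
    return map {
        my $value = $Config{$_};
        sprintf "%-12s %s", $_, defined $value ? $value : '(not set)';
    } @keys;
}

print "$_\n" for build_report();
```

From the command line, perl -V:usethreads shows a single value without my writing a program at all.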

Different Operating Systems

I may need my program to do different things depending on the platform where I invoke it. On a Unix platform, I may load one module, whereas on Windows I load another. Perl knows where it’s running and puts a distinctive string in $^O (mnemonic: O for Operating system), and I can use that string to decide what I need to do. Perl determines that value when it’s built and installed. The value of $^O is the same as $Config{'osname'}. If I need something more specific, I can use $Config{archname}.

I have to be careful, though, to specify exactly which operating system I want. Table 11-1 shows the value of $^O for popular systems, and the perlport documentation lists several more. Notice that I can’t just look for the pattern m/win/i to check for Windows since Mac OS X identifies itself as darwin.

Table 11-1. Values for $^O for selected platforms

Platform       $^O
Mac OS X       darwin
Mac Classic    MacOS
Windows        MSWin32
OS/2           os2
VMS            VMS
Cygwin         cygwin
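To stay out of the darwin trap, I compare the whole string instead of searching inside it. This is a sketch: the family labels and the platform_family name are made up, while MSWin32, cygwin, and darwin are the strings that perlport documents:

```perl
use strict;
use warnings;

# Classify a $^O string. I match the entire value rather than
# looking for "win" somewhere in it, which would also catch darwin.
sub platform_family {
    my( $osname ) = @_;
    return 'windows' if $osname eq 'MSWin32';
    return 'cygwin'  if $osname eq 'cygwin';
    return 'mac'     if $osname eq 'darwin';
    return 'other';   # everything I haven't handled explicitly
}

print "This looks like a ", platform_family( $^O ), " system\n";
```

Passing the string in as an argument, instead of reading $^O inside the subroutine, lets me check the logic for platforms I don’t have.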

I can conditionally load modules based on the operating system. For instance, the File::Spec module comes with Perl and is really a facade for several operating-system-specific modules behind the scenes. Here’s the entire code for the module. It defines the %module hash to map the values of $^O to the module it should load. It then requires the right module. Since each submodule has the same interface, the programmer is none the wiser:

package File::Spec;

use strict;
use vars qw(@ISA $VERSION);

$VERSION = '0.87';

my %module = (MacOS   => 'Mac',
              MSWin32 => 'Win32',
              os2     => 'OS2',
              VMS     => 'VMS',
              epoc    => 'Epoc',
              NetWare => 'Win32', # Yes, File::Spec::Win32 works on NetWare.
              dos     => 'OS2',   # Yes, File::Spec::OS2 works on DJGPP.
              cygwin  => 'Cygwin');


my $module = $module{$^O} || 'Unix';

require "File/Spec/$module.pm";
@ISA = ("File::Spec::$module");

1;
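From the calling side, I never name the platform-specific class. In this sketch the directory and filename are made-up values and config_path is my own name; the last line peeks at @ISA only to show which class File::Spec picked for this platform:

```perl
use strict;
use warnings;

use File::Spec;

# Build a path without caring which File::Spec::* class does the work.
sub config_path {
    my( $dir, $file ) = @_;
    return File::Spec->catfile( $dir, $file );
}

print config_path( 'etc', 'program.conf' ), "\n";
print "File::Spec delegates to $File::Spec::ISA[0]\n";
```

The same program produces the right separator on each platform because the method call resolves through @ISA to whichever submodule File::Spec loaded.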

Summary

I don’t have to hardcode user-defined data inside my program. I have a variety of ways to allow a user to specify configuration and runtime options without her ever looking at the source. Perl comes with modules to handle command-line switches, and there are even more on CPAN. Almost any configuration file format has a corresponding module on CPAN, and some formats have several module options. Although no particular technique is right for every situation, my users won’t have to fiddle with and potentially break the source code.

Further Reading

The perlport documentation discusses differences in platforms and how to distinguish them inside a program.

Teodor Zlatanov wrote a series of articles on AppConfig for IBM developerWorks, “Application Configuration with Perl” (http://www-128.ibm.com/developerworks/linux/library/l-perl3/index.html), “Application Configuration with Perl, Part 2” (http://www-128.ibm.com/developerworks/linux/library/l-appcon2.html), and “Complex Layered Configurations with AppConfig” (http://www-128.ibm.com/developerworks/opensource/library/l-cpappconf.html).

Randal Schwartz talks about Config::Scoped in his Unix Review column for July 2005: http://www.stonehenge.com/merlyn/UnixReview/col59.html.



[45] Google has a service to search open source code. Try http://codesearch.google.com to find references to config.pl.
