12. Debugging and Testing CGI Applications

The hardest aspect of developing CGI applications on the Web is the testing/debugging phase. The main reason for the difficulty is that applications are being run across a network, with client and server interaction. When there are errors in CGI programs, it is difficult to figure out where they lie.

In this chapter, we will discuss some of the common errors in CGI script design, and what you can do to correct them. In addition, we will look at a debugging/lint tool for CGI applications, called CGI Lint, written exclusively for this book.

12.1 Common Errors

Initially, we will discuss some of the simpler errors found in CGI application design. Most CGI designers encounter these errors at one time or another. However, they are extremely easy to fix.

CGI Script in Unrecognized Directory

Most servers require that CGI scripts reside in a special directory (/cgi-bin), or have certain file extensions. If you try to execute a script that does not follow the rules for a particular server, the server will simply retrieve and display the document, instead of executing it. For example, if you have the following two lines in your NCSA server resource map configuration file (srm.conf):

ScriptAlias  /my-cgi-apps/ /usr/local/bin/httpd_1.4.2/cgi-bin/
AddType      application/x-httpd-cgi   .cgi .pl

the server will execute only scripts with URLs that either contain the string “/my-cgi-apps,” or have a file extension of .pl or .cgi. Take a look at the following URLs and figure out which ones the server will try to execute:

http://some.machine.com/cgi-bin/clock.tcl
http://my.machine.edu/my-cgi-apps/clock.pl
http://your.machine.org/index.cgi
http://their.machine.net/cgi-bin/animation.pl

If you picked the last three, then you are correct! Let's look at why this is so. The first one will not get executed because the script is neither in a recognized directory (my-cgi-apps), nor does it have a valid extension (.cgi or .pl). The second one refers to the correct CGI directory, while the last two have valid extensions.
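The same reasoning can be written down as a short Perl check. This is only a sketch of the matching rule, not the NCSA server's actual logic; the subroutine name is_cgi_url is hypothetical, and the directory prefix and extensions are simply copied from the two srm.conf lines shown earlier.

```perl
use strict;
use warnings;

# Assumed values, taken from the ScriptAlias and AddType lines in the text.
my $script_alias = '/my-cgi-apps/';
my @cgi_suffixes = ('.cgi', '.pl');

# Hypothetical helper: returns 1 if the URL would be treated as a CGI
# program under the two rules, and 0 if it would be served as a document.
sub is_cgi_url {
    my ($url) = @_;
    # Strip the scheme and host, leaving only the path.
    (my $path = $url) =~ s{^\w+://[^/]+}{};
    return 1 if index($path, $script_alias) == 0;   # under the aliased directory
    foreach my $suffix (@cgi_suffixes) {
        return 1 if $path =~ /\Q$suffix\E$/;         # recognized extension
    }
    return 0;
}

foreach my $url ('http://some.machine.com/cgi-bin/clock.tcl',
                 'http://my.machine.edu/my-cgi-apps/clock.pl',
                 'http://your.machine.org/index.cgi',
                 'http://their.machine.net/cgi-bin/animation.pl') {
    printf "%-48s %s\n", $url, is_cgi_url($url) ? 'executed' : 'displayed';
}
```

Only the first URL fails both tests, which matches the answer given above.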

Missing Interpreter Line

If your CGI application is a script of some sort (C shell, Perl, etc.), its first line must begin with #! (a “sharp-bang,” or “shebang”) followed by the path to the interpreter, or else the server will not know what interpreter to call to execute the script. You don't have to worry about this if your CGI program is written in C/C++, or any other language that creates a binary. This leads us to another closely related problem, as we will soon see.
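You can catch a missing interpreter line yourself before the server does. The following Perl sketch reads the first line of a script and reports the interpreter it names. The subroutine name interpreter_for is hypothetical, and the check is deliberately simple: it only looks for #! and tests whether the named file is executable.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the interpreter path named on the first
# line of a script, or undef if the line does not start with "#!".
sub interpreter_for {
    my ($script) = @_;
    open(my $fh, '<', $script) or return undef;
    my $first = <$fh>;
    close($fh);
    return undef unless defined $first;
    return $first =~ /^#!\s*(\S+)/ ? $1 : undef;
}

# Usage: perl check_interpreter.pl clock.pl
my $script = shift(@ARGV);
if (defined $script) {
    my $interp = interpreter_for($script);
    if (!defined $interp) {
        print "$script: no #! line found\n";
    } elsif (!-x $interp) {
        print "$script: interpreter $interp is missing or not executable\n";
    } else {
        print "$script: will be run by $interp\n";
    }
}
```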

File Permission Problems

The CGI script must be executable by the server. Most servers are set up to run with the user identification (UID) of “nobody,” which means that your scripts have to be world executable. The reason for this is that “nobody” has minimal privileges. You can check the permissions of your script on UNIX systems by using the ls command:

% ls -ls /usr/local/bin/httpd_1.4.2/cgi-bin/clock.pl
   4 -rwx------  1 shishir      3624 Aug 17 17:59 clock.pl*

The second field lists the permissions for the file. This field is divided into three parts: the privileges for the owner, the group, and the world (from left to right), preceded by a single character that indicates the type of the file: either a regular file or a directory. In this example, the owner has sole permission to read, write, and execute the script.

If you want the server (running as “nobody”) to be able to execute this script, you have to issue the following command:

% chmod 755 clock.pl
% ls -ls clock.pl
   4 -rwxr-xr-x  1 shishir      3624 Aug 17 17:59 clock.pl*

The chmod command modifies the permissions for the file. The octal code of 755 indicates read (octal 4), write (octal 2), and execute (octal 1) permissions for the owner, and read and execute permissions for group members and all other users. A script needs read permission as well as execute permission, because the interpreter has to open and read the file.
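The same permission check can be made from Perl with the stat function. The sketch below assumes that the server's user falls into the "other" class and that it needs both read and execute permission, which is the case for interpreted scripts. The subroutine name world_can_run is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if "other" users (such as "nobody") have both
# read and execute permission on the file.
sub world_can_run {
    my ($file) = @_;
    my @info = stat($file) or return 0;   # stat failed: file missing or unreadable
    my $mode = $info[2] & 07777;          # keep only the permission bits
    return (($mode & 0005) == 0005) ? 1 : 0;
}

# Usage: perl check_mode.pl clock.pl
my $file = shift(@ARGV);
if (defined $file) {
    my $mode = (stat($file))[2];
    if (defined $mode) {
        printf "%s has mode %04o; %s\n", $file, $mode & 07777,
            world_can_run($file) ? 'world can read and execute it'
                                 : 'try: chmod 755 ' . $file;
    } else {
        print "$file: cannot stat: $!\n";
    }
}
```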

Malformed Header from Script

All CGI applications must output a valid HTTP header, followed by a blank line, before any other data. In other words, two newline characters have to be output after the header. Here is how the output should look:

Content-type: text/html

<HTML>
<HEAD><TITLE>Output from CGI Script</TITLE></HEAD>
.
.
.

The headers must be output before any other data, or the server will generate a server error with a status of 500. So make it a habit to output this data as early in the script as possible. To make it easier for yourself, you can use a subroutine like the following to output the correct information:

sub output_MIME_header
{
    local ($type) = @_;
    print "Content-type: ", $type, "\n\n";
}

Just remember to call it at the beginning of your program (before you output anything else). Another problem related to this topic has to do with how the script executes. If the CGI program has errors, then the interpreter, or compiler, will produce an error message when trying to execute the program. These error messages will inevitably be output before the HTTP header, and the server will complain.

What is the moral of this? Make sure you check your script from the command line before you try to execute it on the Web. If you are using Perl, you can use the -wc switch to check for syntax errors:

% perl -wc clock.pl
syntax error in file clock.pl at line 9, at EOF
clock.pl had compilation errors.

If there are no errors (but there are warnings), the Perl interpreter will display the following:

% perl -wc clock.pl
Possible typo: "opt_g" at clock.pl line 9.
Possible typo: "opt_u" at clock.pl line 9.
Possible typo: "opt_f" at clock.pl line 9.
clock.pl syntax OK

Warnings indicate such things as possible typing errors or use of uninitialized variables. Most of the time, these warnings are benign, but you should still take the time to look into them. Finally, if there are no warnings or errors to be displayed, Perl will output the following:

% perl -wc clock.pl
clock.pl syntax OK

So it is extremely important to check to make sure the script runs without any errors on the command line before trying it out on the Web.
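Once the script compiles cleanly, you can also check its output for the header problem described earlier. The following sketch runs a script with the current Perl interpreter and tests whether the output starts with header lines followed by a blank line. The subroutine name has_valid_header is hypothetical, and the test is a simplification: it insists on a Content-type header, although a script may legitimately send other headers (such as Location) instead.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the text starts with one or more
# "Name: value" header lines, one of which is Content-type, and the
# header block is followed by a blank line.
sub has_valid_header {
    my ($output) = @_;
    my ($head) = $output =~ /\A((?:[\w-]+:[^\n]*\r?\n)+)\r?\n/;
    return 0 unless defined $head;
    return ($head =~ /^Content-type:/im) ? 1 : 0;
}

# Usage: perl check_header.pl clock.pl
# Assumes the script named on the command line is a Perl script.
my $script = shift(@ARGV);
if (defined $script) {
    my $output = `$^X $script`;
    print has_valid_header($output) ? "Header looks fine\n"
                                    : "Malformed header from script\n";
}
```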

12.2 Programming/System Errors

Now that we have looked at some of the common errors in CGI application design, let's focus on programming errors that can cause unexpected results. There is one extremely important point that you should be aware of:

Always check the return value of all the system commands, including eval, open, and system.

What does this mean? The next few sections will describe some of the programming errors that occur frequently if you are not careful.

Opening, Reading, and Writing Files

Since the server is running as a user that has minimal privileges (usually “nobody”), you must be careful when reading from or writing to files. Here is an example:

open (FILE, "<" . "/usr/local/httpd_1.4.2/data");
while (<FILE>) {
    print;
}
close (FILE);

Now, what if the file that you are trying to read is not accessible? The file handle FILE will not be created, but the while loop tries to iterate through that file handle. Fortunately, Perl does not get upset, but you will not have any data. So, it is always better to check the status of the open command, like this:

open (FILE, "<" . "/usr/local/httpd_1.4.2/data") ||
    &call_some_subroutine ("Oops! The read failed. We need to do something.");

This will ensure that the subroutine call_some_subroutine gets called if the script cannot open the file. Now, say you want to write to an output file:

open (FILE, ">" . "/usr/local/httpd_1.4.2/data");
print FILE "Line 1", "\n";
print FILE "Line 2", "\n";
close (FILE);

Again, you should check for the status of the open command:

open (FILE, ">" . "/usr/local/httpd_1.4.2/data") ||
    &call_some_subroutine ("Oops! The write failed. We need to do something.");

This is especially true when doing such tasks as updating a database or creating a counter data file. In order for the server to write to a file, it has to have write privileges on the file; to create a new file, it also needs write privileges on the directory in which the file is located.
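The checks do not have to stop at open. The sketch below also tests the return values of print and close, and includes the system error message ($!) in each report so that you can tell a permission problem from a missing directory. The subroutine name append_line is hypothetical, and sending the message to standard error is just one reasonable choice.

```perl
use strict;
use warnings;

# Hypothetical helper: appends one line to a file and reports failure
# instead of silently losing the data. Returns 1 on success, 0 otherwise.
sub append_line {
    my ($path, $line) = @_;
    my $fh;
    unless (open($fh, '>>', $path)) {
        print STDERR "append_line: cannot open $path: $!\n";
        return 0;
    }
    unless (print {$fh} $line, "\n") {
        print STDERR "append_line: cannot write to $path: $!\n";
        close($fh);
        return 0;
    }
    unless (close($fh)) {                 # buffered data is flushed here
        print STDERR "append_line: cannot close $path: $!\n";
        return 0;
    }
    return 1;
}
```

A call such as append_line("/usr/local/httpd_1.4.2/data", "Line 1") returns 0 when the server's user lacks the necessary privileges, so the caller can decide what to tell the user.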

Pipes and the open Command

We used pipes to perform data redirection in numerous examples in this book. Unlike files, there is no easy way to check whether the command at the other end of the pipe executed successfully. Let's take a look at a simple example:

open (FILE, "/usr/bin/cat /home/shishir/.login |")
                || &call_some_subroutine ("Error opening pipe!");
while (<FILE>) {
    print;
}
close (FILE);

If the cat command cannot be found by the shell, you might expect that an error status will be returned by the open command, and thus the call_some_subroutine function will be called. However, this is not the case. An error status will be returned only if a pipe cannot be created (which is almost never the case). Due to the way the shell operates, the status of the command is available only after the file handle is closed. Here is an example:

open (FILE, "/usr/bin/cat /home/shishir/.login |")
    || &call_some_subroutine ("Error opening pipe!");
while (<FILE>) {
    print;
}
close (FILE);
if ($?) {
    &call_some_subroutine ("Error in executing command!");
}

Once the file handle is closed, Perl saves the return status in the variable $?. This is the method that you should use for all system commands.
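The value in $? is a wait status rather than a plain exit code: the exit code is stored in the high byte, and the low seven bits hold the number of the signal that terminated the command, if any. The following sketch separates the two. The subroutine name run_through_pipe is hypothetical, and the command it runs (the Perl interpreter itself, exiting with a status of 3) is chosen only so that the example does not depend on any other program.

```perl
use strict;
use warnings;

# Hypothetical helper: runs a command through a pipe, collects its output,
# and splits $? into the exit code and the signal number (if any).
sub run_through_pipe {
    my ($command) = @_;
    my $pid = open(my $pipe, "$command |");
    return (undef, -1, 0) unless $pid;    # the pipe itself could not be created
    my $output = join('', <$pipe>);
    close($pipe);                         # sets $? to the command's wait status
    my $exit_code = $? >> 8;              # high byte: value passed to exit()
    my $signal    = $? & 127;             # low seven bits: terminating signal
    return ($output, $exit_code, $signal);
}

my ($out, $code, $sig) = run_through_pipe("$^X -e \"print 42; exit 3\"");
print "output=$out exit=$code signal=$sig\n";
```

A nonzero exit code or signal number tells you that the command failed, even though the open call itself succeeded.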

There is another method for determining the status of the pipe before the file handle is closed, though it is not always 100% reliable. It involves checking the process ID (PID) of the process that is spawned by the open command:

$pid = open (FILE, "/usr/bin/cat /home/shishir/.login |");
sleep (2);
$status = kill 0, $pid;
if ($status) {
    while (<FILE>) {
        print;
    }
    close (FILE);
} else {
    &call_some_subroutine ("Error opening pipe!");
}

This is a neat trick! The kill statement with an argument of 0 checks the status of the process. If the process is alive, a value of 1 is returned. Otherwise, a 0 is returned, which indicates that the process is no longer alive. The sleep command ensures a delay so that the value returned by kill reflects the status of the process.

12.3 Environment Variables

If you look back to the counter CGI applications in previous chapters, you will see that we saved the counter data in a text file. Some CGI programmers want to avoid using a file, and try to store the information in an environment variable. So they write code that resembles the following:

if ($ENV{'COUNTER'}) {
    $ENV{'COUNTER'}++;
} else {
    $ENV{'COUNTER'} = 1;
}

To their surprise, however, the counter value is always the same (1, in this case). The point is that changes your script makes to %ENV are visible only to the script itself and to any processes it starts; they disappear when the script exits.

Basically, each time the server runs a CGI program, it creates a new child process with a fresh copy of the environment. And the cardinal rule in UNIX is that a child process cannot change the environment of its parent.
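You can watch this rule at work from the command line. In the following sketch, a parent Perl process sets COUNTER and starts a child (a second copy of the Perl interpreter) that increments the variable. It is a demonstration only, and it assumes a UNIX system, where the list form of a piped open is available.

```perl
use strict;
use warnings;

$ENV{'COUNTER'} = 1;

# The child inherits COUNTER, increments its own copy, and prints it.
my $child_code = '$ENV{COUNTER}++; print $ENV{COUNTER}';
open(my $child, '-|', $^X, '-e', $child_code) or die "cannot start child: $!";
my $child_value = <$child>;
close($child);

print "child saw COUNTER = $child_value\n";
print "parent still has COUNTER = $ENV{'COUNTER'}\n";
```

The child reports 2, but the parent's copy remains 1. A CGI counter has the same problem: the server starts a new process for every request, so the incremented value is lost each time.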

12.4 Logging and Simulation

At this point, you might be wondering where all the CGI errors get logged. If you are using the NCSA server, they are written to the error_log file in the server's logs directory. You can manually place debugging messages into the error_log file by doing the following:

print STDERR "Calendar v1.0 - Just about to calculate center", "\n";
$center = ($diameter / 2) + $x_offset;
print STDERR "Calendar v1.0 - Finished calculating. Center = ", $center, "\n";

After the program is finished, you can look at the log file to see the various debugging messages. It is a good practice to insert the name of your program into the message, so you can find it among all of the different messages logged to the file. Another trick you can use is to “dupe” (or duplicate) standard error to standard output:

print "Content-type: text/plain", "\n\n";
open (STDERR, ">&" . STDOUT);
print STDERR "About to execute for loop", "\n";
for ($loop=0; $loop <= 10; $loop++) {
    $point[$loop] = ($loop * $center) + $random_number;
    print STDERR "Point number ", $loop, " is ", $point[$loop], "\n";
}
close (STDERR);

In this case, the debugging messages written to standard error will go to the browser, along with the regular output, instead of to the log file.
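If you write many debugging messages, a small subroutine keeps them consistent. The sketch below tags each message with the program name, as suggested earlier, and adds the time and process ID so that messages from simultaneous requests can be told apart. The subroutine name debug_message and the message layout are assumptions, not a format the server requires.

```perl
use strict;
use warnings;

# Hypothetical helper: writes a tagged, timestamped message to standard
# error and returns the line that was written.
sub debug_message {
    my ($program, $message) = @_;
    my $line = sprintf("[%s] %s (pid %d): %s\n",
                       scalar(localtime), $program, $$, $message);
    print STDERR $line;
    return $line;
}

debug_message('Calendar v1.0', 'Just about to calculate center');
```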

Client Simulation

In order to get a good feel for how the Web works, you should connect to a server and simulate a client's actions. You can do this by using the telnet program. Here is an example:

% telnet www.ora.com 80
Trying 198.112.208.13 ...
Connected to amber.ora.com.
Escape character is '^]'.
GET / HTTP/1.0

<HTML><HEAD>
  <TITLE>oreilly.com Home Page</TITLE>
</HEAD><BODY>
<P><A HREF="http://bin.gnn.com/cgi-bin/imagemap/radio">
<IMG SRC="/gnn/bus/ora/radio.gif"  ALT="" ISMAP></A>
.
.
.
</BODY></HTML>
Connection closed by foreign host.

You can enter other HTTP commands as well. But remember that HTTP is a stateless protocol: the server does not remember anything from one request to the next. In addition, an HTTP/1.0 server normally closes the connection after answering a single request, so you have to reconnect for each one. Now let's look at the issues behind server simulation.
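The telnet session can also be scripted. The following sketch uses the standard IO::Socket::INET module to send the same GET request and print whatever comes back. The subroutine names build_request and fetch are hypothetical. To mirror the session shown here, the request carries no other header lines, although many current servers expect at least a Host header.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: builds the text of an HTTP/1.0 GET request.
# The blank line (the second CRLF pair) marks the end of the request.
sub build_request {
    my ($path) = @_;
    return "GET $path HTTP/1.0\r\n\r\n";
}

# Hypothetical helper: sends the request and returns the raw response,
# headers included, or undef if the connection fails.
sub fetch {
    my ($host, $port, $path) = @_;
    my $socket = IO::Socket::INET->new(PeerAddr => $host,
                                       PeerPort => $port,
                                       Proto    => 'tcp',
                                       Timeout  => 10) or return undef;
    print {$socket} build_request($path);
    my $response = join('', <$socket>);   # read until the server closes
    close($socket);
    return $response;
}

# Usage: perl client.pl host [port] [path]
if (@ARGV) {
    my ($host, $port, $path) = @ARGV;
    my $response = fetch($host, $port || 80, $path || '/');
    print defined $response ? $response : "Could not connect to $host\n";
}
```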

Server Simulation

If you do not have access to a server on a full-time basis, you can simulate the features of a server quite easily. Before we look at how this can be accomplished, let's look briefly at what the server actually does:

  • Gets a request from the client to serve a resource (either a file or a CGI program).
  • Checks to see if the file is a CGI script.
  • If it is, passes various environment variables/input stream to the CGI program, and waits for output.
  • Sends the output from either a regular file or CGI to the client.

In order to test CGI scripts, all we would have to do is emulate the third step in this process. Let's look at a typical GET request. First, we have to create a file to set the environment variables (e.g., environment.vars). Here is how you can do it in the C shell:

setenv REQUEST_METHOD     'GET'
setenv QUERY_STRING       'name=John%20Surge&company=ABC%20Corporation%21'
setenv HTTP_ACCEPT        'image/gif, image/x-xbitmap, image/jpeg, */*'
setenv SERVER_PROTOCOL    'HTTP/1.0'
setenv REMOTE_ADDR        '198.198.198.198'
setenv DOCUMENT_ROOT      '/usr/local/bin/httpd_1.4.2/public'
setenv GATEWAY_INTERFACE  'CGI/1.1'
setenv SCRIPT_NAME        '/cgi-bin/abc.pl'
setenv SERVER_SOFTWARE    'NCSA/1.4.2'
setenv REMOTE_HOST        'gateway.cgi.com'

In a Bourne-compatible shell (such as Korn shell, bash, or zsh), the previous commands will not work. Instead, you need the following syntax:

export REQUEST_METHOD='GET'
export QUERY_STRING='name=John%20Surge&company=ABC%20Corporation%21'
.
.
.

Then, we have to execute this script with the following command (assuming the commands are stored in the file environment.vars) in the C shell:

% source environment.vars

In a Bourne-compatible shell, you need to do the following:

% . environment.vars

Now, you can simply run your CGI script, and it should work as though it were being executed by the server. For POST requests, the process is slightly different. You first have to create a file that contains the POST information (e.g., post_data.txt):

name=John%20Surge&company=ABC%20Corporation%21&sports=Basketball&exercise=3&runners=no

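Once that is done, you need to determine the content length (or the size in bytes) of the data. You can do that with the wc command. Make sure the data is on a single line and that the file does not end with a newline character, or the count will be too high: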

% wc -c post_data.txt
   86

Then you need to add the following two lines to the environment variable file that we created above (assuming C shell):

setenv REQUEST_METHOD     'POST'
setenv CONTENT_LENGTH     '86'

Now all you have to do is send the data in the file to the CGI program by redirecting its standard input:

% /usr/local/bin/httpd_1.4.2/cgi-bin/abc.pl < post_data.txt

That's all there is to it. The CGI Lint application automates this procedure, as we will see next.
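Before we turn to CGI Lint, here is how the manual procedure could be scripted in Perl. This is a minimal sketch and not the way CGI Lint itself works. The subroutine name run_cgi is hypothetical; it uses the standard IPC::Open2 module, computes CONTENT_LENGTH with the length function instead of wc, and assumes that the data is small enough to be written in one piece. If the program cannot be started, open2 raises an exception and the environment is not restored.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: runs a CGI program with the given environment
# variables and optional POST data, and returns whatever it prints.
sub run_cgi {
    my ($program, $vars, $post_data) = @_;
    my %saved_env = %ENV;                     # so the caller's environment survives
    %ENV = (%ENV, %$vars);
    if (defined $post_data) {
        $ENV{'REQUEST_METHOD'} = 'POST';
        $ENV{'CONTENT_LENGTH'} = length($post_data);
    }
    local $SIG{'PIPE'} = 'IGNORE';            # the program may exit without reading
    my $pid = open2(my $from_cgi, my $to_cgi, $program);
    print {$to_cgi} $post_data if defined $post_data;
    close($to_cgi);                           # end of input for the CGI program
    my $output = join('', <$from_cgi>);
    close($from_cgi);
    waitpid($pid, 0);                         # reap the child; its status is in $?
    %ENV = %saved_env;
    return $output;
}

# Usage: perl run_cgi.pl /usr/local/bin/httpd_1.4.2/cgi-bin/abc.pl
if (@ARGV) {
    my %vars = (REQUEST_METHOD => 'GET',
                QUERY_STRING   => 'name=John%20Surge&company=ABC%20Corporation%21',
                SCRIPT_NAME    => '/cgi-bin/abc.pl');
    print run_cgi($ARGV[0], \%vars);
}
```

To simulate a POST request, pass the encoded form data as the third argument.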

12.5 CGI Lint--A Debugging/Testing Tool

CGI Lint greatly simplifies the process of testing and debugging CGI applications. Appendix E, Applications, Modules, Utilities, and Documentation, lists where you can get CGI Lint.

Depending on the type of request (either GET or POST), either one or two auxiliary files are required by CGI Lint. The first is a configuration file, which should contain a list of the environment variables in the following format:

REQUEST_METHOD     =   GET
QUERY_STRING       =   name=John Surge&company=ABC Corporation!
HTTP_ACCEPT        =   image/gif, image/x-xbitmap, image/jpeg, */*
SERVER_PROTOCOL    =   HTTP/1.0
REMOTE_ADDR        =   198.198.198.198
SERVER_ROOT        =   /usr/local/bin/httpd_1.4.2
DOCUMENT_ROOT      =   /usr/local/bin/httpd_1.4.2/public
GATEWAY_INTERFACE  =   CGI/1.1
SCRIPT_NAME        =   /cgi-bin/abc.pl
SERVER_SOFTWARE    =   NCSA/1.4.2
REMOTE_HOST        =   gateway.cgi.com

This format has an advantage over the previous one: You do not need to encode the query string. However, if you have either %, &, or = characters in the query string, you need to escape them by placing a “\” before them:

QUERY_STRING       =   name=Joe\=Joseph&company=JP \& Play&percentage=50\%

Or you can just use the encoded values of %25, %26, and %3d to represent the “%,” “&,” and “=” characters, respectively. Now, you are ready to test out your CGI program:

% CGI_Lint get.cfg
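If you prefer the encoded values, you do not have to look them up by hand. The following sketch shows one common way to percent-encode a string in Perl. The subroutine name url_encode is hypothetical, and this is not necessarily how CGI Lint performs its own encoding. The hexadecimal digits come out in uppercase (%3D rather than %3d); either form is acceptable.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encodes every character except letters,
# digits, underscores, periods, and hyphens.
sub url_encode {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9_.-])/sprintf("%%%02X", ord($1))/ge;
    return $value;
}

print url_encode('JP & Play'), "\n";     # JP%20%26%20Play
print url_encode('50%'), "\n";           # 50%25
print url_encode('Joe=Joseph'), "\n";    # Joe%3DJoseph
```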

CGI Lint executes the script that is pointed to by the environment variables SCRIPT_NAME and SERVER_ROOT. In addition, you can use a data file to store query information. Here is an example:

% CGI_Lint form.cfg form.data

The format for the data file should be:

name = Joe\=Joseph
company = JP \& Play
percentage = 50\%

If you already have data stored in QUERY_STRING, CGI Lint will process the data from both sources. In the case of POST requests, all you have to do is change the REQUEST_METHOD to “POST” and run it in the same exact way as before:

% CGI_Lint form.cfg form.data

In addition, you can test the multipart/form-data encoding scheme (see Appendix D, CGI Lite), which is a new addition to the Web. For multipart MIME data, you need to add the following line to the configuration file:

CONTENT_TYPE = multipart/form-data

Normally, multipart data contains boundary strings between fields, but you do not have to go to the trouble of inserting the numerous multipart headers. CGI Lint takes care of all that for you. Now, here is the format for the data file:

name = Joe = Joseph
company = JP & Play
percentage = 50%
review = */usr/shishir/rev.dat

You would execute the script in the same way as you did all the others. CGI Lint reads through the fields and creates a multipart MIME body:

-----------------------------78198732381
Content-disposition: form-data; name="name"

Joe = Joseph
-----------------------------78198732381
Content-disposition: form-data; name="company"

JP & Play
-----------------------------78198732381
Content-disposition: form-data; name="percentage"

50%
-----------------------------78198732381
Content-disposition: form-data; name="review"; filename="/usr/shishir/rev.dat"

.
.
(contents of the file /usr/shishir/rev.dat)
.
.
-----------------------------78198732381--

One thing to note here is the last line of the data file. The asterisk instructs the tool to include the information stored in the file /usr/shishir/rev.dat. That is one of the powerful features of multipart messages: it allows users to upload files to the server.
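To make the layout of a multipart message concrete, here is a sketch that builds such a body for ordinary text fields. It is not CGI Lint's code: the subroutine name multipart_body is hypothetical, the boundary string is an arbitrary value chosen for illustration, and file fields (which need a filename attribute and the file's contents) are left out. Each delimiter line consists of two hyphens followed by the boundary, and a blank line separates the headers of each part from its value.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a multipart/form-data body from a list of
# [name, value] pairs. Lines end in CRLF, as the MIME format requires.
sub multipart_body {
    my ($boundary, @fields) = @_;
    my $body = '';
    foreach my $field (@fields) {
        my ($name, $value) = @$field;
        $body .= "--${boundary}\r\n";
        $body .= "Content-disposition: form-data; name=\"$name\"\r\n";
        $body .= "\r\n";                      # blank line ends the part headers
        $body .= "$value\r\n";
    }
    $body .= "--${boundary}--\r\n";           # closing delimiter
    return $body;
}

# Arbitrary boundary, assumed not to occur in any of the values.
my $boundary = ('-' x 27) . '78198732381';
print multipart_body($boundary, ['name',       'Joe = Joseph'],
                                ['company',    'JP & Play'],
                                ['percentage', '50%']);
```

Because the boundary separates the fields, the values need no escaping or encoding, which is why the data file above contains plain =, &, and % characters.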

In addition to simulating the server data streams, CGI Lint also checks a number of attributes and properties before running the script.

CGI Lint in Action

Let's take a simple CGI program and run it through CGI Lint, and see what happens. Here is the program; it should be familiar to you, as it was introduced at the end of Chapter 7, Advanced Form Applications:

#!/usr/local/bin/perl
&parse_form_data(*simple);
$user = $simple{'user'};
print "Content-type: text/plain", "\n\n";
print "Here are the results of your query: ", "\n";
print `/usr/ucb/finger $user`;
print "\n";
exit (0);

This program outputs finger information about the specified user. Here is the form that is associated with the program:

<FORM ACTION="/cgi-bin/finger.pl" METHOD="POST">
<INPUT TYPE="text" NAME="user" SIZE=40>
<INPUT TYPE="submit" VALUE="Get Information">
</FORM>

Now, let's create the configuration and data files, to be used with CGI Lint. The configuration file must contain the following lines:

REQUEST_METHOD = POST
SERVER_ROOT = /usr/local/bin/httpd_1.4.2
SCRIPT_NAME = /cgi-bin/finger.pl

Since the form passes the information to the program using POST, we need to create a data file to hold the post data. It needs to consist of only one line:

user = shishir

This is equivalent to the user entering “shishir” in the user field in the form. That is all that needs to be done. Here is how you would execute CGI Lint (assuming that the configuration file is called finger.cfg, and the data file is called finger.dat):

% CGI_Lint finger.cfg finger.dat

CGI Lint will output the following information:

While looking at your Perl script for possible security holes and
"open" commands, I came across the following statements that *might*
constitute a security breach:
================================================================================
Check the *backtics* on line: print `/usr/ucb/finger $user`;
Variable(s) *may* not be secure!
================================================================================

It looks as though your script has no bugs (at least, on the surface),
so here is the output you have been waiting for:
================================================================================
Here are the results of your query: <BR><HR>
Login name: shishir                     In real life: Shishir Gundavaram
Directory: /home/shishir                Shell: /usr/local/bin/tcsh
On since Oct 26 23:11:27 on ttyp0 from nmrc.bu.edu
Mail last read Mon Oct 27 00:03:54 1995
No Plan.
<HR>
================================================================================

It will display the output generated by the CGI program. It also outputs various other information, including possible security holes. Here is a list of the exact informational messages that CGI Lint outputs:

  • The configuration file (that holds the environment variable data) could not be found. This file is needed to run this program. Please check and try again.
  • The NCSA server resource map configuration file (srm.conf) could not be found. This might be due to the way your server is set up. In order to rectify the situation, define a variable called SERVER_ROOT (with the correct server root directory) in the configuration file, and try again.
  • Sorry, either the file extension or the path to your CGI script is not valid. Check both of these to make sure they are configured in the NCSA server resource map configuration (srm.conf) file.
  • You do not have the necessary privileges to run the specified script. Use the chmod command to change the permissions, and try again.
  • The CGI program that is specified in the configuration file does not exist. Please check the path, and try again.
  • The CGI program that is specified could not be opened. Please check the permissions and try again.
  • The interpreter you specified either does not exist, is not readable, or is not a binary file. Please check the path, and try again.
  • The script you specified does not have a header line that points to a interpreter that will execute the script. The header line should be something like this:

    #!/usr/local/bin/perl

  • Oops! The script you wrote had errors. I will list all the bugs here. Please fix them and try again. Here they are:
  • While looking at your Perl script for possible security holes and “open” commands, I came across the following *errors*:
  • While looking at your Perl script for possible security holes and “open” commands, I came across the following statements that *might* constitute a security breach:
  • The data file (that holds the potential form data) could not be found. Please check the file specification and try again.
  • A data file to store the simulated POST data cannot be created. Please check to see if you have privileges to write to the /tmp directory.
  • One of the filenames that you listed in the simulated multipart data file does not exist. Be sure to check all possible fields, and try again.
  • The CONTENT_TYPE variable in your data file is not set correctly. You do not have to set a value for this, as I will default it to:

    application/x-www-form-urlencoded

    But, if you do set a value for this variable, it has to be either the one mentioned above, or:

    multipart/form-data

    If you specify an encoding type of multipart/form-data in the configuration file, I will create a random boundary, and set the CONTENT_TYPE to the following:

    multipart/form-data; boundary=--------------Some Random Boundary

  • The REQUEST_METHOD variable in your data file is not set correctly. It has to have a value of either GET or POST.
  • Your NPH (Non-Parsed-Header) script does not output the correct HTTP response. The first line has to be something like:

    HTTP/1.0 200 OK

  • A serious error! Either you are not outputting a **BLANK** line after the HTTP headers, *OR* you are trying to send invalid (or undefined) HTTP headers. Please check the output of your script and try again.
  • It looks as though your script has no bugs (at least, on the surface), so here is the output you have been waiting for:
  • The *system* command was detected in your script. Make sure to turn output buffering off by adding the following line to your script:

    $| = 1;
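The security warning that CGI Lint printed for the finger program deserves a closer look. The $user variable comes straight from the form and is placed inside backticks, so any shell metacharacters in it would be interpreted by the shell. One possible remedy, sketched below, is to accept only values made up of harmless characters before running the command. The subroutine name safe_user_name and the exact set of permitted characters are assumptions; adjust them to the login names used on your system.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the name if it starts with a letter, digit,
# or underscore and contains only letters, digits, underscores, periods,
# and hyphens (32 characters at most). Returns undef otherwise.
sub safe_user_name {
    my ($user) = @_;
    return undef unless defined $user;
    return ($user =~ /\A([A-Za-z0-9_][A-Za-z0-9_.-]{0,31})\z/) ? $1 : undef;
}

foreach my $input ('shishir', 'shishir; cat /etc/passwd', '-l') {
    my $user = safe_user_name($input);
    print defined $user ? "ok: $user\n" : "rejected: $input\n";
}
```

In the finger program, you would call the subroutine on $simple{'user'} and run the finger command only when the result is defined.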

12.6 Set UID/GID Wrapper

Now that we have a debugging/lint tool for CGI programs, how do we set this up so that it executes as the same UID as that of the Web server? If the Web server runs with your own UID, then you do not have to do anything. But, if it runs as some other UID, say “nobody” or “www,” then you have to ask the system administrator to run a script called wrapper, which sets the UID/GID bits. Let's quickly look at this script.

The wrapper is based on a program in the book Programming Perl by Larry Wall and Randal Schwartz (two of the most knowledgeable Perl gurus around). Here is the format for the wrapper command:

% wrapper -f /usr/local/bin/CGI_Lint -u nobody -g none

The -f switch specifies the filename to use, while the -u and the -g switches set the UID and GID, respectively. You could also use numerical identification numbers:

% wrapper -f /usr/local/bin/CGI_Lint -u 628120 -g 120

This will create a C executable with the specified UID and GID bits set, that will, in turn, run the CGI script.
