CGI Programming on the World Wide Web

By Shishir Gundavaram
1st Edition March 1996





This book is out of print, but it has been made available online through the O'Reilly Open Books Project.


Chapter 8
Multiple Form Interaction

8.2 CGI Side Includes

Using hidden fields is probably the simplest way to maintain information across multiple CGI instances. But it is far from the most efficient.

In this next example of maintaining state, we embed special codes into HTML documents that resemble Server Side Includes (see Chapter 5, Server Side Includes, for more information). These codes are actually parsed by a CGI program, which uses them to maintain information across several documents. This algorithm is best illustrated by example.

Let's create a multiple survey form system. Here is the first form of the survey:

<HTML>
<HEAD><TITLE>Television/Movie Survey</TITLE></HEAD>
<BODY>
<H1>Welcome to the CGI Network!</H1>
<HR>
In order to better serve you, we would like to know what type of
movies and variety shows you like to watch on TV. Over the last couple
of years, you, the viewers, were directly responsible for the lasting
success of many of our shows. Your comments are extremely valuable to
us, so please take a few moments to fill out a survey.
<P>
The current time is: <!--#insert var="DATE_TIME"--><BR>

At first glance, the construct in the last line displayed above looks like a Server Side Include. However, it is not! This document first gets parsed by a CGI program that looks for statements like these and replaces them with appropriate information. Let's refer to these statements as CGI Side Includes (CSIs), or "pseudo" Server Side Includes. In this case, the program will insert the current date and time.
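As a minimal sketch of the idea (not the survey program itself, which is developed later in this section), the following Perl fragment expands a CSI statement in a single line of text. The %values hash and the sample line are assumptions made for illustration, and the single global substitution is a simplified stand-in for the loop the real program uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical replacement values; the real program computes these at run time.
my %values = ( DATE_TIME => scalar(localtime) );

# A sample line containing one CSI statement.
my $line = 'The current time is: <!--#insert var="DATE_TIME"--><BR>';

# Replace each recognized CSI with its value; unknown ones become empty strings.
(my $expanded = $line) =~
    s/<!--\s*#\s*insert\s+var\s*=\s*"?(\w+)"?\s*-->/exists $values{$1} ? $values{$1} : ""/ge;

print $expanded, "\n";
```

The static HTML file stays free of Perl code; only the small marker is rewritten each time the document is served.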

You may ask, what is the advantage of such a process? It allows you to insert dynamic information in otherwise static documents. Another alternative to this would be to place the information contained within the document in the program, such as:

print <<End_of_Form;
<HTML>
<HEAD><TITLE>Sample Form</TITLE></HEAD>
<BODY>
<H1>This is a test of a sample form</H1>
The current time is: $date_time
<HR>
.
.
.
</BODY></HTML>
End_of_Form

As you can see, this can be quite cumbersome, especially if the document is large. Now, let's proceed with the rest of the form.

<HR>
<FORM ACTION="/cgi-bin/survey.pl?
                 cgi_cookie=<!--#insert var="COOKIE"-->&
                 cgi_form_num=<!--#insert var="NUMBER"-->" METHOD="POST">

As in other examples in this book, a query is passed to the program as part of the ACTION attribute. Notice the two CSI statements in the <FORM> tag. The first one inserts a random number--also referred to as a magic cookie--for identification purposes, and the second one inserts the form number. A cookie is needed to store the information from the various forms in a unique data file. This cookie is passed to each and every form, so that the form data is appended to the same data file. A form number is needed to keep track of the various forms. We will discuss these statements in detail later in this chapter.

<PRE>
Full Name: <INPUT TYPE="text" NAME="01 Full Name" SIZE=40>
E-Mail:    <INPUT TYPE="text" NAME="02 EMail Address" SIZE=40>

The field names are prefixed with numbers, so that they can be sorted. This makes it possible to store the form data in the order in which it is displayed in the form. Remember, you do not need to encode the field names, as the browser will do so before it submits the information to the server.

</PRE>
<P>
Which survey would you like to fill out: <BR>
<INPUT TYPE="radio" NAME="cgi_survey" VALUE="Television" CHECKED>Television<BR>
<INPUT TYPE="radio" NAME="cgi_survey" VALUE="Movie">Movies<BR>
<P>
<INPUT TYPE="submit" VALUE="Submit the survey">
<INPUT TYPE="reset"  VALUE="Clear all fields">
</FORM>
<HR>
</BODY></HTML>

The document is passed to the CGI program as extra path information. For example, if you want the program to parse the CSI statements and display the form, the following URL should be used:

http://your.machine/cgi-bin/survey.pl/start_survey.html

where the file "/start_survey.html" contains the first form of the survey. In the context of this example, if the user opts to fill out the "Television" survey, the following two forms are displayed, one after the other:

<HTML>
<HEAD><TITLE>Television/Movie Survey</TITLE></HEAD>
<BODY>
<H1>Television Survey</H1>
<HR>
Welcome! We are glad that you have decided to fill out our
television survey. Please read all questions carefully. When you are finished,
press the Submit button for Part 2 of the survey.
<P>
The current time is: <!--#insert var="DATE_TIME"--><BR>

The date and time are inserted into the form using CGI side includes.


<HR>
<FORM ACTION="/cgi-bin/survey.pl?cgi_cookie=<!--#insert var="COOKIE"-->&cgi_survey=<!--#insert var="SURVEY"-->&cgi_form_num=<!--#insert var="NUMBER"-->" METHOD="POST">

The variable "SURVEY" inserts the user-selected survey type, either "Television" or "Movie." The survey type is retrieved from the information submitted by the user in the first form. This ensures that the correct series of forms is displayed.

What is your favorite comedy show?
<BR>
<INPUT TYPE="radio" NAME="03 Comedy Show" VALUE="Single Web Dude">Single Web Dude<BR>
<INPUT TYPE="radio" NAME="03 Comedy Show" VALUE="Gateway Friends">Gateway Friends<BR>
<INPUT TYPE="radio" NAME="03 Comedy Show" VALUE="Mad About CGI" CHECKED>Mad About CGI<BR>
<INPUT TYPE="radio" NAME="03 Comedy Show" VALUE="Web Time">Web Time<BR>
<P>
Who is your favorite actor in a comedy show?
<BR>
<INPUT TYPE="radio" NAME="04 TV Comedian" VALUE="John Riser" CHECKED>John Riser<BR>
<INPUT TYPE="radio" NAME="04 TV Comedian" VALUE="Jake LeBlanc">Jake LeBlanc<BR>
<INPUT TYPE="radio" NAME="04 TV Comedian" VALUE="Mike Cosby">Mike Cosby<BR>
<INPUT TYPE="radio" NAME="04 TV Comedian" VALUE="Marc Allen">Marc Allen<BR>
<P>
<INPUT TYPE="submit" VALUE="Submit the survey">
<INPUT TYPE="reset"  VALUE="Clear all fields">
</FORM>
<HR>
</BODY></HTML>

The field names are prefixed with numerical values. Notice the long, descriptive names for the field names and values. This allows us to simply retrieve the names and values, decode them, and print them out.

Now, here is the second, and final, form in the "Television" survey:

<HTML>
<HEAD><TITLE>Television/Movie Survey</TITLE></HEAD>
<BODY>
<H1>Television Survey</H1>
<HR>
Thanks for filling out Part 1 of our TV survey. Here is
Part 2... Again, please read all questions carefully. When you are finished,
press the Submit button to wrap up the survey.
<P>
The current time is: <!--#insert var="DATE_TIME"--><BR>
<HR>
<FORM ACTION="/cgi-bin/survey.pl?cgi_cookie=<!--#insert var="COOKIE"-->&cgi_survey=<!--#insert var="SURVEY"-->&cgi_form_num=<!--#insert var="NUMBER"-->" METHOD="POST">
What is your favorite action/drama show?
<BR>
<INPUT TYPE="radio" NAME="05 TV Drama" VALUE="Masquerade on the Web">Masquerade on the Web<BR>
<INPUT TYPE="radio" NAME="05 TV Drama" VALUE="Gateway Voyager">Gateway Voyager<BR>
<INPUT TYPE="radio" NAME="05 TV Drama" VALUE="EH" CHECKED>EH - Emergency HTTP Server<BR>
<INPUT TYPE="radio" NAME="05 TV Drama" VALUE="W3C Hope">W3C Hope<BR>
<P>
Who is your favorite actor in an action/drama show?
<BR>
<INPUT TYPE="radio" NAME="06 TV Drama Actor" VALUE="Bill Wyle" CHECKED>Bill Wyle<BR>
<INPUT TYPE="radio" NAME="06 TV Drama Actor" VALUE="John Clooney">John Clooney<BR>
<INPUT TYPE="radio" NAME="06 TV Drama Actor" VALUE="Mike Strauss">Mike Strauss<BR>
<INPUT TYPE="radio" NAME="06 TV Drama Actor" VALUE="Eric Wagner">Eric Wagner<BR>
<P>
<INPUT TYPE="submit" VALUE="Submit the survey">
<INPUT TYPE="reset"  VALUE="Clear all fields">
</FORM>
<HR>
</BODY></HTML>

The two forms for the "Movie" survey are set up in the same manner as the ones illustrated above. Let's look at the program:

#!/usr/local/bin/perl
$exclusive_lock = 2;
$unlock = 8;
$request_method = $ENV{'REQUEST_METHOD'};
$webmaster = "shishir\@bu\.edu";
$document_root = "/home/shishir/httpd_1.4.2/public";
$survey_dir = "/tmp/";

The variable survey_dir contains the directory where the data files are stored. Whenever you are creating temporary files, you should store them in /tmp or /var/tmp, as these directories are cleaned out every few days.

@Television_files = ( "/tv_1.html", "/tv_2.html" );
@Movie_files = ( "/movie_1.html", "/movie_2.html" );

These two arrays store the HTML survey files that must be parsed for CSI statements. The most important thing to note here is the way the variables are labeled. The first part of the variable name--before the "_" character--corresponds to the value of the cgi_survey field in the initial form. The program determines the survey type chosen by the user--either "Television" or "Movie"--and concatenates that string with "_files" and evaluates the total string at run-time to determine the next survey file.

if ($request_method eq "GET") {
    $form_num = 0;
    $type = "start";
    $form_file = $ENV{'PATH_INFO'};

A GET request indicates that the user requested the starting form, whose path is passed in PATH_INFO. The form_num variable indicates the current form number. In this case, zero indicates the starting form.

The type variable is set to "start". However, this value is never used because there is no corresponding CSI in the initial form. It is just defined for clarity. Remember, the manner in which the starting form must be accessed is a GET request:

http://your.machine/cgi-bin/survey.pl/start_survey.html

After the first form is submitted, the server will execute this program with a POST request and an additional query. The process is repeated for all the forms in the survey.

    if ($form_file) {
        $cookie = join ("_", $ENV{'REMOTE_HOST'}, time);
        $cookie = &escape($cookie);
        &pseudo_ssi ($form_file, $cookie, $type, $form_num);
    } else {
        &return_error (500, "CGI Network Survey Error", 
                        "An initial survey form must be specified.");
    }

Since the starting form was accessed, a new cookie has to be created. This cookie is simply the client's host name (REMOTE_HOST) joined with the current time by an underscore. Perl's time function returns the current time as the number of seconds since 1970, so two users are unlikely to receive the same cookie unless they start the survey from the same host within the same second.

The escape subroutine encodes the cookie string for insertion into the form. Finally, the pseudo_ssi subroutine reads and parses the file specified by the variable form_file for CSI statements. Besides the file name, three parameters are passed to the subroutine: the new cookie, the dummy form type, and the form number. If corresponding CSI statements are found, the values stored in these variables will be inserted appropriately.

} elsif ($request_method eq "POST") {
    &parse_form_data(*STATE);
    $form_num = $STATE{'cgi_form_num'};
    $type = $STATE{'cgi_survey'};
    $cookie = $STATE{'cgi_cookie'};

The form information is retrieved and stored in the STATE associative array. The parse_form_data subroutine is slightly different than the one used in the previous examples; it decodes the form field name, as well as the value.

Once the initial form is submitted, the form_num variable equals zero, type contains either "Television" or "Movie," and cookie holds a string that uniquely identifies a user. After the initial form, all the other forms will have the same cookie and type information. However, the form_num variable will be incremented.

    if ( ($type eq "Television") || ($type eq "Movie") ) {

This conditional is executed if the user chose to fill out either a television or movie survey. Since one of the values is checked by default on the form, this variable will have to contain either "Television" or "Movie." However, if someone accesses this program by bypassing the starting form, and specifies something other than these two values, an error message is displayed.

        $limit = eval ("scalar (\@${type}_files)");

This run-time evaluation is very important. It uses Perl's scalar function to determine the number of elements in the array that corresponds to the value stored in the variable type. Here is a simple example of scalar:

@test = (1, 2, 3);
$number = scalar (@test);

The variable number contains 3, the number of elements in the array.

        if ( ($form_num >= 0) && ($form_num <= $limit) ) {
            &write_data_to_file();

If the form number is within the limits, the write_data_to_file subroutine is called to write the form information to a data file. Remember, the same data file is used throughout the whole process. On the other hand, if a user bypasses the forms, and tries to pass a form number that is not within the limits, an error message is displayed.

            if ($form_num == $limit) {
                &survey_over();

If the form is the last one in the survey, the survey_over subroutine is called to display the information stored in the data file. It also deletes the data file.

            } else {
                $form_file = eval("\$${type}_files[$form_num]");
                $form_num++;
                $cookie = &escape($cookie);
                &pseudo_ssi ($form_file, $cookie, $type, 
                             $form_num);
            }

Again, a run-time evaluation is performed to retrieve the name of the next file in the survey. If these two run-time evals were not used, two separate blocks of code would have to be written: one to handle the television survey, and the other to handle the movie survey. It is much more efficient to do it this way.

The form number is incremented, and the cookie value is encoded. The subroutine pseudo_ssi is called to parse the form file.

        } else {
            &return_error (500, "CGI Network Survey Error",
                "You have somehow selected an invalid form!");
        }
    } else {
        &return_error (500, "CGI Network Survey Error",
                "You have selected an invalid survey type!");
    }
} else {
    &return_error (500, "Server Error",
                        "Server uses unsupported method");
}
exit(0);

If the user somehow passed invalid information to the program, error messages are returned.
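The two run-time evaluations used in the main program can be tried in isolation. The sketch below assumes the same two arrays as the program and hard-codes the values that would normally arrive in the cgi_survey and cgi_form_num fields; it deliberately omits "use strict" because the technique looks up global arrays by name.

```perl
#!/usr/bin/perl
# No "use strict" here: the technique relies on package (global) arrays
# that are looked up by name at run time.

@Television_files = ( "/tv_1.html", "/tv_2.html" );
@Movie_files      = ( "/movie_1.html", "/movie_2.html" );

$type     = "Movie";   # would come from the cgi_survey field
$form_num = 1;         # would come from the cgi_form_num field

# Builds and evaluates the string: scalar (@Movie_files)
$limit = eval ("scalar (\@${type}_files)");

# Builds and evaluates the string: $Movie_files[1]
$form_file = eval ("\$${type}_files[$form_num]");

print "limit = $limit, next file = $form_file\n";
```

Because the evaluated string is built from user-supplied data, the check that type is either "Television" or "Movie" must come before the eval, as it does in the main program.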

Now for the subroutines. The pseudo_ssi subroutine parses the CSI statements.

sub pseudo_ssi
{
    local ($file, $id, $kind, $number) = @_;
    local ($command, $argument, $parameter, $line);
    $file = $document_root . $file;
    open (FILE, "<" . $file) ||
        &return_error (500, "CGI Network Survey Error",
            "Cannot open: form [$number], file [$file].");
    flock (FILE, $exclusive_lock);

The subroutine tries to open the specified file. An error message is returned if the operation fails.

    print "Content-type: text/html", "\n\n";
    while (<FILE>) {
        while ( ($command, $argument, $parameter) = 
            (/<!--\s*#\s*(\w+)\s+(\w+)\s*=\s*"?(\w+)"?\s*-->/io) ) {

The initial loop iterates through each line in the file, and stores it in the default variable $_. The second loop uses a regular expression to check for a CSI statement within the file. Here is the format for the CSI statement:

<!--#command argument="parameter"-->

Whitespace is ignored, and the quotation marks around the parameter are optional. This is in great contrast to SSI statements, where a strict format is enforced.

            if ($command eq "insert") {
                if ($argument eq "var") {
                    if ($parameter eq "COOKIE") {
                        s//$id/;          
                    } elsif ($parameter eq "DATE_TIME") {
                        local ($time) = &get_date_time();
                        s//$time/;          
                    } elsif ($parameter eq "NUMBER") {
                        s//$number/;
                    } elsif ($parameter eq "SURVEY") {
                        s//$kind/;
                    } else {
                        s///;
                    }
                } else {
                    s///;
                }
            } else {
                s///;
            }
        }
    
        print;
    }

This block might look very confusing, but it is quite simple. This program only supports the insert command and the var argument. However, four parameters are allowed: COOKIE, DATE_TIME, NUMBER, and SURVEY.

Notice the strange substitute command. The initial string to substitute is not specified. Usually, the format of the substitute command looks like this:

s/initial/replacement/;

Perl will work on the default variable $_. However, if no initial string is specified, Perl automatically uses the last matched regular expression. This just so happens to be the CSI statement that matched earlier. This is a good trick in Perl, because it is very efficient.

The subroutine simply checks to see the parameter of the CSI, and replaces the information appropriately. The get_date_time subroutine is the same as the one used previously. If the command, argument, or parameter specified in the file does not match the ones listed, the substitute command is used to remove the CSI statement. Note the following format:

s///;

Perl replaces the last matched regular expression with a null string. It is very important to remove these unmatched CSI statements, or else the enclosing while loop will run forever. The reason for this is that the loop repeatedly checks for CSI statements.

Finally, the modified line is output. A print command without any parameters outputs the default variable $_.

    flock (FILE, $unlock);
    close (FILE);
}

Before we quit the subroutine, the file is unlocked and closed.
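To see the empty-pattern substitution at work outside the CGI environment, here is a small sketch that runs the same kind of loop over one string. The sample values for the cookie and form number, and the unsupported "include" statement, are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample values standing in for the cookie and the form number.
my ($id, $number) = ("host_820000000", 1);

# One line with two supported CSIs and one unsupported statement.
$_ = 'cookie=<!--#insert var="COOKIE"-->&n=<!-- #insert var=NUMBER -->'
   . '<!--#include file="x"-->';

my ($command, $argument, $parameter);
while ( ($command, $argument, $parameter) =
        (/<!--\s*#\s*(\w+)\s+(\w+)\s*=\s*"?(\w+)"?\s*-->/i) ) {
    # An empty search pattern reuses the last successfully matched pattern,
    # so each s// below rewrites the CSI that the loop condition just found.
    if    ($command eq "insert" && $parameter eq "COOKIE") { s//$id/;     }
    elsif ($command eq "insert" && $parameter eq "NUMBER") { s//$number/; }
    else                                                   { s///;        }
}

print "$_\n";
```

If the final else branch were left out, the unsupported statement would match on every pass and the loop would never end.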

The write_data_to_file subroutine opens the data file and incorporates the survey results into it.

sub write_data_to_file
{
    local ($key, $temp_key);
    open (FILE, ">>" . $survey_dir . $cookie) || 
                    &return_error (500, "CGI Network Survey Error",
                        "Cannot write to a data file to store your info.");
    if ($form_num == 0) {
        print FILE $STATE{'cgi_survey'}, " Survey Filled Out", "\n";
    }

The data file is opened in append mode. There is no need to lock the file, because every user has a unique filename. If the form number indicates that it is the initial form, a header is output.

    foreach $key (sort (keys %STATE)) {

Let's look at this construct from the innermost parentheses. The keys command returns an array consisting of all the keys of the associative array. The sort function then sorts that array. And foreach iterates through this array, storing each element in key.

Information in an associative array is not stored in any order, because it is based on a string index. As a result, the keys command returns the keys in no predictable order. Prefixing numerical values to the form field names allows us to sort the information returned by the keys command.

        if ($key !~ /^cgi_/) {

If the key name begins with "cgi_", it is omitted. Internally used variables are prefixed with "cgi_" to keep them separate from real form data.

            ($temp_key = $key) =~ s/^\d+\s//;

This regular expression is used to remove the numerical value from the key. The modified key is stored in temp_key. The field names in the form were in the format:

"01 Variable Name"

We use the regular expression to search for a string that starts with a numeric value followed by a space.

            print FILE $temp_key, ": ", $STATE{$key}, "\n";
        }
    }
    close (FILE);
}

The new key, along with the form value, is written to the data file. If the form contained a scrolling list that allowed the user to make multiple selections, then all of the values are stored in one string, separated by the null character, "\0". This subroutine does not perform any formatting on such a string. However, the next ordering system example shows how to split and display these values separately.

Note that the associative array is still indexed by the "old" key. The new key was defined just for output purposes. Finally, the file is closed.
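The sorting, filtering, and prefix-stripping steps can be sketched on their own as follows. The contents of %STATE here are sample data, and the lines are collected in an array instead of being written to a data file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample decoded form data, as parse_form_data might leave it in %STATE.
my %STATE = (
    'cgi_cookie'       => 'host_820000000',
    'cgi_form_num'     => 0,
    '02 EMail Address' => 'jane@example.com',
    '01 Full Name'     => 'Jane Doe',
);

my @lines;
foreach my $key (sort keys %STATE) {
    next if $key =~ /^cgi_/;                  # skip internal bookkeeping fields
    (my $temp_key = $key) =~ s/^\d+\s//;      # "01 Full Name" -> "Full Name"
    push @lines, "$temp_key: $STATE{$key}";   # the hash is still indexed by the old key
}

print "$_\n" foreach @lines;
```

Since sort compares strings by default, the numeric prefixes need leading zeros ("01", "02", and so on) for fields numbered 10 and above to land in the right place.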

The survey_over subroutine thanks the user and prints his or her responses.

sub survey_over
{
    local ($file) = $survey_dir . $cookie;
    open (FILE, "<" . $file) || 
                &return_error (500, "CGI Network Survey Error",
                                 "Cannot read the survey data file [$file].");
    print <<Thanks;
Content-type: text/html

<HTML>
<HEAD><TITLE>Thank You!</TITLE></HEAD>
<BODY>
<H1>Thank You!</H1>
Thank you again for filling out our survey. Here is the information
that you selected: 
<HR>
<P>
Thanks
    while (<FILE>) {
        print $_, "<BR>";
    }
    print "<HR>";
    print "</BODY></HTML>", "\n";
    close (FILE);
    unlink ($file);
}

The file is opened in read mode, and the information contained in it is displayed to standard output. Finally, the unlink command deletes the file.

The escape subroutine encodes the data. The code is very similar to the program presented at the beginning of this book.

sub escape
{
    local ($string) = @_;
    $string =~ s/(\W)/sprintf("%%%02x", ord($1))/eg;
    return($string);
}
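The following sketch shows a cookie making the round trip through encoding and decoding. The host name and timestamp are made up, and unescape is a hypothetical helper that performs the same decoding step as parse_form_data. The sketch uses a two-digit hex format so that characters with codes below 16 still decode correctly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Encode every non-word character as %XX (two hex digits).
sub escape {
    my ($string) = @_;
    $string =~ s/(\W)/sprintf("%%%02x", ord($1))/eg;
    return $string;
}

# Hypothetical helper: decode %XX sequences, as parse_form_data does.
sub unescape {
    my ($string) = @_;
    $string =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg;
    return $string;
}

# A cookie built the same way as in the main program, with a made-up host.
my $cookie  = join("_", "www.example.com", 820000000);
my $encoded = escape($cookie);

print "$cookie -> $encoded -> ", unescape($encoded), "\n";
```

The underscore and the digits pass through untouched because \W matches only characters that are not letters, digits, or underscores.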

Finally, the parse_form_data subroutine parses the form field name as well as the form data. That is the only difference between this version of the subroutine and the one presented in the earlier examples.

sub parse_form_data
{
    local (*FORM_DATA) = @_;
    
    local ($query_string, @key_value_pairs, $key_value, $key, $value);
    
    read (STDIN, $query_string, $ENV{'CONTENT_LENGTH'});
    if ($ENV{'QUERY_STRING'}) {
            $query_string = join("&", $query_string, $ENV{'QUERY_STRING'});
    }     
    @key_value_pairs = split (/&/, $query_string);
    foreach $key_value (@key_value_pairs) {
        ($key, $value) = split (/=/, $key_value);
        $key   =~ tr/+/ /; 
        $value =~ tr/+/ /;
            
        $key   =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;
        $value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;
        if (defined($FORM_DATA{$key})) {
            $FORM_DATA{$key} = join ("\0", $FORM_DATA{$key}, $value);
        } else {
            $FORM_DATA{$key} = $value;
        }
    }
}
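The handling of repeated field names is easier to follow with concrete data. The sketch below is a cut-down variant of the subroutine: parse_query is a hypothetical name, it takes the query string as an argument instead of reading standard input, and the "07 Genre" field is invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse a query string into a hash, decoding both names and values.
sub parse_query {
    my ($query_string) = @_;
    my %form;
    foreach my $key_value (split /&/, $query_string) {
        my ($key, $value) = split /=/, $key_value, 2;
        $value = "" unless defined $value;
        foreach ($key, $value) {
            tr/+/ /;                                          # "+" means space
            s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg;  # decode %XX
        }
        # Repeated names are joined into one string with "\0" separators.
        $form{$key} = defined $form{$key}
                    ? join("\0", $form{$key}, $value)
                    : $value;
    }
    return %form;
}

my %form = parse_query(
    "01+Full+Name=Jane+Doe&07+Genre=Comedy&07+Genre=Drama&cgi_form_num=1");

# Split a multiple-selection value back into its parts.
my @genres = split /\0/, $form{'07 Genre'};

print "Name: $form{'01 Full Name'}; genres: @genres\n";
```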

There are other ways to accomplish an ordering or "shopping cart" system like the one illustrated above. However, this is one of the best ways. The only drawback to this approach involves the temporary files that are created.

If a user decides to exit midway through the survey, the temporary file will not be deleted, because there is no way to determine when the user leaves. The only solution to this problem is to manually delete files based on modification times. See Chapter 9, Gateways, Databases, and Search/Index Utilities, for an ordering system that works by communicating with another network server, specially designed to store and distribute information.
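Deleting stale files by modification time could be automated with a short script along these lines. This is only a sketch: remove_stale_files is a hypothetical helper, and it assumes the survey data files live in a directory of their own, which is not how the program above is configured (it writes directly to /tmp).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remove regular files in $dir last modified more than $max_age_days ago.
# Returns the list of files that were deleted.
sub remove_stale_files {
    my ($dir, $max_age_days) = @_;
    my @removed;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    foreach my $name (readdir $dh) {
        my $path = "$dir/$name";
        next unless -f $path;                  # skip ".", "..", subdirectories
        next unless -M $path > $max_age_days;  # -M: days since last modification
        push @removed, $path if unlink $path;
    }
    closedir $dh;
    return @removed;
}

# Example call (the directory and the one-day limit are assumptions):
# remove_stale_files("/tmp/survey_data", 1);
```

Run periodically, for instance from cron, such a script removes the data files of users who abandoned the survey while leaving recent, possibly active, ones alone.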

CSI Statements and Hidden Fields

The hidden field technique we described earlier allows us to modify the ordering system presented earlier in two ways. The first is to replace the query information in the ACTION attribute of the <FORM> tag with hidden fields. Let's look at the starting form again:

<HTML>
<HEAD><TITLE>Television/Movie Survey</TITLE></HEAD>
<BODY>
<H1>Welcome to the CGI Network!</H1>
<HR>
In order to better serve you, we would like to know what type of
movies and variety shows you like to watch on TV. Over the last couple
of years, you, the viewers, were directly responsible for the lasting
success of many of our shows. Your comments are extremely valuable to
us, so please take a few moments to fill out a survey.
<P>
The current time is: <!--#insert var="DATE_TIME"--><BR>

If we want the current time to be displayed in the form, we need to keep this statement.

<HR>
<FORM ACTION="/cgi-bin/survey.pl?cgi_cookie=<!--#insert var="COOKIE"-->&cgi_form_num=<!--#insert var="NUMBER"-->" METHOD="POST">

This can be modified to:

<FORM ACTION="/cgi-bin/survey.pl" METHOD="POST">
<INPUT TYPE="hidden" NAME="cgi_cookie" VALUE="<!--#insert var="COOKIE"-->">
<INPUT TYPE="hidden" NAME="cgi_form_num" VALUE="<!--#insert var="NUMBER"-->">

The program described above will replace the CSI statements with appropriate information.

<PRE>
Full Name: <INPUT TYPE="text" NAME="01 Full Name" SIZE=40>
E-Mail:    <INPUT TYPE="text" NAME="02 EMail Address" SIZE=40>
</PRE>
<P>
Which survey would you like to fill out: <BR>
<INPUT TYPE="radio" NAME="cgi_survey" VALUE="Television" CHECKED>Television<BR>
<INPUT TYPE="radio" NAME="cgi_survey" VALUE="Movie">Movies<BR>
<P>
<INPUT TYPE="submit" VALUE="Submit the survey">
<INPUT TYPE="reset"  VALUE="Clear all fields">
</FORM>
<HR>
</BODY></HTML>

There is really no advantage to using this technique over the original one, as the two are nearly identical. If you use this method, you can remove the following lines from the parse_form_data subroutine:

    if ($ENV{'QUERY_STRING'}) {
            $query_string = join("&", $query_string, $ENV{'QUERY_STRING'});
    }     

There is no need to store any query information.



© 2001, O'Reilly & Associates, Inc.