4. Forms and CGI

As we discussed briefly in Chapter 4, Forms and CGI forms are generally used for two purposes: data collection and interactive communication. You can conduct surveys or polls, and present registration or online ordering information through the use of forms. They are also used to create an interactive medium between the user and the Web server. For example, a form can ask the user to select a document out of a menu, whereby the server returns the chosen document.

The main advantage of forms is that you can use them to create a front end for numerous gateways (such as databases or other information servers) that can be accessed by any client without worrying about platform dependency. On the other hand, there are some shortcomings with the current implementation:

  • The interface does not support any data types besides the general “text” type. The next HTML specification could contain other data types, such as “int,” “date,” “float,” and “url.”
  • User input cannot be checked on the client side; the user has to press the Submit button and the CGI program on the server side has to make sure the input is valid.

This chapter covers:

  • The HTML tags for writing forms
  • How form data is sent to the server
  • Examples of designing form-based CGI applications, both in Perl and other languages

4.1 HTML Tags

A form consists of two distinct parts: the HTML code and the CGI program. HTML tags create the visual representation of the form, while the CGI program decodes (or processes) the information contained within the form. Before we look at how CGI programs process form information, let's understand how a form is created. In this section, we'll cover the form tags and show examples of their use.

The FORM Tag

Here is the beginning of a simple form:

<FORM ACTION="/cgi-bin/program.pl" METHOD="POST">

The <FORM> tag starts the form. A document can consist of multiple forms, but forms cannot be nested; a form cannot be placed inside another form.

The two attributes within the <FORM> tag ( ACTION and METHOD) are very important. The ACTION attribute specifies the URL of the CGI program that will process the form information. You are not limited to using a CGI program on your server to decode form information; you can specify a URL of a remote host if a program that does what you want is available elsewhere.

The METHOD attribute specifies how the server will send the form information to the program. POST sends the data through standard input, while GET passes the information through environment variables. If no method is specified, the server defaults to GET. Both methods have their own advantages and disadvantages, which will be covered in detail later in the chapter.

In addition, another attribute, ENCTYPE, can be specified. This represents the MIME type (or encoding scheme) for the POST data, since the information is sent to the program as a data stream. Currently, only two ENCTYPES are allowed: application/x-www-form-urlencoded and multipart/form-data. If one is not specified, the browser defaults to application/x-www-form-urlencoded. Appendix D, CGI Lite, shows an example of using multipart/form-data, while this chapter is devoted to application/x-www-form-urlencoded.

Text and Password Fields

Most form elements are implemented using the <INPUT> tag. The TYPE attribute to <INPUT> determines what type of input is being requested. Several different types of elements are available: text and password fields, radio buttons, and checkboxes. The following lines are examples of simple text input.

Name: <INPUT TYPE="text" NAME="user" SIZE=40><BR>
Age: <INPUT TYPE="text" NAME="age"  SIZE=3 MAXLENGTH=3><BR>
Password: <INPUT TYPE="password" NAME="pass" SIZE=10><BR>

In this case, two text fields and one password field are created using the “text” and “password” arguments, respectively. The password field is basically the same as a text field except the characters entered will be displayed as asterisks or bullets. If you skip the TYPE attribute, a text field will be created by default.

The NAME attribute defines the name of the particular input element. It is not displayed by the browser, but is used to label the data when transferred to the CGI program. For example, the first input field has a NAME=“user” attribute. If someone types “andy” into the first input field, then part of the data sent by the browser will read:

user=andy

The CGI program can later retrieve this information (as we talked about briefly in Chapter 2, Input to the Common Gateway Interface, and will discuss in more detail later in this chapter) and parse it as needed.

The optional VALUE attribute can be used to insert an initial “default” value into the field. This string can be overwritten by the user.

Other optional attributes are SIZE and MAXLENGTH. SIZE is the physical size of the input element; the field will scroll if the input exceeds the size. The default size is 20 characters. MAXLENGTH defines the maximum number of characters that will be accepted by the browser; by default there is no limit.

In the following line, the initial text field size is expanded to 40 characters, the maximum length is specified as 40 as well (so the field will not scroll), and the initial value string is “Shishir Gundavaram.”

<INPUT TYPE="text"  NAME="user"  SIZE=40 MAXLENGTH=40 VALUE="Shishir Gundavaram"  >

Before we move on, there is still another type of text field. It is called a “hidden” field and allows you to store information in the form. The client will not display the field. For example:

<INPUT TYPE="hidden" NAME="publisher" VALUE="ORA">

Hidden fields are most useful for transferring information from one CGI application to another. See Chapter 8, Multiple Form Interaction, for an example of using hidden fields.

Submit and Reset Buttons

Two more important “types” of the <INPUT> tag are Submit and Reset.

<INPUT TYPE="submit" VALUE="Submit the form">
<INPUT TYPE="reset"  VALUE="Clear all fields">

Nearly all forms offer Submit and Reset buttons. The Submit button sends all of the form information to the CGI program specified by the ACTION attribute. Without this button, the form will be useless since it will never reach the CGI program.

Browsers supply a default label on Submit and Reset buttons (generally, the unimaginative labels “Submit” and “Reset,” of course). However, you can override the default labels using the VALUE attribute.

You can have multiple Submit buttons:

<INPUT TYPE="submit" NAME="option" VALUE="Option 1">
<INPUT TYPE="submit" NAME="option" VALUE="Option 2">

If the user clicked on “Option 1”, the CGI program would get the following data:

option=Option 1

You can also have images as buttons:

<INPUT TYPE="image" SRC="/icons/button.gif" NAME="install"
            VALUE="Install Program">

When you click on an image button, the browser will send the coordinates of the click:

install.x=250&install.y=20

Note that each field information is delimited by the “&” character. We will discuss this in detail later in the chapter. On the other hand, if you are using a text browser, and you select this button, the browser will send the following data:

install=Install Program

The Reset button clears all the information entered by the user. Users can press Reset if they want to erase all their entries and start all over again.

Figure 4.1 shows how the form will look in Netscape Navigator.

Figure 4.1: Form with text input fields

images

Radio Buttons and Checkboxes

Radio buttons and checkboxes are typically used to present the user with several options.

A checkbox creates square buttons (or boxes) that can be toggled on or off. In the example below, it is used to create four square checkboxes.

<FORM ACTION="/cgi-bin/program.pl" METHOD="POST">
Which movies do you want to order: <BR>
Amadeus <INPUT TYPE="checkbox" NAME="amadeus">
The Last Emperor <INPUT TYPE="checkbox" NAME="emperor">
Gandhi <INPUT TYPE="checkbox" NAME="gandhi">
Schindler's List <INPUT TYPE="checkbox" NAME="schindler">
<BR>

If a user toggles a checkbox “on” and then submits the form, the browser uses the value “on” for that variable name. For example, if someone clicks on the “Gandhi” box in the above example, the browser will send:

gandhi=on

You can override the value “on” using the VALUE attribute:

Gandhi <INPUT TYPE="checkbox" NAME="gandhi" VALUE="yes">

Now when the “Gandhi” checkbox is checked, the browser will send:

gandhi=yes

One checkbox is not related to another. Any number of them can be checked at the same time. A radio button differs from a checkbox in that only one radio button can be enabled at a time. For example:

How do you want to pay for this product: <BR>
Master Card: <INPUT TYPE="radio" NAME="payment" VALUE="MC" CHECKED><BR>
Visa: <INPUT TYPE="radio" NAME="payment" VALUE="Visa"><BR>
American Express: <INPUT TYPE="radio" NAME="payment" VALUE="AMEX"><BR>
Discover: <INPUT TYPE="radio" NAME="payment" VALUE="Discover"><BR>
</FORM>

Here are a few guidelines for making a radio button work properly:

  • All options must have the same NAME (in this example, “payment”). This is how the browser knows that they should be grouped together, and can therefore ensure that only one radio button using the same NAME can be selected at a time.
  • Whereas with checkboxes supplying a different VALUE is only a matter of taste, with radio buttons different VALUEs are crucial to getting meaningful results. Without a specified VALUE, no matter which item is checked, the browser will assign the string “on” to the “payment” NAME variable. The CGI program therefore has no way to know which item was actually checked. So each item in a radio button needs to be assigned a different VALUE to make sure that the CGI program knows which one was selected.

For both radio buttons and checkboxes, the CHECKED attribute determines whether the item should be enabled by default. In the radio button example, the “Master Card” option is given a CHECKED value, effectively making it the default value.

Figure 4.2 shows how this example will be rendered by the browser.

Figure 4.2: Form with radio buttons and checkboxes

images

Menus and Scrolled Lists

Menus and scrolled lists are generally used to present a large number of options or choices to the user. The following is an example of a menu:

<FORM ACTION="/cgi-bin/program.pl" METHOD="POST">
Choose a method of payment:
<SELECT NAME="card" SIZE=1>
<OPTION SELECTED>Master Card
<OPTION>Visa
<OPTION>American Express
<OPTION>Discover
</SELECT>

Option menus and scrolled lists are created using the SELECT tag, which has an opening and a closing tag. The SIZE attribute determines if a menu or a list is displayed. A value of 1 produces a menu, and a value greater than 2 produces a scrolled list, in which case the number represents the number of items that will be visible at one time.

A selection in a menu or scrolled list is added using the OPTION tag. The SELECTED attribute to OPTION allows you to set a default selection.

Now for an example of a scrolled list (a list with a scrollbar):

<SELECT NAME="books" SIZE=3 MULTIPLE>
<OPTION SELECTED>TCP/IP Network Administration
<OPTION>Linux Network Administrators Guide
<OPTION>DNS and BIND
<OPTION>Computer Security Basics
<OPTION>System Performance Tuning
</SELECT>
</FORM>

The example above creates a scrolled list with three visible items and the ability to select multiple options. (The MULTIPLE attribute specifies that more than one item can be selected.)

Figure 4.3 shows what the menus and scrolled list look like.

Figure 4.3: Form with menus and scrolled lists

images

Multiline Text Fields

You must have seen numerous guestbooks on the Web that ask for your comments or opinions, where you can enter a lot of information. This is accomplished by using a multiline text field. Here is an example:

<FORM ACTION="/cgi-bin/program.pl" METHOD="POST">
<TEXTAREA ROWS=10 COLS=40 NAME="comments">
</TEXTAREA>

This creates a scrolled text field with 10 rows and 40 columns. (10 rows and 40 columns designates only the visible text area; the text area will scroll if the user types further).

Notice that you need both the beginning <TEXTAREA> and the ending </TEXTAREA> tags. You can enter default information between these tags.

<TEXTAREA ROWS=10 COLS=40 NAME="comments_2">
This is some default information.
Some more...
And some more...
</TEXTAREA>
</FORM>

You have to remember that newlines (or carriage returns) are not ignored in this field--unlike HTML. In the preceding example, the three separate lines will be displayed just as you typed them.

The multiline examples will be rendered by the browser as shown in Figure 4.4.

Figure 4.4: Form with multiline text input

images

Quick Reference to Form Tags

Before we get going, here's a short list of all the available form tags:

Table 4.1: Form Tags

Form Tag Description
<FORM ACTION=“/cgi-bin/prog.pl” METHOD=“POST”> Start the form
<INPUT TYPE=“text” NAME=“name” VALUE=“value” SIZE=“size”> Text field
<INPUT TYPE=“password” NAME=“value” VALUE=“value” SIZE=“size”> Password field
<INPUT TYPE=“hidden” NAME=“name” VALUE=“value”> Hidden field
<INPUT TYPE=“checkbox” NAME=“name” VALUE=“value”> Checkbox
<INPUT TYPE=“radio” NAME=“name” VALUE=“value”> Radio button
<SELECT NAME=“name” SIZE=1> <OPTION SELECTED>One <OPTION>Two : </SELECT> Menu
<SELECT NAME=“name” SIZE=n MULTIPLE> Scrolled list
<TEXTAREA ROWS=yy COLS=xx NAME=“name”> . . </TEXTAREA> Multiline text fields
<INPUT TYPE=“submit” VALUE=“Message!”> <INPUT TYPE=“submit” NAME=“name” VALUE=“value”> <INPUT TYPE=“image” SRC=“/image” NAME=“name” VALUE=“value”> Submit buttons
<INPUT TYPE=“reset” VALUE=“Message!”> Reset button
</FORM> Ends form

4.2 Sending Data to the Server

Earlier in this chapter we mentioned the application/x-www-form-urlencoded MIME type. The browser uses this MIME type to encode the form data.

First, each form element's name--specified by the NAME attribute--is equated with the value entered by the user to create a key-value pair. For example, if the user entered “30” when asked for the age, the key-value pair would be (age=30). Each key-value pair is separated by the “&” character.

Second, since the variable names for the form element and the actual form data are standard text, it is possible this text could consist of characters that will confuse browsers. To prevent possible errors, the encoding scheme translates all “special” characters to their corresponding hexadecimal codes. These “special” characters include control characters and certain alphanumeric symbols. For example, the string “Thanks for the help!” would be converted to “Thanks%20for%20the%20help%21”. This process is repeated for each key-value pair to create a query string.[1]

For text and password fields, the user input will represent the value. If no information was entered, the key-value pair will be sent anyway, with the value left blank (i.e., “name=“).

For radio buttons and checkboxes, the VALUE attribute represents the value when the button element is checked. If no VALUE is specified, the value defaults to “on.” An unchecked checkbox will not be sent as a key-value pair; it will be ignored.

The CGI program then has to “decode” this information in order to access the form data. The encoding scheme is the same for both GET and POST.

GET vs. POST

There are two methods for sending form data: GET and POST. The main difference between these methods is the way in which the form data is passed to the CGI program. If the GET method is used, the query string is simply appended to the URL of the program when the client issues the request to the server. This query string can then be accessed by using the environment variable QUERY_STRING. Here is a sample GET request by the client, which corresponds to the first form example:

GET /cgi-bin/program.pl?user=Larry%20Bird&age=35&pass=testing HTTP/1.0
Accept: www/source
Accept: text/html
Accept: text/plain
User-Agent: Lynx/2.4 libwww/2.14

As we discussed in Chapter 2, the query string is appended to the URL after the “?” character.[2] The server then takes this string and assigns it to the environment variable QUERY_STRING.

The GET method has both advantages and disadvantages. The main advantage is that you can access the CGI program with a query without using a form. In other words, you can create “canned queries.” Basically, you are passing parameters to the program. For example, if you want to send the previous query to the program directly, you can do this:

<A HREF="/cgi-bin/program.pl?user=Larry%20Bird&age=35&pass=testing">CGI Program</A>

Here is a simple program that will aid you in encoding data:

#!/usr/local/bin/perl
print "Please enter a string to encode: ";
$string = <STDIN>;
chop ($string);
$string =~ s/(\W)/sprintf("%%%x", ord($1))/eg;
print "The encoded string is: ", "\n";
print $string, "\n";
exit(0);

This is not a CGI program; it is meant to be run from the shell. When you run the program, the program will prompt you for a string to encode. The <STDIN> operator reads one line from standard input. It is similar to the <FILEHANDLE> construct we have been using. The chop command removes the trailing newline character (“\n”) from the input string. Finally, the user-specified string is converted to a hexadecimal value with the sprintf command, and printed out to standard output.

A query is one method of passing information to a CGI program via the URL. The other method involves sending extra path information to the program. Here is an example:

<A HREF="/cgi-bin/program.pl/user=Larry%20Bird/age=35/pass=testing>CGI Program</A>

The string “/user=Larry%20Bird/age=35/pass=testing” will be placed in the environment variable PATH_INFO when the request gets to the CGI program. This method of passing information to the CGI program is generally used to provide file information, rather than form data. The NCSA imagemap program works in this manner by passing the filename of the selected image as extra path information.

If you use the “question-mark” method or the pathname method to pass data to the program, you have to be careful, as the browser or the server may truncate data that exceeds an arbitrary number of characters.

Now, here is a sample POST request:

POST /cgi-bin/program.pl HTTP/1.0
Accept: www/source
Accept: text/html
Accept: text/plain
User-Agent: Lynx/2.4 libwww/2.14
Content-type: application/x-www-form-urlencoded
Content-length: 35
user=Larry%20Bird&age=35&pass=testing

The main advantage to the POST method is that query length can be unlimited-- you don't have to worry about the client or server truncating data. To get data sent by the POST method, the CGI program reads from standard input. However, you cannot create “canned queries.”

Understanding the Decoding Process

In order to access the information contained within the form, a decoding protocol must be applied to the data. First, the program must determine how the data was passed by the client. This can be done by examining the value in the environment variable REQUEST_METHOD. If the value indicates a GET request, either the query string or the extra path information must be obtained from the environment variables. On the other hand, if it is a POST request, the number of bytes specified by the CONTENT_LENGTH environment variable must be read from standard input. The algorithm for decoding form data follows:

  1. Determine request protocol (either GET or POST) by checking the REQUEST_METHOD environment variable.
  2. If the protocol is GET, read the query string from QUERY_STRING and/or the extra path information from PATH_INFO.
  3. If the protocol is POST, determine the size of the request using CONTENT_LENGTH and read that amount of data from the standard input.
  4. Split the query string on the “&” character, which separates key-value pairs (the format is key=value&key=value...).
  5. Decode the hexadecimal and “+” characters in each key-value pair.
  6. Create a key-value table with the key as the index. (If this sounds complicated, don't worry, just use a high-level language like Perl. The language makes it pretty easy.)

You might wonder why a program needs to check the request protocol, when you know exactly what type of request the form is sending. The reason is that by designing the program in this manner, you can use one module that takes care of both types of requests. It can also be beneficial in another way.

Say you have a form that sends a POST request, and a program that decodes both GET and POST requests. Suppose you know that there are three fields: user, age, and pass. You can fill out the form, and the client will send the information as a POST request. However, you can also send the information as a query string because the program can handle both types of requests; this means that you can save the step of filling out the form. You can even save the complete request as a hotlist item, or as a link on another page.

4.3 Designing Applications Using Forms in Perl

Here is a simple form that prompts for a name:

<HTML>
<HEAD><TITLE>Testing a Form</TITLE></HEAD>
<BODY>
<H1>Testing a Form</H1>
<HR>
<FORM ACTION="/cgi-bin/greeting.pl" METHOD="POST">
Enter your full name: <INPUT TYPE="text" NAME="user" SIZE=60><BR>
<P>
<INPUT TYPE="submit" VALUE="Submit the form">
<INPUT TYPE="reset"  VALUE="Clear all fields">
</FORM>
<HR>
</BODY>
</HTML>

The form consists of an input field and the Submit and Reset buttons.

Now, here is the Perl program to decode the information and print a greeting:

#!/usr/local/bin/perl
$webmaster = "shishir\@bu.edu";
&parse_form_data (*simple_form);

The subroutine parse_form_data decodes the form information. Here, the main program passes the subroutine a reference to a variable named simple_form. The subroutine treats it as an associative array (a common data type in Perl) and fills it with key-value pairs sent by the browser. We will see how parse_form_data works later; the important thing right now is that we can easily get the name of the user entered into the form.

You may find it confusing, trying to track what happens to the information entered by the user. The user fills out the forms, and the browser encodes the information into a string of key-value pairs. If the request method is POST, the server passes the information as standard input to the CGI program. If the request method is GET, the server stores the information in an environment variable, QUERY_STRING. In either case, parse_form_data retrieves the data, breaks it into key-value pairs, and stores it into an associative array. The main program can then extract any information that you want.

print "Content-type: text/plain", "\n\n";
$user = $simple_form{'user'};
if ($user) {
    print "Nice to meet you ", $simple_form{'user'}, ".", "\n";
    print "Please visit this Web server again!", "\n";
} else {
    print "You did not enter a name. Are you shy?", "\n";
    print "But, you are welcome to visit this Web server again!", "\n";
}
exit(0);

The main program now extracts the user name from the array that parse_form_data filled in. If you go back and look at the form, you'll find it contained an <INPUT> tag with a NAME attribute of “user.” The value “user” becomes the key in the array. That is why this program checks for the key “user” and extracts the value, storing it in a variable that also happens to be named “user.”

The conditional checks to see if the user entered any information. One of two possible greetings is printed out. It is always very important to check the form values to make sure there is no erroneous information. For example, if the user entered “John Doe” the output would be:

Nice to meet you John Doe.
Please visit this Web server again!

On the other hand, if the user did not enter any data into the input field, the response would be:

You did not enter a name. Are you shy?
But, you are welcome to visit this Web server again!

Now, let's look at the core of this program: the subroutine that does all of the work.

sub parse_form_data
{
    local (*FORM_DATA) = @_;
    local ( $request_method, $query_string, @key_value_pairs,
                  $key_value, $key, $value);

The local variable FORM_DATA is a reference (or, in Perl terms, a glob) to the argument passed to the subroutine. In our case, FORM_DATA is a reference to the simple_form associate array. Why did we pass a reference with an asterisk (*simple_form) instead of just naming the array (simple_form)? The reasoning will be a little hard to follow if you are not familiar with programming, but I will try to explain. If I passed simple_form without the asterisk, the subroutine would not be able to pass information back to the main program in that array (it could return it in another array, but that is a different matter). This would be pretty silly, since the array is empty to start with and the only purpose of the subroutine is to fill it.

As you can see, the first thing I do is create another reference to the array, FORM_DATA. This means that FORM_DATA and simple_form share the same memory, and any data I put in FORM_DATA can be extracted by the main program from simple_form. You will see that the subroutine does all further operations on FORM_DATA; this is the same as doing them on simple_form.

Now let's continue with the rest of this subroutine.

$request_method = $ENV{'REQUEST_METHOD'};
    if ($request_method eq "GET") {
        $query_string = $ENV{'QUERY_STRING'};
    } elsif ($request_method eq "POST") {
        read (STDIN, $query_string, $ENV{'CONTENT_LENGTH'});
    } else {
        &return_error (500, "Server Error",
                            "Server uses unsupported method");
    }

The request method is obtained. If it is a GET request, the query string is obtained from the environment variable and stored in query_string. However, if it is a POST request, the amount of data sent by the client is read from STDIN with the read command and stored in query_string. If the request protocol is not one of the two discussed earlier, an error is returned. Notice the return_error subroutine, which is used to return an error to the browser. The three parameters represent the status code, the status keyword, and the error message, respectively.

@key_value_pairs = split (/&/, $query_string);
    foreach $key_value (@key_value_pairs) {
        ($key, $value) = split (/=/, $key_value);
        $value =~ tr/+/ /;
        $value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;

Since the client puts ampersands between key-value pairs, the split command specifies an ampersand as the delimiter. The result is to fill the array key_value_pairs with entries, where each key-value pair is stored in a separate array element. In the loop, each key-value pair is again split into a separate key and value, where an equal sign is the delimiter. The tr (for translate) operator replaces each “+” with the space character. The regular expression within the (for substitute) operator looks for an expression that starts with the “%” sign and is followed by two characters. These characters represent the hexadecimal value. The parentheses in the regexp instruct Perl to store these characters in a variable ($1). The pack and hex commands convert the value stored in $1 to an ASCII equivalent. Finally, the “e” option evaluates the second part of the substitute command--the replacement string--as an expression, and the “g” option replaces all occurrences of the hexadecimal string. If you had remained unconvinced up to now of Perl's power as a language for CGI, this display of text processing (similar to what thousands of CGI programmers do every day) should change your mind.

if (defined($FORM_DATA{$key})) {
            $FORM_DATA{$key} = join ("\0", $FORM_DATA{$key}, $value);
        } else {
                    $FORM_DATA{$key} = $value;
        }
    }
}

When multiple values are selected in a scrolled list and submitted, each value will contain the same variable name. For example, if you choose “One” and “Two” in a scrolled list with the variable name “Numbers,” the query string would look like:

Numbers=One&Numbers=Two

The conditional statement above is used in cases like these. If a variable name exists--indicating a scrolled list with multiple options--each value is concatenated with the “\0” separator. Now, here is the return_error subroutine:

sub return_error
{
    local ($status, $keyword, $message) = @_;
    print "Content-type: text/html", "\n";
    print "Status: ", $status, " ", $keyword, "\n\n";
    print <<End_of_Error;
<HTML>
<HEAD>
    <TITLE>CGI Program - Unexpected Error</TITLE>
</HEAD>
<BODY>
<H1>$keyword</H1>
<HR>$message<HR>
Please contact $webmaster for more information.
</BODY>
</HTML>
End_of_Error
    exit(1);
}

This subroutine can be used to return an error status. Since the program handles both GET and POST queries, you can send a query to it directly:

<A HREF="/cgi-bin/program.pl?user=John+Doe">Hello</A>

The program will display the same output as before.

Combining Graphics and Queries

It's simple to return graphical output when you process a form--in fact you can “bundle” the whole program up in an image, using the HTML tag IMG. Let's see how to do this. First, we'll start with a form that's just a little more complicated than the previous form:

<HTML>
<HEAD><TITLE>Color Text</TITLE></HEAD>
<BODY>
<H1>Color Text</H1>
<HR>
<FORM ACTION="/cgi-bin/gd_text.pl" METHOD="POST">
This form makes it possible to display color text and messages.<BR>
What message would you like to display: <BR>
<INPUT TYPE="text" NAME="message" SIZE=60><BR>
What is your favorite color:
<SELECT NAME="color" SIZE=1>
<OPTION SELECTED>Red
<OPTION>Blue
<OPTION>Green
<OPTION>Yellow
<OPTION>Orange
<OPTION>Purple
<OPTION>Brown
<OPTION>Black
</SELECT>
<P>
<INPUT TYPE="submit" VALUE="Submit the form">
<INPUT TYPE="reset"  VALUE="Clear all fields">
</FORM>
<HR>
</BODY>
    </HTML>

This displays a form with one text field and a menu, along with the customary Submit and Reset buttons. The form and the program allow you to display color text in the browser's window. For example, if you want a red headline in your document, you can fill out the form or access the program directly:

<IMG SRC="/cgi-bin/gd_text.pl?message=Welcome+to+this+Web+server&color=Red>

This will place the GIF image with the message “Welcome to this Web server” in red into your HTML document. Now, here's the program:

#!/usr/local/bin/perl5
use GD;
$| = 1;
$webmaster = "shishir\@bu\.edu";
print "Content-type: image/gif", "\n\n";
&parse_form_data (*color_text);
$message = $color_text{'message'};
$color = $color_text{'color'};
if (!$message) {
    $message = "This is an example of " . $color . " text";
}

The form data is parsed and placed in the color_text associative array. The selected text and color are stored in $message, and $color, respectively. If the user did not enter any text, a default message is chosen.

This program uses the gd graphics library, which we discuss more fully in Chapter 6, Hypermedia Documents.

$font_length = 8;
$font_height = 16;
$length = length ($message);
$x = $length * $font_length;
$y = $font_height;
$image = new GD::Image ($x, $y);

The length of the user-specified string is determined. A new image is created based on this length.

$white = $image->colorAllocate (255, 255, 255);
if ($color eq "Red") {
    @color_index = (255, 0, 0);
} elsif ($color eq "Blue") {
    @color_index = (0, 0, 255);
} elsif ($color eq "Green") {
    @color_index = (0, 255, 0);
} elsif ($color eq "Yellow") {
    @color_index = (255, 255, 0);
} elsif ($color eq "Orange") {
    @color_index = (255, 165, 0);
} elsif ($color eq "Purple") {
    @color_index = (160, 32, 240);
} elsif ($color eq "Brown") {
    @color_index = (165, 42, 42);
} elsif ($color eq "Black") {
    @color_index = (0, 0, 0);
}
$selected_color = $image->colorAllocate (@color_index);
$image->transparent ($white);

Red, Green, and Blue (RGB) values for the user-selected color are stored in the color_index array. If no color is selected manually, the default is Red, as specified in the form. If you want to add more colors, look in /usr/local/X11/lib/rgb.txt for a list of the common colors. The transparent function makes the image background transparent.

$image->string (gdLargeFont, 0, 0, $message, $selected_color);
print $image->gif;
exit(0);

The text is displayed using the string operator, and the image is printed to standard output. As discussed in the previous example, you can also access this program with a GET request.

4.4 Decoding Forms in Other Languages

Since Perl contains powerful pattern-matching operators and string manipulation functions, it is very simple to decode form information. Unfortunately, this process is not as easy when dealing with other high-level languages, as most of them lack these kinds of operators. However, there are various libraries of functions on the Internet that make the decoding process easier, as well as the uncgi program (http://www.hyperion.com/~koreth/uncgi.html).

C Shell (csh)

It is difficult to decode form information using native C shell commands. csh was not designed to perform this type of string manipulation. As a result, you have to use external programs to achieve the task. The easiest and most versatile package available for handling form queries is uncgi, which decodes the form information and stores them in environment variables that can be accessed not only by csh, but also by any other language, such as Perl, Tcl, and C/C++. For example, if the form contains two text fields, named “user” and “age,” uncgi will place the form data in the variables WWW_user and WWW_age, respectively. Here is a simple form and a csh CGI script to handle the information:

<HTML>
<HEAD><TITLE>Simple C Shell and uncgi Example</TITLE></HEAD>
<BODY>
<H1>Simple C Shell and uncgi Example</H1>
<HR>
<FORM ACTION="/cgi-bin/uncgi/simple.csh" METHOD="POST">
Enter name: <INPUT TYPE="text" NAME="name" SIZE=40><BR>
Age: <INPUT TYPE="text" NAME="age" SIZE=3 MAXLENGTH=3><BR>
What do you like:<BR>
<SELECT NAME="drink" MULTIPLE>
<OPTION>Coffee
<OPTION>Tea
<OPTION>Soft Drink
<OPTION>Alcohol
<OPTION>Milk
<OPTION>Water
</SELECT>
<P>
<INPUT TYPE="submit" VALUE="Submit the form">
<INPUT TYPE="reset"  VALUE="Clear all fields">
</FORM>
<HR>
</BODY>
</HTML>

Notice the URL associated with the ACTION attribute! It points to the uncgi executable, with extra path information (your program name). The server executes uncgi, which then invokes your program based on the path information. Remember, your program does not necessarily have to be a csh script; it can be a program written in any language. Now, let's look at the program.

#!/usr/local/bin/csh
echo "Content-type: text/plain"
echo ""

The usual header information is printed out.

if ($?WWW_name) then
    echo "Hi $WWW_name -- Nice to meet you."
else
    echo "Don't want to tell me your name, huh?"
    echo "I know you are calling in from $REMOTE_HOST."
    echo ""
endif

uncgi takes the information in the “name” text entry field and places it in the environment variable WWW_name.

In csh, environment variables are accessed by prefixing a “$” to the name (e.g., $REMOTE_HOST). When checking for the existence of variables, however, you must use the C shell's $? construct. I use $? in the conditional to check for the existence of WWW_Name. You cannot check for the existence of data directly:

if ($WWW_name) then
    ....
else
    ....
endif

If the user did not enter any data into the “name” text entry field, uncgi will not set a corresponding environment variable. If you then try to check for data using the method shown above, the C shell will give you an error indicating the variable does not exist.

The same procedure is applied to the “age” text entry field.

if ($?WWW_age) then
    echo "You are $WWW_age years old."
else
    echo "Are you shy about your age?"
endif
echo ""
if ($?WWW_drink) then
    echo "You like: $WWW_drink" | tr '#' ''
else
    echo "I guess you don't like any fluids."
endif
exit(0)

Here is another important point to remember. Since the form contains a scrolled list with the multiple selection property, uncgi will place all the selected values in the variable, separated by the “ #” symbol. The UNIX command tr converts the “#” character to the space character within the variable for viewing purposes.

C/C++

There are a few form decoding function libraries for C and C++. These include the previously mentioned uncgi library, and Enterprise Integration Technologies Corporation's (EIT) libcgi. Both of them are simple to use.

C/C++ decoding using uncgi

Let's look at an example using uncgi (assuming the HTML form is the same as the one used in the previous example):

#include <stdio.h>
#include <stdlib.h>

These two libraries--standard I/O and standard library--are used in the following program. The getenv function, used to access environment variables, is declared in stdlib.h.

void main (void)
{
    char *name,
         *age,
         *drink,
         *remote_host;
    printf ("Content-type: text/plain\n\n");

    uncgi();

Four variables are declared to store environment variable data. The uncgi function retrieves the form information and stores it in environment variables. For example, a form variable called name, would be stored in the environment variable WWW_name.

name = getenv ("WWW_name");
    age = getenv ("WWW_age");
    drink = getenv ("WWW_drink");
    remote_host = getenv ("REMOTE_HOST");

The getenv standard library function reads the environment variables, and returns a string containing the appropriate information.

if (name == NULL) {
        printf ("Don't want to tell me your name, huh?\n");
        printf ("I know you are calling in from %s.\n\n", remote_host);
    } else {
        printf ("Hi %s -- Nice to meet you.\n", name);
    }

    if (age == NULL) {
        printf ("Are you shy about your age?\n");
    } else {
        printf ("You are %s years old.\n", age);
    }

    printf ("\n");

Depending on the user information in the form, various informational messages are output.

if (drink == NULL) {
        printf ("I guess you don't like any fluids.\n");
    } else {
        printf ("You like: ");

        while (*drink != '\0') {
            if (*drink == '#') {
                printf (" ");
            } else {
                printf ("%c", *drink);
            }
            ++drink;
        }

        printf ("\n");
    }

    exit(0);
}

The program checks each character in order to convert the “#” symbols to spaces. If the character is a “#” symbol, a space is output. Otherwise, the character itself is displayed. This process takes up eight lines of code, and is difficult to implement when compared to Perl. In Perl, it can be done simply like this:

$drink =~ s/#/ /g;

This example points out one of the major deficiencies of C for CGI program design: pattern matching.

C/C++ decoding using libcgi

Now, let's look at another example in C. But this time, we will use EIT's libcgi library, which you can get from http://wsk.eit.com/wsk/dist/doc/libcgi/libcgi.html.

#include <stdio.h>
#include "cgi.h"

The header file cgi.h contains the prototypes for the functions in the library. Simply put, the file--like all the other header files--contains a list of all the functions and their arguments.

cgi_main (cgi_info *cgi)
{
    char *name,
         *age,
         *drink,
         *remote_host;

Notice that there is no main function in this program. The libcgi library actually contains the main function, which fills a struct called cgi_info with environment variables and data retrieved from the form. It passes this struct to your cgi_main function. In the function I've written here, the variable cgi refers to that struct:

form_entry *form_data;

The variable type form_entry is a linked list that is meant to hold key/value pairs, and is defined in the library. In this program, form_data is declared to be of type form_entry.

print_mimeheader ("text/plain");

The print_mimeheader function is used to output a specific MIME header. Technically, this function is not any different from doing the following:

print "Content-type: text/plain\n\n";

However, the function does simplify things a bit, in that the programmer does not have to worry about accidentally forgetting to output the two newline characters after the MIME header.

form_data = get_form_entries (cgi);
    name = parmval (form_data, "name");
    age = parmval (form_data, "ge");
    drink = parmval (form_data, "drink");

The get_form_entries function parses the cgi struct for form information, and places it in the variable form_data. The function takes care of decoding the hexadecimal characters in the input. The parmval function retrieves the value corresponding to each form variable (key).

if (name == NULL) {
        printf ("Don't want to tell me your name, huh?\n");
        printf ("I know you are calling in from %s.\n\n", cgi->remote_host);
    } else {
        printf ("Hi %s -- Nice to meet you.\n", name);
    }

Notice how the REMOTE_HOST environment variable is accessed. The libcgi library places all the environment variable information into the cgi struct.

Of course, you can still use the getenv function to retrieve environment information.

if (age == NULL) {
        printf ("Are you shy about your age?\n");
    } else {
        printf ("You are %s years old.\n", age);
    }

    printf ("\n");

    if (drink == NULL) {
        printf ("I guess you don't like any fluids.\n");
    } else {
        printf ("You like: %s", drink);
        printf ("\n");
    }

    free_form_entries (form_data);
    exit(0);
}

Unfortunately, this library does not handle multiple keys properly. For example, if the form has multiple checkboxes with the same variable name, libcgi will return just one value for a specific key.

Once the form processing is complete, you should call the free_form_entries function to remove the linked list from memory.

In addition to the functions discussed, libcgi offers numerous other ones to aid in form processing. One of the functions that you might find useful is the mcode function. Here is an example illustrating this function:

switch (mcode (cgi)) {
    case MCODE_GET:
        printf("Request Method: GET\n");
        break;
    case MCODE_POST:
        printf("Request Method: POST\n");
        break;
    default:
        printf("Unrecognized method: %s\n", cgi->request_method);
}

The mcode function reads the REQUEST_METHOD information from the cgi struct and returns a code identifying the type of request.

Tcl

Unlike C/C++, Tcl does contain semi-efficient pattern matching functions. These functions can be used to decode form information. However, according to benchmark test results posted in comp.lang.perl, the regular expression functions as implemented in Tcl are quite inefficient, especially when compared to Perl. But you are not limited to writing form decoding routines in Tcl, beca

[1] Before the forms interface, the only way you could retrieve user information was through a search field (i.e., <ISINDEX>), which passed the data to the server with spaces converted to plus signs ( “+”).

[2] The information in the password field is not encrypted in any way; it is plain text. You have to be very careful when asking for sensitive data using the password field. If you want security, please use server authentication.

Get CGI Programming on the World Wide Web now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.