Keeping Programs Trim with CGI_Lite
This month, we will look at CGI_Lite, a Perl 5 module written by Shishir Gundavaram. CGI_Lite is one of several modules available for CGI programmers; the best known of the bunch is CGI.pm, written by Lincoln Stein. Indeed, I have used CGI.pm in nearly every “At the Forge” column, as well as in many programs over the last few years, on many web sites.
While CGI.pm is useful and rich in features, it is also large, weighing in at a hefty 153KB. On my Red Hat 4.2 system with 40MB of RAM, starting Perl 5 and loading CGI.pm uses about 2.7 percent of the physical memory—over 1MB—before I have even allocated any data structures. On a popular Web server, it is easy to imagine how many CGI programs running simultaneously would lead to a heavy load, both on the CPU and on the server's memory, leading to a significant slow-down.
There are a number of solutions to this problem, including using a language other than Perl for CGI programs. This month, though, we will look at another solution: CGI_Lite.pm, a module that does less than CGI.pm but is much smaller and faster. CGI_Lite.pm takes a mere 17KB on disk, and when loaded into memory along with Perl 5, takes only 2.0 percent of the physical memory on my system, about 800KB. This is still a relatively large amount of memory, but given that invoking Perl 5 uses about 560KB, it strikes me as a reasonable trade-off.
CGI_Lite.pm is not a panacea; it leaves out a number of useful features that have made their way into CGI.pm over the years. However, if your CGI programs require only a limited set of features and you would like to keep your programs as trim as possible, you might want to consider using CGI_Lite in at least some of your programs.
Before you can use CGI_Lite, you need to get a copy from CPAN (the Comprehensive Perl Archive Network), a set of FTP and web servers that make Perl code, documentation and utilities available to the public for free. As of this writing, the latest version of CGI_Lite is 1.8, meaning that you can retrieve it from the URL https://www.perl.com/CPAN/modules/by-module/CGI/CGI_Lite-1.8.tar.gz.
If CGI_Lite has been updated by the time you read this, you might need to change the numbers in the last part of the URL. Once you have retrieved the module, you can unpack it with the command:
tar -zxvf CGI_Lite-1.8.tar.gz
which uncompresses (-z) and extracts (-x) the named file (-f) verbosely (-v). This creates the directory CGI_Lite-1.8 on my system. Change into that directory, then perform the standard Perl module installation as follows:
perl Makefile.PL
make
make install
Note that you may have to be logged in as root in order to install CGI_Lite on your system.
Once the module is installed, you can use it in any program by including the line:
use CGI_Lite;
at the top of your program.
Of course, including a module is the easy part—learning how to use it can be a bit more complicated. Let's see how to use CGI_Lite.pm by creating a simple program, one which expects to receive a user's first name from an HTML form. When the form is submitted, the program prints a short personalized greeting to the user. If you are wondering why we are starting with an HTML form and the POST method, rather than the simpler GET method, stay tuned—it is harder than you might think.
Listing 1. HTML Form with Single Text Field
Listing 1 is a simple HTML form, containing a single text field called firstname, that we can use for our test. When a user clicks on the submit button in this form, the firstname text field is sent via the POST method to the program called /cgi-bin/hello.pl. Listing 2 shows one way in which we might write hello.pl using CGI_Lite.pm.
Listing 2. Initial Version of Perl Program hello.pl
In short, this program executes the following actions:
Imports the CGI_Lite module.
Creates an instance of CGI_Lite.
Retrieves the HTML form elements into a hash, also known as an “associative array”.
Uses the value of the firstname form element to return a string to the user.
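The four steps above can be sketched roughly as follows. This is not the column's original Listing 2, only a minimal illustration. The helpers escape_html and greeting are hypothetical names introduced here, and the sketch loads the module with require (rather than use) so that the greeting logic can be tried from the command line even where CGI_Lite is not installed.

```perl
#!/usr/bin/perl
# Sketch of the four steps above; an illustration, not the original listing.
use strict;
use warnings;

# Hypothetical helper: escape characters that are special in HTML,
# so a user-supplied name cannot inject markup into the page.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: build the greeting from the form hash (step 4).
sub greeting {
    my ($form) = @_;
    my $name = $form->{firstname};
    $name = "stranger" unless defined $name && length $name;
    return "<P>Hello, " . escape_html($name) . "!</P>\n";
}

# Run the CGI part only when a web server actually invoked us.
if ($ENV{REQUEST_METHOD}) {
    require CGI_Lite;                       # step 1: load the module
    my $query = CGI_Lite->new;              # step 2: create an instance
    my %FORM  = $query->parse_form_data;    # step 3: form elements into a hash
    print "Content-type: text/html\n\n";    # MIME header, then a blank line
    print greeting(\%FORM);                 # step 4: personalized greeting
}
```

Keeping the greeting in its own subroutine is a design choice of this sketch: it lets you check the output without a web server in the loop.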
Now that we have gotten an overall picture of what is happening in the above program, let's look at this in greater detail, with a bit of attention to some of the differences between CGI.pm and CGI_Lite.pm.
In CGI.pm, we can retrieve form elements using the param method. When invoked in a scalar context, param allows us to retrieve the value of a single HTML form element. For example, if we have defined $query to be an instance of CGI, we can place the value of the firstname field in the $firstname variable with the following statement:
my $firstname = $query->param("firstname");
If we invoke param in an array context, then we get a list of all form elements that were submitted to the program. For example, if we want to put the names of all HTML form elements into the array @names, we can do so with the following statement:
my @names = $query->param;
We can then iterate through @names to retrieve and print the value associated with each form element, as in:
my $element = "";
foreach $element (@names) {
    print "<P>$element = ", $query->param($element), "</P>\n";
}
We can accomplish this with CGI_Lite.pm, but in a slightly different way. CGI_Lite.pm has a single method for retrieving form elements, one which uses hashes rather than a mixture of scalars and arrays. To retrieve form elements, we use the method parse_form_data, which returns its results in a hash. Retrieving individual form elements is thus a two-step process. First we put all of the elements into the hash, and then we retrieve the one in which we are interested:
my %FORM = $query->parse_form_data;
my $firstname = $FORM{"firstname"};
(Here, $query is an instance of CGI_Lite rather than of CGI.) If we want to get a list of the form elements that were sent, we can use the keys function. Thus, to put the names of the form elements in the array @names, we can type:
my @names = keys %FORM;
We can even get them in alphabetical order, by prefacing keys with a call to sort:
my @names = sort keys %FORM;
We can print the names and values of all form elements by iterating through @names and retrieving the values in which we are interested:
my $element = "";
foreach $element (@names) {
    print "<P>$element = ", $FORM{$element}, "</P>\n";
}
If we know that we want to put one or more of the form elements into scalar variables (and not keep them in the hash), we can do so by calling the method create_variables. For instance, in our example above, we first had to use parse_form_data in order to get the form elements into the hash %FORM. Then we had to assign $firstname in a separate step. If we had wanted to assign 10 variables based on the contents of the form, we would have needed to make 10 separate assignments, which is rather inefficient.
To get around this problem, we can use the create_variables method, which automatically creates local variables for us. If we want to turn each form element into its own variable, we can simply invoke:
$query->create_variables(\%FORM);
When this method returns, we have a new variable defined for each element that was in the submitted form. Thus, if we have a form element named firstname, the value associated with that element is now available via the variable $firstname. The backslash in front of %FORM gives us a reference to the hash, a new feature in Perl 5 documented in great detail in the Perl manual pages (available by typing man perlref on most Linux systems).
There is one potential problem with create_variables, namely, your program might define variables with the same names as one or more form elements. For example, Listing 3 is a version of hello.pl in which we give the variable $firstname a value and call create_variables on the submitted form that included an element named firstname.
Listing 3. Second Version of Perl Program hello.pl
When $firstname is set to the value NOT CHANGED, as in Listing 3, the value of the HTML form element firstname is ignored when we call create_variables, and we get a greeting to NOT CHANGED, rather than the user's first name. If we comment out the line defining $firstname as NOT CHANGED, create_variables does its job just fine, creating a variable named $firstname and giving it the value that the user provided. This behavior is a good idea in terms of web security, but the silent failure of one or more variable assignments strikes me as a possible pitfall.
CGI.pm offers similar functionality with its import_names method. In this case, the authors encourage users to import names into a separate name space, ensuring that there are no name conflicts with existing variables.
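To make the separate-name-space idea concrete, here is a hand-rolled sketch in plain Perl. It is neither CGI.pm's import_names nor CGI_Lite's create_variables; the subroutine import_into_namespace and the package name Form are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;
no warnings 'once';    # $Form::firstname is mentioned only once below

# Hypothetical helper: copy each form element into a package variable
# inside the given name space, leaving the caller's variables alone.
sub import_into_namespace {
    my ($form, $namespace) = @_;
    no strict 'refs';    # symbolic references are needed to name variables at run time
    foreach my $name (keys %$form) {
        # Skip element names that are not valid Perl identifiers.
        next unless $name =~ /\A[A-Za-z_]\w*\z/;
        ${"${namespace}::$name"} = $form->{$name};
    }
}

my $firstname = "NOT CHANGED";          # a variable the program already owns
my %FORM      = (firstname => "Reuven");

import_into_namespace(\%FORM, "Form");

print "Program's own variable: $firstname\n";
print "Imported form element:  $Form::firstname\n";
```

Because the form data lands in $Form::firstname, the program's own $firstname keeps its value and both remain visible. Fully qualified names like this are also acceptable under use strict.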
Notice that in the Listing 3 version of hello.pl, I have removed the use strict line. This was to avoid possible conflicts when commenting out the line that defines a default value for $firstname. The strict module requires that you define variables before using them; however, if we are referencing variables that are created by create_variables, this is impossible.
CGI_Lite.pm is smart enough to grab form elements passed by either of the two methods: GET or POST. POST is generally considered to be the better method of the two, since it passes the contents of the form to a CGI program via standard input (stdin), rather than as part of the URL. However, if we were interested in passing a name to hello.pl as part of the URL, we could do so as follows:
https://localhost/cgi-bin/hello.pl?firstname=Reuven
Of course, if you are testing this program from a computer other than the web server, you need to replace localhost with the name of a server. For example, if your server runs on www.fictional.edu, you could use:
https://www.fictional.edu/cgi-bin/hello.pl?firstname=Reuven
Notice how we can set the variable's value after the question mark, known in CGI lingo as the “query string”. The query string is part of the URL, and URLs may not contain white space or other “dangerous” characters that might be misinterpreted by the browser and/or the server. For these reasons, certain characters must be sent in “percent-hex” format, in which the character's ASCII value in hexadecimal is prefaced by a percent sign. Obviously, the percent sign itself (ASCII value 0x25) must be encoded in this way. Thus, if my “first” name were actually two names, I could send the string as follows:
https://www.fictional.edu/cgi-bin/hello.pl?firstname=J%20Edgar
Since the “space” character is ASCII 0x20 (32 in decimal), we can insert a space into the URL by sending a %20. CGI_Lite.pm automatically translates the percent-hex encoding into the ASCII codes we want.
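For readers curious about what that translation involves, the following is a minimal sketch of percent-hex decoding in plain Perl. It is an illustration of the idea, not the module's own code, and the subroutine name decode_percent_hex is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "%XX" sequences back into the characters
# they stand for. Form submissions also use "+" to mean a space.
sub decode_percent_hex {
    my ($text) = @_;
    $text =~ tr/+/ /;                                 # "+" encodes a space
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;     # "%20" becomes chr(0x20)
    return $text;
}

print decode_percent_hex("J%20Edgar"), "\n";    # prints "J Edgar"
print decode_percent_hex("100%25"), "\n";       # prints "100%"
```

The "+" must be translated before the percent-hex sequences, so that a literal plus sign sent as %2B survives the decoding.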
While GET can be used to send name,value pairs, it is often used to send simple text strings. For example, it might be nice to send a name without assigning any value, as in:
https://www.fictional.edu/cgi-bin/hello.pl?J%20Edgar
This technique is often useful when CGI programs have to receive a user's unique ID in a relational database running on the web server. If we send the identifier as part of the query string, the program can grab that value and use it as the index into a table in the database, producing a personalized home page or otherwise unique output customized for the user.
Several on-line booksellers use this method. When I go to Amazon.com to check the status of my latest order, I go to a URL that looks like:
https://www.mybookstore.com/cgi-bin/order.pl?1234-5678-9012
What I would like is a simple way of retrieving this string. CGI.pm allows you to get the string by pretending that the contents of the query string are assigned to the variable named keywords, so if we are using CGI.pm, we can type:
my $id_number = $query->param("keywords");
Now, the variable $id_number contains the value “1234-5678-9012”.
If we are using CGI_Lite.pm, things get a bit more complicated, because the module expects the query string to be used only for sending name,value pairs, not individual text strings. Thus, when we send the above query string, CGI_Lite.pm assumes that we are actually setting the form element named “1234-5678-9012” to a null value. This is not quite what we might expect, but it is something we can manage.
One possible method is to call parse_form_data to turn the received name,value pairs into a hash. The hash contains a single key, corresponding to the data that was passed in the query string, which CGI_Lite.pm thinks is a variable name. We can then retrieve that key by getting the list of keys in our hash. Listing 4 is code that accomplishes that feat.
This is not the most efficient way to get the information, but it does do the trick. We could simply read the information from the QUERY_STRING environment variable—but that would introduce another problem, namely, the translation of characters sent in percent-hex encoding. By using the built-in facilities of CGI_Lite.pm, we ensure that the translation is done correctly.
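The keys trick can be tried without a web server by using a small stand-in parser. In the sketch below, parse_pairs and decode are hypothetical subroutines written for this illustration; they only mimic the behavior described above and are not CGI_Lite's code. In a real program, the hash would come from parse_form_data instead.

```perl
use strict;
use warnings;

# Hypothetical helper: decode "+" and percent-hex sequences in one string.
sub decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Hypothetical stand-in for a parser that expects name=value pairs.
sub parse_pairs {
    my ($query_string) = @_;
    my %form;
    foreach my $pair (split /&/, $query_string) {
        next unless length $pair;                 # ignore empty segments
        my ($name, $value) = split /=/, $pair, 2;
        $value = "" unless defined $value;        # bare string: name with no value
        $form{ decode($name) } = decode($value);
    }
    return %form;
}

# A bare identifier arrives looking like a name with an empty value ...
my %FORM = parse_pairs("1234-5678-9012");

# ... so the data we want is the hash's one and only key.
my ($id_number) = keys %FORM;
print "$id_number\n";
```

The parentheses around $id_number matter: they put keys in list context, so the variable receives the first key rather than the number of keys.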
If you find this somewhat confusing, you're not alone. Many of my own programs take advantage of the query string, and having to pretend that my data is really a variable name strikes me as a bit odd. Perhaps a future version of CGI_Lite.pm will handle this, although adding too many features would eventually turn it into CGI_NoLongerLite.pm.
Debugging CGI programs is often difficult because the execution takes place behind the scenes. In contrast with more typical programs, which allow us to interact with them while they are running, CGI programs are invoked by Web servers, with input coming from the user's Web browser (via the Web server, which hands that data to the program), and with output returning to the user's browser (again, via the Web server).
CGI.pm offers two good aids to debugging CGI programs: a dump method that prints out the contents of all HTML form variables as they are received, and a command-line interface that allows programmers to enter variable assignments without invoking the program from an HTML form.
In keeping with its light-weight philosophy, CGI_Lite.pm does not offer the command-line debugging interface, which might make debugging large programs difficult. However, it does offer a way to check the data that was received from the user's web browser. The print_form_data method sends all of the known name-value pairs to stdout. If your program does not work correctly and you want to check the values of the input data, you can add the following line to your program:
$query->print_form_data;
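One rough workaround for the missing command-line interface is to fake a GET request before the module looks for its input. The sketch below relies only on the standard CGI environment variables REQUEST_METHOD and QUERY_STRING; the subroutine fake_get_request is a hypothetical name, and the assumption that CGI_Lite reads GET data from QUERY_STRING, as the CGI standard specifies, should be checked against your installed version.

```perl
use strict;
use warnings;

# Hypothetical helper: pretend the web server handed us a GET request
# whose query string is built from the given name=value arguments.
# The arguments are assumed to be percent-hex encoded already.
sub fake_get_request {
    my (@pairs) = @_;
    $ENV{REQUEST_METHOD} = "GET";
    $ENV{QUERY_STRING}   = join "&", @pairs;
    return $ENV{QUERY_STRING};
}

# Fake a request only when no web server has set one up for us.
fake_get_request(@ARGV) unless $ENV{REQUEST_METHOD};
```

With these lines placed before the call to parse_form_data, a command such as perl hello.pl firstname=Reuven should let you exercise the program from the shell.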
With the above discussion in mind, which module should you use when writing your CGI programs? In most cases, I would tend to stick with CGI.pm, for a variety of reasons.
First of all, I tend to use CGI.pm's command-line debugging interface quite a bit, and the fact that CGI_Lite.pm lacks such an ability is a major hindrance for me. It is certainly possible to get around this problem, since I wrote CGI programs for a while before CGI.pm appeared on the scene, but it never hurts to have another debugging tool in your arsenal, particularly when writing large, complicated programs.
A second reason I tend to favor CGI.pm is that I often have to work with other people on projects, and using two different interfaces to the CGI standard might make life difficult for them. (We have enough problems as is; we don't also need to try to remember whether we should be using param or parse_form_data in order to retrieve information.)
Third, I find it useful to have extra functions that take care of the small parts of producing HTML. I used to constantly forget to put two newline (\n) characters at the end of MIME headers; with CGI.pm, I no longer have to remember.
At the same time, I find it somewhat irresponsible to write small CGI programs that use over 1MB of RAM before they even begin to perform any calculations or allocate any data structures. For small projects in which I want to use Perl (rather than a compiled language, such as C) but in which I want to maximize efficiency, I use CGI_Lite.pm. I also like the use of hashes, which strikes me as a natural way to store and retrieve form elements. Also, the fact that CGI_Lite.pm does almost everything I wish, including such advanced items as HTTP cookies and the uploading of files, makes it rather attractive for small-scale projects.
In an era of software bloat and programs that try to do more and more, it is refreshing to find a module that tries to do less and does it well. CGI_Lite.pm is not appropriate in all cases, but it is useful, well documented and efficient. If you are trying to squeeze the last few ounces of memory and CPU time from your web server, consider using CGI_Lite.pm in your next program—and enjoy the extra RAM for other projects.
Reuven M. Lerner is an Internet and Web consultant living in Haifa, Israel, who has been using the Web since early 1993. In his spare time, he cooks, reads and volunteers with educational projects in his community. You can reach him at reuven@netvision.net.il.