Combining Apache and Perl
The CGI (Common Gateway Interface) standard has been around for several years and is beginning to show its age. CGI is great because all web servers support it, programmers can write in any language, and programs can be portable across a large number of platforms. Netscape's NSAPI and Microsoft's ISAPI bind more tightly to their respective web servers, but programmers interested in using these APIs are much more restricted than with CGI.
A particularly big problem with CGI is its inefficiency. Each invocation of a CGI program creates a new process on the server. If you write CGI programs in Perl, you are starting a new copy of Perl each time a CGI program runs, using additional memory and processor time. Wouldn't it be nice if we could have the flexibility of CGI programs without having to use all of those system resources? Better yet, wouldn't it be great if we could use our existing CGI programs in such a framework with little or no modification? The answer, of course, is “yes”; even as hardware continues to get cheaper and more powerful, it seems silly to be wasting memory and CPU time unnecessarily.
This month, we look at mod_perl--one of the proposed solutions to this problem. mod_perl is a module for the popular and powerful Apache web server, which runs on many operating systems including Linux. At the most basic level, mod_perl makes it possible to run server-side Perl programs more efficiently than when using the CGI protocol. However, mod_perl offers much more than efficiency, as we will see. It also provides a full interface to the Apache internals, giving Perl programmers a chance to modify the web server itself.
Apache modules are configured and installed at compile time. If you are interested in installing mod_perl, you have to download and recompile the source code in Apache. Luckily, this is rather easy to do. Note that while anyone can download, configure and compile Apache, only someone with root access can install Apache to its default position. If you don't have root access, you will still be able to run, but only on an unrestricted port number, namely, one above 1024.
The latest version of mod_perl is always available from CPAN (Comprehensive Perl Archive Network). At this time, the latest version of mod_perl is 1.10, which means that you can retrieve it from https://www.perl.com/CPAN/modules/by-module/Apache/mod_perl-1.10.tar.gz. Later versions will have the same URL, with a different version number. In addition, try to use a CPAN mirror close to you, rather than loading down www.perl.com; go to https://www.perl.com/CPAN/ for help in finding one.
Once you have downloaded mod_perl, you will also have to download the latest version of Apache, 1.2.6, from https://www.apache.org/ or one of its mirrors. Unpack the Apache and mod_perl distributions in the same directory. On my system, I did the following:
cd /downloads tar -zxvf apache_1.2.6.tar.gz tar -zxvf mod_perl-1.10.tar.gz
If you want to modify the default Apache module set, now is the time to modify /src/Configuration. If you are not familiar with Apache configuration, don't worry—things will work just fine without customizing the module set.
The rest of the Apache configuration and compilation is done within the mod_perl directory. Move into the mod_perl directory (probably called something like mod_perl-1.10) and type:
perl Makefile.PL
On my system, mod_perl asks me two questions:
Configure mod_perl with ../apache_1.2.6/src ? [y]to which I press return, and
Shall I build httpd in ../apache_1.2.6/src for you? [y]to which I press return again. This configures all of the files necessary for building mod_perl and Apache. When the UNIX shell prompt returns, simply type make and press return. The resulting Apache binary (httpd) will be in the src subdirectory under the Apache directory. On my system, httpd resides in /usr/sbin/httpd, so copying the resulting binary will replace the old Apache with the new one.
Restart Apache by logging in as root and typing:
killall -1 -v httpd
Now, you're in business with your new version of Apache. If you're not sure whether the new version has been installed, connect to the web server and ask for its version information:
telnet localhost 80After connecting, type:
HEAD / HTTP/1.0On my system, I get the following response:
HTTP/1.1 200 OK Date: Sun, 12 Apr 1998 19:02:41 GMT Server: Apache/1.2.6 mod_perl/1.10 Connection: close Content-Type: text/htmlIn other words, the web server running on port 80 (the default port for HTTP traffic) is running Apache 1.2.6, with mod_perl 1.10 compiled in.
One of the most popular uses for mod_perl is as a fast replacement for CGI. In order to use it this way, we need to modify Apache's configuration files, so it knows how to handle programs that use mod_perl.
Why must Apache know how to treat these programs? Thinking about CGI programs should make it clear. Browsers request CGI programs in exactly the way they request static documents. The browser does not know whether a given URL points to a program or a static document; that determination is made by the server. If the request is for a static document, the server returns the document verbatim to the user's browser. If the request is for a program, the server executes it and returns any output to the user's browser.
In both of these cases, the browser's behavior is the same: it sends the request to the server and displays the contents of any received response. This places the onus on the server to recognize which files are to be transmitted verbatim, and which are programs whose output will be sent as a response. Apache lets us choose between allowing CGI programs to be located anywhere on the system (as long as they end with an agreed-upon suffix, such as .pl or .cgi) and requiring that they be located in one or more designated directories. This is done using directives in the Apache configuration files.
Now that we have added mod_perl to our server, we must tell Apache how to handle three types of URLs: static documents, CGI programs and mod_perl programs. Adding mod_perl to the mix does not have to change the existing configuration on your system. I created a directory named perl-bin under my web root directory (/home/httpd/perl-bin) and decided all mod_perl programs would reside there, just as all CGI programs reside in cgi-bin. I then added the following lines to my server's srm.conf file:
<Location /perl-bin> SetHandler perl-script PerlHandler Apache::Registry Options ExecCGI </Location>
The <Location> and </Location> tags indicate that we want our settings to take effect for a particular directory, rather than the entire Apache server. Then, we tell Apache to treat documents in the perl-bin directory as Perl scripts, rather than static documents or something else. If you are curious, the Apache manual has an entire section describing handlers, including the AddHandler and SetHandler directives that allow us to configure file types according to location or file extension. Other handlers, for instance, include cgi-script (for CGI programs), server-info (for information about the server) and imap-file (for image maps).
Now that Apache knows which files in /perl-bin should be considered mod_perl programs, we must tell mod_perl how to handle these Perl documents. We will use the Apache::Registry module, which allows us to run CGI programs. Finally, we will use the Options directive to allow CGI programs to be run within this directory.
Finally, we make one last modification to srm.conf, telling mod_perl to produce HTTP headers. We do that outside of the <Location> directive, since we always want mod_perl to return complete headers. The line to add is:
PerlSendHeader On
Adding the PerlSendHeader directive does not relieve us from the responsibility of indicating the type of content we are returning. In other words, we still must add the “Content-type” header to the top of our output, just as we do when writing CGI programs.
All the pieces are now in place to use mod_perl instead of CGI programs. Let's try a simple program that prints out the current state of the environment. Copy the following into a file called test.pl in the perl-bin directory:
use strict; print "Content-type: text/html\n\n"; foreach my $key (sort keys %ENV) { print "\"$key\" = \"$ENV{$key}\"<BR>\n"; }
Set permissions so that the file is executable, and ask your browser to retrieve /perl-bin/test.pl. If all goes well, you will see a list of environment variables in your browser.
If you have been writing CGI programs (or using Perl for any length of time), then the above might seem strange. For example, where is the initial line indicating the location of the Perl interpreter, as well as its switches? The initial hash-bang (#!) syntax which we are so accustomed to is missing because it's unnecessary. That two-character code tells the UNIX shell that it shouldn't try to interpret a program (i.e., as a shell script), but rather that it should give the responsibility to another program. That's why Perl programs usually begin with the line:
#!/usr/bin/perl
while Tcl programs begin with:
#!/usr/bin/tclshand so forth. Because our program is run by mod_perl and mod_perl understands Perl programs, we don't need the hash-bang syntax at the top of our program.
Command-line switches raise a more subtle issue, one that cuts to the heart of mod_perl's advantages over standard CGI programs. Programs run much faster under mod_perl for several reasons, but the two primary ones are that Perl is embedded in Apache (saving the overhead of starting Perl with each invocation), and programs are compiled once, then cached (saving the overhead of compilation with each invocation). The combination of embedding Perl within Apache and caching compiled programs can mean a tremendous boost in execution speed, often ranging from 400 percent to 2000 percent.
There are tradeoffs for these increases in speed, and one of them is that command-line switches no longer work as expected. Switches are handled at compilation time, so if you expect switches to work each time your program is run, you will be disappointed. However, all is not lost. Programmers interested in turning on Perl's warnings (the -w flag) and security checks (the -T flag, for tainting) from within mod_perl programs can do so with a directive inside of the srm.conf file. To turn on warnings, you simply add the line:
PerlWarn On
This has the effect of turning on warnings from within your programs. As usual, warning messages are sent to the Apache error log.
By the same token, you can activate Perl's security checks (commonly known as “tainting”) by adding the PerlTaintCheck directive inside of srm.conf:
PerlTaintCheck On
When you write CGI programs (or any other programs, for that matter) in Perl, it is usually a good idea to include the use strict directive, as we saw in the above example. When programming with mod_perl, however, it is extremely important to use strict. Otherwise, variable definitions may remain in memory after your program exits, creating problems for future invocations of this or other programs.
By the same token, do not use the exit function to leave your program prematurely. Normally, calling exit from within a CGI program will end the program—not a bad thing, if it has already produced all of its output. If you call exit from within a mod_perl program, the program takes Perl along with it; and since Perl is embedded within the copy of Apache, killing Perl effectively kills that particular server process as well. If you absolutely must call exit from within your program, use Apache::exit instead; it will do what you want without unexpected side effects.
Now that we have gone through a basic introduction to mod_perl and writing CGI-style programs with Apache::Registry, let's look at an example of CGI programs under mod_perl—a simple guest book program that takes form parameters and appends their contents to a file on the system. The form is shown in Listing 1. Note that the form looks just like the forms we have seen in the past. The sole difference is our form's action, which sits in perl-bin rather than the usual cgi-bin.
The program is shown in Listing 2. If we were to add a “hash-bang” first line to this program, it would operate equally well under CGI or mod_perl environments. We use CGI.pm to retrieve information about the submitted form. While this works just fine for recent versions of CGI.pm, earlier versions are not completely compatible with Apache::Registry.
The main difference between the program in Listing 2 and its CGI counterpart is speed. While I cannot give exact numbers, my subjective tests showed the response from mod_perl to be almost instantaneous, with the CGI version taking noticeably longer—perhaps up to one second. This might not seem like much, but the combination of a cached CGI program with an already-running version of Perl is impressive, even with a short program that compiles quickly. As you can see, it does not require many changes to your original program.
So far, I have mentioned mod_perl only as a replacement for CGI. However, mod_perl is much more than that; it gives you a Perl interface to the guts of Apache. If you have configured your server correctly, you can modify every aspect of Apache using a Perl program. Better yet, some enterprising souls have already spent time writing modules which do just that. For example, the Apache::Status module allows you to take a look at the current state of mod_perl running on your server. Apache::Status comes with mod_perl and is a good example of what can be done with this package.
As was the case with Apache::Registry, we are going to have to set the handlers for a particular directory. In this case, the directory does not have to physically exist on the disk, since the URL is interpreted before a file is ever opened. You must add these lines to your srm.conf file in order to get the Perl status:
<Location /perl-status> SetHandler perl-script PerlHandler Apache::Status </Location>
As was the case with Apache::Registry, we set the Apache handler to be perl-script. Since we want Apache::Status to be handling the perl-status directory, we point to it as our PerlHandler.
If you put the above lines in your server's srm.conf file and restart the server, anyone requesting /perl-status from your server is going to have access to information about your server. If you would prefer to keep such information private, you must use access controls, as shown in the following example:
<Location /perl-status> SetHandler perl-script PerlHandler Apache::Status order deny,allow deny from all allow from 127. </Location>
This allows you to retrieve status information from the server computer itself; attempts to retrieve /perl-status from another computer will be greeted with an “unauthorized access” message.
I have been surprised and impressed by mod_perl's speed and flexibility, and I expect to use it more and more as time goes on. The fact that it can run most existing CGI programs without modification is a great boon to those of us who already have a stockpile of such programs.
mod_perl is not a panacea, of course. Its speed comes at a price; namely, greater demands for system memory. The inclusion of Perl (a known memory hog) inside of Apache means that the httpd processes on your server will start off larger than otherwise. Over time, each server process will grow, as compiled Perl programs are cached in memory. Before you use mod_perl on your system, you should make some calculations regarding the amount of memory that Apache is using; this may affect the number of server processes you want to run on your system.
Nevertheless, mod_perl is a tremendous advance for both Apache and Perl and promises to get much better with time. Next month, we will look at some of the ways in which mod_perl can speed up our database connections, making Apache an even better server for dynamic sites dependent on relational databases.
Reuven M. Lerner is an Internet and Web consultant living in Haifa, Israel, who has been using the Web since early 1993. In his spare time, he cooks, reads and volunteers with educational projects in his community. You can reach him at reuven@netvision.net.il.