Dynamic Graphics and Personalization
Last month, we discussed the different ways a CGI program can create dynamically generated graphics output. That is, we wrote several programs in which the program describes its output not as “text/html”, but as “graphics/gif”.
This month, we will examine some more tools that allow us to create graphics dynamically. However, the graphics will have an additional twist this time, in that they will reflect an individual user's stock portfolio rather than a global set of data values.
As with any non-trivial software project, our first step must be to create a brief specification. In this particular project, we will have two major programs. In the first, the user will be able to create and edit a personal profile, describing the securities he or she owns. The second program will take the information in the user's profile and use it to create a personalized graphic stock portfolio.
This project brings together a number of tools we have discussed in previous installments of ATF. Nevertheless, it seems like a good idea for us to review them, since we are going to call on so many.
MySQL: MySQL is a small, inexpensive relational database available for Linux and many other operating systems. (See “Resources” for information on where to get it.) In addition to its low price, MySQL is quite fast and efficient, which makes it popular on many web sites. As a relational database, MySQL forces us to store information in one or more tables, in which each row refers to a separate record. As with most relational databases, we communicate with MySQL using SQL, a database query language. We cannot write programs in SQL; rather, we must embed our queries inside of a program written in a full programming language. In our case, that language is Perl.
DBI: While SQL might be a standard query language for communicating with databases, the software and libraries used to speak with those databases vary considerably. To talk to an Oracle server, you need Oracle libraries; to talk to a MySQL server, you need MySQL libraries, and so forth. As a result, the Perl database world was fractured for a long time, with special versions for individual databases.
Now, however, there is a better way: the DBI module standardized the API to relational databases, meaning that programmers moving from one database server to another have to learn only the different nuances of the SQL implementations. Previously, they had to learn a separate Perl API as well, which was frustrating. This was accomplished by separating the database code into two parts, one generic (DBI) and the other specific to each server (DBD). In order to use DBI, you will need to install the generic DBI libraries, and then one or more DBDs appropriate for the products you use.
There is a problem with DBI and the Web, however, which has to do with the way in which database servers were designed. In general, they expect a client program to open a connection, perform many queries, then disconnect. Opening a connection is thus quite slow and inefficient. When a CGI program is a database client, it must open a new database connection for each HTTP transaction. See the mod_perl section immediately below for one solution to this problem, Apache::DBI.
mod_perl for Apache: Web servers traditionally provided custom and dynamic output by invoking external programs, using the CGI standard. An HTTP server would pass information to the CGI program, which would then be expected to send its output to the user's browser. This output generally came in the form of HTML-formatted text, but as we saw last month, it is possible to produce graphics as well.
However, CGI is quite slow; every invocation of a CGI program requires a new process to be created. If you are using Perl, each invocation requires the program to be compiled into Perl's internal format, then executed.
An alternative method is to use mod_perl, a module for the free Apache HTTP server that embeds a fully working version of Perl inside the server. This has several ramifications, one of which being the fact that we can now create custom output without having to rely on external programs.
When using mod_perl, you can take advantage of a module known as Apache::DBI. This module pretends to work the same as DBI, but actually caches database handles ($dbh) across invocations. So even when your program thinks it is opening a new database connection, it is actually reusing a database handle from a previous invocation.
GIFgraph: The GIFgraph set of Perl modules allows us to create charts and graphs on the fly, from within CGI programs or mod_perl modules. We explored a basic use of GIFgraph last month. As its name implies, GIFgraph produces output in GIF format. Last month, we saw how to return GIFs directly to the user's browser. This month, we will instead save the resulting graphs to individual files, to which we will create hyperlinks.
Apache::Session: HTTP, the protocol on which the Web is based, was designed to be lightweight and simple. As part of this consideration, it was also designed to be “stateless”, meaning each transaction is independent. This creates a problem, however, in that you will often want to keep track of which user is which. For instance, in this application, we want to ensure we are tracking the correct user's portfolio. Without state, we cannot track portfolios at all, let alone for multiple users. Apache::Session, as we will see below, allows us to get around this by using a database and HTTP cookies to store one or more pieces of information using a unique identifier.
With the above five technologies, we can create a fairly impressive stock portfolio tracker that allows users to define which securities they own and view their current holdings in graphical format. As presented this month, the application is admittedly a bit crude, but it should show you how easy it is to write such an application and how flexible the above tools can be in creating one.
Before we can begin to work on the applications themselves, we need to create the underlying database tables they will use. We will need two different tables: one to hold the individual stock values on different dates and another to store user personalization information.
The first table, called StockValues, has three columns: a symbol, which can contain up to six characters; a value, which can range from 0 to 999999.999; and a date. We can create such a table with the following SQL, most often by using the interactive mysql client program that comes with MySQL:
CREATE TABLE StockValues ( symbol CHAR(6) NOT NULL, value NUMERIC(6,3) NOT NULL, date DATE NOT NULL );
Each row in the above table refers to the value of a single stock on a single day. By storing information like this, we can easily create charts for a stock during arbitrary periods of time. For the sake of brevity, our applications will always display all of the available values for a stock. The above table also gives us many possibilities for additional applications, such as finding a stock's high and low values during a given time period.
How will StockValues be populated with values? Most commercial sites retrieve stock information from a commercial service, using a background process to place the information in a database table. My budget is more limited than the average business web site, so I decided to insert some arbitrary values into StockValues. In order to do this, I used the interactive mysql client program, and entered several queries of this type:
INSERT INTO StockValues (symbol, value, date) VALUES ("ZZZZ", 100, "1999-07-14");
The second table we will create is for Apache::Session::DBI, a version of Apache::Session that allows us to store information about a particular user in a database table. The table's name and format are determined by the Apache::Session API:
CREATE TABLE sessions ( id char(16), length int(11), a_session text );Once we have created this table, we can ignore the fact that Apache::Session stores its information in a database. So far as we are concerned, we perform a magic incantation at the beginning of our code, which retrieves the current session values. We retrieve the user's session ID by reading an HTTP cookie:
my $id = $r->header_in('Cookie'); $id =~ s|SESSION_ID=(\w*)|$1|;Then, once we have assigned $id the value of the user's session ID, we tie the %session hash to the “sessions” table with the Apache::Session::DBI module:
my %session; tie %session, 'Apache::Session::DBI', $id, { DataSource => 'dbi:mysql:test:localhost:3306', UserName => '', Password => '' };From this point on, any name,value pairs stored in %session in previous sessions will be available. By the same token, we can assign
$session{key} = "value";and be sure that in our next invocation, despite HTTP's statelessness, we can retrieve the same value. Apache::Session thus makes it possible for us to store arbitrary quantities and types of information about a user.
We will store three session variables for each user. The e-mail address and name will be stored as scalars, and the user's current holdings will be stored as a hash reference. The keys to the %portfolio hash will be the stock symbols, and the number of shares owned in that particular security will be stored as the values.
When we want to store %portfolio as part of the session, we turn it into a reference and store that in %session with the key “portfolio”:
$session{portfolio} = \%portfolio;
A reference is a specially tagged scalar, which allows us to store it in a hash. We retrieve it later with the following complicated-looking code:
my %portfolio = defined $session{portfolio} ? %{$session{portfolio}} : ();The above uses Perl's trinary operator ?: as a shortcut to “if-then”. It means that if $session{portfolio} is defined, then dereference it into its original hash value and assign it to %portfolio. If it is undefined, then assign the empty hash, (), to %portfolio. After this line of code is executed, %portfolio will contain the user's current portfolio. By using Apache::Session, we can maintain the illusion of state across HTTP transactions, and store many users' portfolios in our database.
Now we will write the two applications that will work with this information. The code for those two applications can be found in the archive file at ftp://ftp.linuxjournal.com/pub/lj/listings/issue66/3629.tgz. The first will be StockProfile.pm, a Perl module for mod_perl that will allow users to create and edit their portfolios and personal information.
Since our program will be running as part of mod_perl, we will need to remember several things. First and foremost, we must create a new Perl module and package with a subroutine named “handler”. We will configure Apache to invoke this “handler” subroutine whenever a particular URL is requested from the HTTP server. Because our subroutine will be part of Apache rather than invoked in a separate process, and because mod_perl compiles and caches code that we write, our routine will run much faster than if it were a CGI program.
We must also remember to adhere to mod_perl programming conventions, the most important being to use lexical (“temporary” or “my”) variables as much as possible. Global variables stick around across invocations of mod_perl, which can lead to memory leaks and odd bugs. We ensure that we use “my” before variables with the use strict pragma at the top of our program.
Our module, Apache::StockProfile.pm (see Listing 1 in the archive file), has three stages: First, it initializes all of the variables and information, grabbing the current list of securities from StockValues and initializing the user's profile. Then, if the module was invoked with the POST method, it sets or modifies the user's profile information as necessary. Finally, it produces an HTML form that can be used to modify the profile further.
The first thing we do in “handler”, as with all mod_perl modules, is retrieve the Apache request object, traditionally known as $r. This object's methods allow us to retrieve and set everything having to do with the HTTP transaction. For instance, we can set outgoing headers with $r->header_out, the “Content-Type” header with $r->content_type, and send the final headers with $r->send_http_header.
However, certain things are more easily accomplished—at least to experienced CGI programmers—with CGI.pm, the standard module for CGI programming. We can get a version of that API by using and creating an instance of CGI::Apache. The created object gives us access to HTML form elements and debugging tools (including the invaluable dump method) using the familiar interface from CGI.pm. Not everything works in the same way, but it is good enough for almost all purposes.
Our main use of CGI::Apache in this program is to retrieve the HTML form elements, which are submitted via the POST method. StockProfile.pm both creates the form and handles its submission, which might seem odd at first but makes for a compact and easy-to-maintain type of code.
We retrieve a list of current symbols with a simple SELECT statement to the database. However, if we were to simply say “SELECT symbol FROM StockValues”, we would get one row for each value of a symbol or about five values per week, if we add one new value each day. In order to retrieve distinct values, we add the qualifier DISTINCT to our SELECT query. We also ask that the symbols be alphabetized, so that they will be listed in reasonable order:
my $sql = "SELECT DISTINCT symbol FROM StockValues"; $sql .= "ORDER BY symbol ";
We set the values of $name, $email and %portfolio based on the user's session information, as we discussed earlier. We then use this information to fill in the HTML form that allows the user to modify his or her profile. I prefer to use a table for such forms to ensure the columns line up straight, but that is simply a matter of aesthetics; the important thing is that each element must have its own, unique name and they will be the same names as our POST-handling code at the top of the program expects to use.
Our second mod_perl handler is StockReport.pm (see Listing 2 in the archive). This module uses the portfolio information the user has entered, and creates one or more graphs based on it.
If the user has a portfolio defined, then we iterate through each of the symbols in it. We then SELECT all of the rows in StockValues with that symbol, retrieving them in date order:
my $sql = "SELECT value,date FROM StockValues "; $sql .= "WHERE symbol = \"$symbol\" "; $sql .= "ORDER BY date ";
Now we iterate through each of the returned rows, adding the value to the @values array and the date to the @dates array. We also calculate the value of the user's holdings on that day (by multiplying the number of shares by the share price) and put that in the @holdings array.
We then plot our data set by creating a @data array, in which the elements are references to @dates, @values and @holdings. @dates will be used as the X axis in our graph, while @values and @holdings will be plotted as well. Because each element of @holdings is bound to be a multiple of its counterpart in @values, we tell GIFgraph to use two Y axes—one for values and one for holdings.
We create the graphs themselves using GIFgraph's plot_to_gif method, which takes a set of data points (the @data array, StockReport.pm), creates a graph in GIF format, then saves it to disk. We set the filename in a variable, so that we can both save the file and refer to it in an IMG tag. Remember, the file must be in the web document tree in order for it to be available to the user's web browser!
It might be tempting to put such files in /tmp, the standard temporary directory for Linux systems, but then the graphics will be unavailable to outside browsers. This directory must be writable by the web server, which often means making it open to more people than the rest of your web hierarchy. If this is the case on your system, make sure only this directory is writable by others, so that you don't run the risk of an intruder viewing or damaging your site's sensitive files.
Creating files in this way works well, but with one major flaw: it has the potential to fill your file system with numerous old graphics. A number of methods can be used to overcome this problem, but perhaps the simplest is to use cron to identify and delete any file older than a certain time. Depending on how busy your site is, you might want to run such a cron job every ten minutes, every hour, or once a month. It all depends on how many visitors you receive and how large your disk is. It is probably better to run such a deleting program more often, so as to avoid a denial-of-service attack that could fill your disks.
While I did not implement it in this version of StockReport, you can probably see how easy it would be to allow users to choose the range of dates in the graph. Using an HTML form, you could allow users to choose the starting and ending dates; the values of those form elements could then be inserted into the SQL query so as to SELECT just those rows between the named dates.
Once we have written and installed StockProfile.pm and StockReport.pm into our Perl module hierarchy, we must somehow tell Apache when to use them. We can do this in a number of ways, but my preference is to create special URLs that invoke these modules. That is, every time someone requests the URL “/stock-profile” from our server, they should get the profile editor. By the same token, when someone requests “/stock-report” from our server, they should see a report of their current stocks.
In order to accomplish this, we must first load each of the modules by adding the following two lines to the Apache configuration file, httpd.conf:
PerlModule Apache::StockProfile PerlModule Apache::StockReport
Once we have done that, we can create new URLs, which do not necessarily correspond with files on the server's file system. For this, we use <Location> sections in httpd.conf. We indicate that the URL in question should be handled by a Perl module (“SetHandler perl-script”), and then tell Apache which specific module to use for that particular URL:
<Location /stock-profile> SetHandler perl-script PerlHandler Apache::StockProfile </Location> <Location /stock-report> SetHandler perl-script PerlHandler Apache::StockReport </Location>You will need to restart Apache in order for these new URLs to work. If there is an error in one of the Perl modules, or if mod_perl cannot find one of the modules in the module path @INC, the restart of Apache will fail. This ensures you will not have any compile-time errors when your modules run under mod_perl. At the same time, it requires that you test your modules extensively before including them on a live site, since bringing down the server on a large web site can be embarrassing or financially difficult.
Last month, we saw how to create simple dynamic stock graphs using the GIFgraph package. This month, we saw how such dynamically created graphics can fit into a larger application, allowing users to see information about their stock portfolios. The combination of a relational database, mod_perl and GIFgraph makes it possible to create such a simple application in under 400 lines of code. You can undoubtedly think of many other applications in which dynamically created graphics would be useful—let your imagination go wild!
Reuven M. Lerner is an Internet and Web consultant living in Haifa, Israel. His book Core Perl will be published by Prentice-Hall later this year. Reuven can be reached at reuven@lerner.co.il. The ATF home page is at https://www.lerner.co.il/atf/.