At the Forge - Server Migration and Disasters

by Reuven M. Lerner

Experience counts for quite a bit in the world of system administration. Sysadmins scarred by bad hardware, devastating software problems and security break-ins are more likely to put together effective backup strategies, security policies and disaster recovery plans. The question is not whether a disaster will happen to your servers but when the disaster will occur and what form it will take.

I'm writing this in mid-August 2003, less than one week after I moved my own server (lerner.co.il) to a new virtual co-location facility. And, much of this column was written in blacked-out New York City, where I was planning to spend several hours at business meetings—and ended up getting to experience firsthand a large-scale technological disaster. Oh, and when I wasn't moving my server or sitting in the dark, I was without Internet connectivity for the better part of a week, as I was moving to a new apartment in Chicago.

So this month, we take a brief pause from our discussion of Bricolage and other content management software and instead consider how to handle server migrations and disaster plans for Web/database sites. Of course, every site and server is different and deserves to be given individual attention for the best possible planning. But with a little forethought, it shouldn't be too difficult to move your server from one location to another, to handle catastrophic hardware or software failure or even to run in the face of large-scale disaster, as the entire northeastern United States experienced this summer.

Server Relocation

I have moved my server several times over the last few years, and each experience has gone more smoothly than its predecessor. To be honest, moving to a new machine does not need to be difficult or painful, but it does need to be planned carefully. Every step needs to be taken with the assumption that you will need to roll it back at some point.

The simplest possible type of server to move from one machine to another is a static Web site or one that uses basic CGI programs. In such cases, you need to ask only a few questions:

  • Does the Apache configuration include the modules that you use? If you are a heavy user of mod_rewrite or if you enjoy the benefits of mod_speling (yes, with one l), you should double-check to ensure that these modules are available. If they were compiled into your server statically, running httpd -l lists them. If they were compiled as dynamic modules (DSOs), however, you should check in the libexec subdirectory of your Apache installation, where available DSOs are placed. Each DSO can be loaded optionally by including an appropriate LoadModule directive in the Apache configuration file. (The sketch following this list shows one quick way to compare module inventories on the old and new servers.)

  • Under what user and group does Apache run? System administrators have different opinions regarding the user and group IDs under which Apache should run. Some use the default, running it as the nobody user. Others (like me) prefer to create a special Apache user and group, adding users into the apache group as necessary. Still others use the suexec functionality in Apache, compiling it such that it can run as one or more users. In any case, be sure your chosen Apache user/group configuration on your new server is set up in /etc/passwd and /etc/group, as well as in Apache's own config files.

  • Where is the DocumentRoot? A source-compiled Apache puts DocumentRoot in /usr/local/apache/htdocs by default, but this default, which can be changed with the DocumentRoot directive in the Apache configuration file, varies with the operating system or distribution you are running. If you are using an RPM version of Apache, as is the case with Red Hat-style distributions, the DocumentRoot might be in /var/www or in another directory altogether. This should not affect the URLs within your programs and documents, but you should double-check the directory into which you're copying the files before assuming that the destination is the right place.

  • On what languages and modules do your CGI programs depend? If your site uses CGI programs, then at least one of those programs probably depends on an external module or library of some sort. CGI.pm, the Perl module for CGI programs, has been included in Perl distributions for a number of years, but it continues to be updated on a regular basis. So, if you depend on features from the latest version, you must double-check. This goes for other modules that you use; one client of mine was using an old version of Perl's Storable module and discovered (the hard way) that upgrading to the latest version broke compatibility when communicating with legacy systems.
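
Before copying anything, it's worth comparing the two servers directly. The following commands are a minimal sketch, assuming a source-compiled Apache under /usr/local/apache and a Perl installation that includes the modules in question; run them on both machines and compare the output:

httpd -l                                               # modules compiled in statically
grep -i LoadModule /usr/local/apache/conf/httpd.conf   # DSOs the configuration loads
ls /usr/local/apache/libexec                           # DSOs available on disk
perl -MCGI -e 'print "$CGI::VERSION\n"'                # version of CGI.pm
perl -MStorable -e 'print "$Storable::VERSION\n"'      # version of Storable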

DNS

The linchpin of any server migration is the process of moving DNS records. Although people prefer to use names, such as www.lerner.co.il, network connections use numeric IP addresses, such as 69.55.225.93. Translating the human-readable names into computer-usable numbers is the role of DNS, the Domain Name System. Intelligent manipulation of DNS records is a critical part of any server transfer.

The main problem with DNS is not host-to-IP translation but, rather, the fact that DNS results are cached. After all, you want to avoid a DNS request to your server for each HTTP request that someone makes. Such requests would place undue load on your server and would unnecessarily delay the servicing of HTTP requests.

So when you make a DNS request, you're not actually asking the original, authoritative server for an answer. Rather, you're asking your local DNS server for an answer. If it can provide one from its cache of recent results, it will do so without turning to the main server. In other words, nslookup www.lerner.co.il executes a DNS request against your ISP's DNS server. That server might return a result from its cache, or it might turn to the authoritative server for the lerner.co.il domain.
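
You can see this caching in action by asking your local resolver and the authoritative server for the same record and comparing the answers. A small sketch, with example.com standing in for your own domain and ns1.example.com for its authoritative name server:

$ nslookup www.example.com                   # asks your ISP's resolver, possibly answered from its cache
$ nslookup www.example.com ns1.example.com   # asks the authoritative server directly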

When you move a server from one machine to another, then, you want to reduce the TTL (time to live) setting on the DNS server to a low number, so that caching DNS servers do not keep handing out the old address long after you make the switch. Remember to lower the TTL at least one full old-TTL period before the cutover, so that any cached records still carrying the old, longer TTL have had time to expire. I've found that reducing the TTL to 300 seconds (five minutes) is more than adequate. Once the system has been migrated completely, you can increase the TTL to a more typical value, such as six hours, to reduce the load on your DNS servers.
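
In a BIND-style zone file, this is a one-line change. Here is a hypothetical fragment (your names, addresses and the rest of the zone will differ); remember to increment the zone's serial number whenever you edit it:

$TTL 300                          ; five minutes, while the migration is in progress
www    IN    A    69.55.225.93    ; the old address; change it to the new one at cutover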

If you are moving your HTTP server from one provider to another, here is an outline of what you can do to have a successful migration:

  • Make sure a DNS server at your new provider is able and willing to serve DNS (forward and reverse) with your current IP addresses and hostnames. That is, the DNS server at your new provider should point people to your old provider. Set the TTL to five minutes.

  • Update the WHOIS records for your domain, indicating that your new provider is the authoritative DNS server. It may take one or two days for this to filter through the entire DNS system. If your new DNS server is providing results identical to your old one, the only ways to tell if things have worked are to perform a WHOIS lookup or to use nslookup -type=ns yourdomain.com.

  • Once the WHOIS records have been updated, start moving things over. Make sure that all of the software you need is configured correctly, that all modules are set correctly and that the DNS servers have been updated. If your new DNS server isn't responding to queries for your domain, you will be in deep trouble when the WHOIS records point to the new server as an authority.

  • When everything seems to be identical (running rsync from the old system to the new one, as sketched below, is a good way to ensure that it is), switch the DNS definitions such that the hostname resolves to the new IP address, rather than to the old one.
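
A final mirroring pass and a delegation check might look something like the following. This is a minimal sketch; the paths and hostnames (newhost.example.com, example.com) are assumptions you would replace with your own:

# copy documents and CGI programs to the new machine, deleting stale files there
rsync -avz -e ssh --delete /usr/local/apache/htdocs/  newhost.example.com:/usr/local/apache/htdocs/
rsync -avz -e ssh --delete /usr/local/apache/cgi-bin/ newhost.example.com:/usr/local/apache/cgi-bin/

# confirm that the new provider's name servers are now listed as authoritative
nslookup -type=ns example.com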

Depending on the type of server you're running, you might want to turn off the HTTP server on the old system to reduce some of the confusion that might occur as a result of the switchover. For example, switching off the old HTTP server before you switch DNS ensures that the log files do not have any overlap, allowing you to append them together and use Webalizer or Analog to look around appropriately.
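
For example, assuming a standard apachectl installation and default log locations (the log file names here are hypothetical), the sequence might be:

/usr/local/apache/bin/apachectl stop                       # on the old server, just before the DNS switch
cat access_log.old access_log.new > access_log.combined    # later, merge the old and new logs in order
webalizer access_log.combined                              # run your log analyzer over the combined file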

At this stage, everything should be working correctly on the new system. But, you should check as many links as possible, particularly those that invoke CGI programs, server-side includes and nonstandard modules or those that require unusual permissions. As always, your HTTP server's error log is your best friend during this process; if and when things go wrong, you can consult the error log to see what is happening.

Databases

Of course, all of the above assumes that you're working on a relatively simple site. Most modern Web sites, however, include a relational database somewhere in the mix, because it provides consistency and flexibility, speeds development and offers a widely understood paradigm that is easy to debug and work with.

Relational databases store information in one or more tables, commonly grouped into a database. (Yes, it's somewhat confusing to say that a database server contains one or more databases, each of which in turn contains one or more tables, but that's the way it is.) To move a database from one system to another, you need to move both the database schema (the table, view and function definitions) and the data itself. Of course, a database typically is owned by a particular database user (who generally is not the same as a UNIX user) and has specific permissions set on it.

If your Web site incorporates a database, you thus need to move that database from the old system to the new one, including all of the owner- and permissions-related information. How you do this depends on the database you're using and if any tricks are involved.

MySQL databases that use ISAM/MyISAM tables (the default and still the most popular option) simply can be copied from one MySQL system to another. Typically, all of the files associated with a database are in a directory with the name of the database under /var/lib/mysql. You thus can copy the foo database by copying the directory /var/lib/mysql/foo, including any files it might contain, into /var/lib/mysql/foo on the new system. (Be sure the database server is shut down before performing this operation.) Start up the server on the new system, and it should work fine.
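
In practice, the copy can be as simple as the following sketch, which assumes the default /var/lib/mysql data directory on both machines, ssh access between them and an init script named mysql (the name varies by distribution):

/etc/init.d/mysql stop                                       # on the old machine, stop the server first
scp -r /var/lib/mysql/foo newhost.example.com:/var/lib/mysql/
ssh newhost.example.com 'chown -R mysql:mysql /var/lib/mysql/foo && /etc/init.d/mysql start'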

Things aren't quite this easy with PostgreSQL, which keeps track of database schemas and data in a low-level binary format. Using tar or rsync to copy a PostgreSQL database is highly unlikely to work—and if it does work, it probably involves massive data corruption and might even crash the back-end database server. Instead, you should use the pg_dump tool, which allows you to turn a PostgreSQL database into a set of CREATE, INSERT and COPY statements in plain-text format. For example:

pg_dump -U mydb mydb > /tmp/mydb-dump.txt

The -U mydb portion indicates that we want to use the mydb database user. You might need to substitute a different user name. You then can put this dumped output into a working database with:

$ createdb -U mydb mydb2
$ psql -U mydb mydb2 < /tmp/mydb-dump.txt

Following these two commands, two databases (mydb and mydb2) are available, both owned by the mydb user.

MySQL makes it easy to keep a second server synchronized because of its built-in master/slave replication system. One database server can be the master, accepting all SQL commands and queries, and another can follow along, allowing you to replace the master with the slave in case of catastrophic failure.
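
Setting up such a pair is largely a matter of configuration. Here is a minimal sketch of the relevant my.cnf entries for the MySQL versions current as of this writing; the hostname, user and password are hypothetical and must match a replication account you create on the master:

# master's /etc/my.cnf
[mysqld]
server-id = 1
log-bin

# slave's /etc/my.cnf
[mysqld]
server-id       = 2
master-host     = master.example.com
master-user     = repl
master-password = secret

On the master, you would also grant the repl account the REPLICATION SLAVE privilege and copy an initial snapshot of the data to the slave before letting it start following along.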

In Case of Blackout...

As I mentioned earlier, I was “lucky” enough to be in New York City during the massive power outage that affected millions of people in the United States and Canada this past August. Ironically, the outage occurred about an hour after I toured a potential client's server farm, where he demonstrated how his company backs up all its data to a remote location in Connecticut. (After all, what are the odds of something affecting both New York City and Connecticut? At least, that's what I was thinking while nodding in approval an hour before the blackout.)

If you're like me, the blackout didn't affect your servers, which are located outside of New York. And most co-location facilities have backup generators that can handle most or all of the facility's power needs in case of an emergency.

But, if your servers are located in your office or if you use only a simple UPS to ensure they continue running, the servers probably would be unavailable during a blackout like the one that we saw in mid-August, which lasted more than 48 hours in some parts of the Northeast. If your server is critical to your business, you might seriously want to consider moving it to a co-location facility.

But even co-located servers crash and go off-line; I can tell you this from personal experience over the past few years. This means that if you depend on your server, you should be backing it up on a regular basis. Moreover, you should be migrating it continuously to a server at another physical location, preferably hosted by a different company. But the differences should stop there—you want the software configurations to be as similar as possible. If you use rsync for all of the HTML pages, templates, software libraries and CGI programs on your server, and similarly automate a database dump and restore on the second server, the second server should be a close-to-exact replica of the first and should be available to go live at a moment's notice.
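
A nightly cron job can do most of this work. Here is a minimal sketch, assuming ssh keys between the machines, a PostgreSQL database named mydb owned by the mydb user and a standby host called standby.example.com; the crude drop-and-reload of the database is adequate for a warm spare, if not for anything fancier:

#!/bin/sh
# mirror the Web content to the standby server
rsync -az -e ssh --delete /usr/local/apache/htdocs/ standby.example.com:/usr/local/apache/htdocs/

# dump the database, copy it over and reload it on the standby
pg_dump -U mydb mydb > /tmp/mydb-dump.txt
scp /tmp/mydb-dump.txt standby.example.com:/tmp/
ssh standby.example.com 'dropdb -U mydb mydb; createdb -U mydb mydb; psql -U mydb mydb < /tmp/mydb-dump.txt'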

You even can go one step further and use both servers simultaneously. Of course, this is a far more difficult task, as it requires you either to use a single database server (creating a single point of failure) or to synchronize the databases at frequent intervals. But it's definitely possible, particularly on sites that are largely static, as we know from the success of Akamai, which has a hugely redundant array of servers all over the world. The more static the site, the easier it is to replicate and keep up and running.

This is still one advantage that commercial database software, such as Oracle, has over PostgreSQL and MySQL. Although the latter offer master/slave replication to varying degrees, the synchronization is neither as advanced nor as robust as what Oracle offers. Over time, however, I expect this situation will change, as such architectures grow increasingly common.

Conclusion

Moving to a new place is hard, and moving your server to a new location or computer also is hard. But by coming up with a good migration plan, moving incrementally and checking your work at every point with tools such as nslookup, dig, telnet and the HEAD and GET programs that come with Perl's LWP (or the curl toolkit that performs similar operations), you can have a smooth and simple migration.
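
For instance, a quick check of the new server's response headers, with example.com standing in for your own hostname, might look like this:

$ HEAD http://www.example.com/      # lwp-request's HEAD alias, installed with Perl's LWP
$ curl -I http://www.example.com/   # the same check with curl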

With only a few changes, a migration plan also can be used as a backup plan, ensuring that your servers continue to work and are accessible even in the wake of a great disaster. You cannot plan for every potential pitfall, but if your organization depends on its Web site, it's worth investing the time and money to ensure that it remains on-line.

Reuven M. Lerner, a longtime Web/database consultant and developer, is now a first-year graduate student in the Learning Sciences program at Northwestern University. He lives with his wife and two young daughters in Chicago, Illinois. You can reach him at reuven@lerner.co.il.
