At the Forge - Memcached Integration in Rails

by Reuven M. Lerner

Last month, we talked about memcached, a distributed caching system that is in widespread use among Web sites. The reason for memcached's popularity is its simplicity. With a minimum of overhead and setup, it's possible to set and retrieve nearly any value. Caching values that otherwise would come from the database makes it possible to avoid the database altogether on many occasions, speeding the throughput of a Web application and reducing the load on the database server.

Memcached is a wonderful tool, and it is something nearly every Web developer should have in his or her arsenal to improve site performance. But with the release of Ruby on Rails 2.1, it got even better. Rails now has integrated support for memcached, allowing you to use it almost for free from within your application. There are some caveats and tricks to its use, but once you have those under your belt, you quickly will discover that memcached has improved your site performance dramatically.

This month, we take a look at how to make memcached work inside your Rails applications. We further explore some issues you might encounter when using memcached, some of which are easier to work around than others.

Cache Integration

Ruby on Rails has, since its inception, tried to make Web developers' lives easier by coming out with many tools such developers might need. It comes with an excellent object-relational mapper (ORM), ActiveRecord. It comes with a way to test your code at a variety of different levels (called, in Rails-speak, unit, functional and integration). It comes with a first-class JavaScript library and associated effects, in Prototype and Scriptaculous. As numerous demonstrations and tutorials have shown, Rails allows you to jump right in to Web development, writing and testing your code with a minimum of dependencies. If you need to include some functionality that was left out by the Rails authors, it's not very difficult to include a Ruby gem (downloadable library) or even a “plugin” that sits inside your Rails application.

Rails has long come with a multilayered caching system that programmers can tap to speed up applications. You can cache individual pages, controller actions or even page fragments. And indeed, judicious use of the Rails caching commands can result in serious improvements to performance.

But, it was only in version 2.1 that Rails integrated support for caching individual objects. The support for object caching not only has the potential to improve your application's performance dramatically, but it also allows you to work with a variety of different storage facilities, so you can choose the one that's most appropriate for you. Although this article concentrates on the use of memcached, you should know that it's possible to work with not only memcached, but also with caches on the local filesystem, in local memory or even on another Rails-aware server using DRb (distributed Ruby, available as a Ruby gem).

Caching a Simple Object

To demonstrate how to use memcached, I'm going to create a simple Rails application, using PostgreSQL as the database:

createdb atf
rails --database=postgresql atf

Next, I create a simple object, person, for my application, with the Rails built-in scaffolding that includes a RESTful interface:

./script/generate scaffold person firstname:string 
 ↪lastname:string email_address:string

To import this definition into the database, I run the migration that it created:

rake db:migrate

Sure enough, if I connect to the database, I can see that the table has been created (Listing 1).

Listing 1. Example Table

atf_development=# \d people
                      Table "public.people"
 Column       |            Type             | Modifiers
--------------+-----------------------------+-----------------------------------
id            | integer                     | not null default nextval
                                              ↪('people_id_seq'::regclass)
firstname     | character varying(255)      |
lastname      | character varying(255)      |
email_address | character varying(255)      |
created_at    | timestamp without time zone |
updated_at    | timestamp without time zone |
  Indexes:
      "people_pkey" PRIMARY KEY, btree (id)

And, if I run the application, I have access (via the RESTful interface) to the various CRUD functions associated with a Person object: Create, Retrieve, Update and Delete. I simply type:

./script/server

And, I point my Web browser to port 3000 on my server: https://atf.lerner.co.il:3000/people/.

So far, so good. With a few commands on the UNIX command line, I've managed to create a simple database of people. I'll use the scaffolded application to add several people, clicking on the New person link and then adding the first name, last name and e-mail address of each of my friends.

Now, if I look at the Rails development log, I easily can see that each act I perform from within the scaffolded environment results in an SQL query being built and sent to the PostgreSQL server. I often do this by typing:

tail -f log/development.log

For example, if I click on the show link for the first person I created, I see the following in the development log:

Person Load (0.001571)   SELECT * FROM "people" 
 ↪WHERE ("people"."id" = 1)

In other words, Rails knows that I want to load a Person object. It also knows that I retrieve such objects from the database. This is where ActiveRecord steps in, turning the Ruby:

Person.find(1)

into:

SELECT * FROM people WHERE people.id = 1

As you can imagine, it's not a big deal to do this sort of simple query, particularly if you have a limited number of fields, a small data set and a well-indexed primary key. But, as the number of fields increases, you might find yourself wanting to reduce the load on the database. Moreover, modern dynamic Web sites might need to retrieve 5–10 different objects from the database, only some of which are particular to the current user. If you get even 1,000 visitors to your site each day, and if there are three objects on each page that could be cached, that's 3,000 database queries you are foisting upon your database unnecessarily.

Memcached is an obvious solution to this problem. With previous versions of Rails, you needed to use a plugin or Ruby gem to do that. Now, however, you can do it via a configuration file. The gem that you previously needed to install, memcached-client, now is included along with the Rails gem. Every Rails application contains a main configuration file (config/environment.rb), which allows you to configure your application using Ruby code. This is where you should put configurations that are common to all three standard Rails environments: development, testing and production. For configurations that are specific to one environment, you instead would modify config/environments/ENV.rb, where ENV should be replaced with the environment of your choice.

Because we're still developing our example application, and using the development environment, we can confine our changes to config/environments/development.rb. Open that file in the editor of your choice, and add the following line:

config.cache_store = :mem_cache_store

This tells Rails that you want to use memcached and that the server is on the local computer (localhost), using the default port 11211. However, you can override these, and even put things into a separate namespace, if you're worried about stepping on someone else's objects.

When you're working in development mode, you also need to tell the server to use caching, a parameter that is set (and false) by default:

config.action_controller.perform_caching = true
Caching Objects

Now, let's go in and modify the GET action within the controller that was built for us by the scaffolding system. (The built-in caching is designed to be used from controllers and views, rather than from models.) That'll be:

app/controllers/people_controller.rb

On line 16 of that file, you'll see:

@person = Person.find(params[:id])

This is obviously where we invoke Person.find, as shown in the logs earlier. Now, modify that line so it looks like this:

@person = cache(['Person', params[:id]]) do
  Person.find(params[:id])
end

We still are assigning a value to @person. And, our call to Person.find is still in there. However, Person.find now is buried within a block. And, that block is attached to the call to a cache function, which is given an array argument.

What's happening here is actually fairly straightforward. The cache function looks in the cache for its argument, which is turned into a key. If a value for this key exists in the cache, the value is returned. If not, the block is executed, with the result of executing the block stored in the cache and returned to the caller.

With this code in place, let's retrieve person #1 again and look at the logfile. The first time we do this, the value is indeed retrieved from the database, as before:

Person Load (0.002212)   SELECT * FROM "people" 
 ↪WHERE ("people"."id" = 1)

That line is followed by this new entry:

Cache write (will save 0.01852): controller/Person/1

Sure enough, our memcached server reports:


<7 new client connection
<7 get controller/Person/1
>7 END
<7 set controller/Person/1 0 0 224
>7 STORED

In other words, our Rails controller did exactly as we asked. It contacted memcached and asked for the value of controller/Person/1. (We can see from this that controller is prefaced to the key name that we create, and that elements of the cache key array are separated by slashes.) When we get a null value back for that, Rails retrieves the value from the database and then issues a set command in memcached, storing our value.

As you might expect, we then can refresh our browser window and see that we are saving a great deal of database time by retrieving information about this person from the cache. So, we refresh the browser window, and...boom! Our application blows up on us, with an error message that looks like this:

undefined class/module Person

Now, the first time this happened to me, I wasn't sure what hit me. What do you mean, I asked my computer, you don't know how to find a Person class? A little head-scratching and Google searching later, and I found my answer. I needed to tell the controller to load the object definition by putting the following at the top of my controller:

require_dependency 'person'

This is apparently necessary only in development mode, and it has something to do with the way Rails reloads classes while you are developing your application. With that line in place, you can reload the page. In the logfile, you'll see no trace of a successful call to the database. Instead, you'll find the following:

Cache hit: controller/Person/1 ({})

Meanwhile, our memcached log will look like this:


<7 get controller/Person/1
>7 sending key controller/Person/1
>7 END

This is a good time to mention the only other gotcha I can think of: whitespace is forbidden in memcached keys. This can be a problem if you use a value from the database (for example, a parameter name) as the key when storing things in memcached. The simple solution is to remove the whitespace, either by running String#gsub on each of the keys or by monkey-patching String (as I did for an application I wrote) to add a to_key method. I could then pass "parameter name".to_mkey as an argument to cache().

Expiration

Now, it's all well and good that we have cached information about each person in memcached. Our database certainly will thank us for that. But, what happens when data about the person changes? The way we've written this application, we're out of luck. Updated information will make its way to the database, but the cache will continue to give us the data it stored long ago. Even if this weren't the case, we still would want to empty the cache on occasion, allowing data to expire if we haven't used it in a while.

To solve the second problem, we can invoke our cache function in a slightly different way, indicating how long we want it to stick around in a second (and optional) argument:

@person = cache(['Person', params[:id]],
                :expires_in => 30.minutes) do
  Person.find(params[:id])
end

The :expires_in parameter accepts a number of seconds, which we either can enter by hand or via one of the super-convenient Rails extensions to the Fixnum class.

The second problem, one of expiring data manually, requires that we use a less beautiful, but also convenient, way of accessing the cache storage system:

Rails.cache.delete(['controller', 'Person', 
 ↪params[:id]].join('/'))

Basically, we access the cache system using the Rails.cache object and invoke the delete method on it. That method accepts a memcached key. As you might remember, we previously saw that the elements of our key array (as used by the helpful cache method) were joined by slashes and prefixed with controller. Thus, the above works, even though it's not quite as nice as I might have liked. We can see that this is the case in the memcached logs:


<7 delete controller/Person/1 0
>7 DELETED

And, sure enough, we then find that our next invocation of show for person 1 retrieves the information from the database and caches it in memcached.

Conclusion

Caching has long been an excellent way to improve performance in the computer industry, from the hardware level all the way up to operating systems and applications. Rails programmers have incorporated memcached into their applications over the last few years, but I believe that its complete integration in version 2.1 will make it even easier, and more widespread, to find memcached-enabled Rails applications. As you can see, adding just a few lines of configuration and application code can speed up an application by many times, without having to sacrifice accuracy.

Resources

If you are looking for information on memcached, you should begin at www.danga.com/memcached, the home page for the open-source project and the source of a great deal of good documentation, code and general information.

For information on Ruby on Rails, start by going to www.rubyonrails.com, which has pointers to documentation, mailing lists and (of course) software you can download.

For information on the integration of memcached into Rails, try www.thewebfellas.com/blog/2008/6/9/rails-2-1-now-with-better-integrated-caching.

There are some Rails plugins that might make it even easier to cache objects. For example, take a look at www.inwebwetrust.net/post/2008/09/08/query-memcached and lucaguidi.com/pages/cached_models, both of which have gained some attention since Rails 2.1 caching was released.

Finally, a tutorial on the use of memcached with Rails is included in a chapter of Advanced Rails Recipes, published by the Pragmatic Programmers. I have greatly enjoyed this book and recommend it to anyone planning to use Rails for more than a simple application. The chapter on memcached is one that has been released as a free sample, and it is available in PDF as media.pragprog.com/titles/fr_arr/cache_data_easily.pdf.

Reuven M. Lerner, a longtime Web/database developer and consultant, is a PhD candidate in learning sciences at Northwestern University, studying on-line learning communities. He recently returned (with his wife and three children) to their home in Modi'in, Israel, after four years in the Chicago area.

Load Disqus comments