What Does "Fast" Mean?

Good news! One of my clients is launching a new marketing campaign, which we expect will make the business even more successful than before.

Bad news! This means our Web application, which has existed for some time on a fairly simple infrastructure, and which has handled a steadily growing number of users, now (we hope) will need to deal with a massive spike in users.

The big question is this: can our servers handle the load we expect? Indeed, what load can we expect? And, what happens if we need to crank up even more capacity?

So in this article, I walk through some of the basic points having to do with Web scalability, describing a few of the key things to keep in mind. Next month, I'll take a deeper dive into these ideas and discuss some of the techniques you can use to improve the speed—or apparent speed—of your applications.

Background

Many of my clients are companies that need a Web application, but aren't familiar with the ways in which the Web works. A common question for them to ask me is "We have many thousands of users every month. Can the server handle that many people?" When I explain that users consume server resources only when they actively are making an HTTP request, their understanding begins to improve. A company with 10,000 visitors a month doesn't need to worry about 10,000 simultaneous visitors; they likely will have some periods of time with a few dozen and other periods of time with absolutely none. Thus, scaling up their infrastructure to handle 10,000 simultaneous users would be foolish.
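To make that arithmetic concrete, here is a rough back-of-the-envelope sketch. The 10,000 visitors a month comes from the example above; the number of requests per visit and the time per request are assumed figures chosen only for illustration, and the function name is hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds in a 30-day month


def average_concurrent_requests(visitors_per_month, requests_per_visit, seconds_per_request):
    """Average number of requests in flight at any instant.

    Uses Little's law: items in the system = arrival rate * time spent in the system.
    """
    requests_per_second = visitors_per_month * requests_per_visit / SECONDS_PER_MONTH
    return requests_per_second * seconds_per_request


# Assumed figures: each visitor triggers 20 HTTP requests,
# and each request takes half a second to serve.
estimate = average_concurrent_requests(10_000, 20, 0.5)
print(f"Average requests in flight: {estimate:.3f}")
```

Under these assumptions, the average works out to far less than one request in flight at a time. Real traffic is bursty, so peaks will be well above the average, but nowhere near 10,000.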

At the same time, there are occasions, such as after launching an advertising campaign or being mentioned on a TV show, when you indeed will have a huge spike in traffic. Companies that advertise during the Super Bowl not only expect to get millions of viewers, they also expect to have many of those people visit their Web sites after (or during) watching the ads. This means normal assumptions for scaling no longer are applicable.

This is one of the reasons why Amazon's EC2 has become so popular. If you can treat a server as a commodity, paying for it by the hour and spinning servers up and down as necessary, you can solve this scaling problem. As traffic rises, you add more servers. As it falls, you remove them.

But of course, life is more complicated than that. First and foremost, every system has bottlenecks that can't just be wished away by auto-scaling. For example, if it turns out that your database can't handle a large load and you have only a single database server, auto-scaling your Web servers may exacerbate the problem, rather than solve it.
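One way to see how adding Web servers can hurt a single database is to count connections. The sketch below is only an illustration: it assumes each Web server holds a fixed-size pool of database connections, and the pool size (20) and database connection limit (100) are made-up numbers, as are the function names.

```python
def total_db_connections(web_servers, pool_size_per_server):
    """Connections the database must hold if every Web server fills its pool."""
    return web_servers * pool_size_per_server


def max_web_servers(db_max_connections, pool_size_per_server):
    """Largest number of Web servers the database can accept at full pool usage."""
    return db_max_connections // pool_size_per_server


# Assumed figures: 20 pooled connections per Web server,
# and a single database that accepts at most 100 connections.
POOL_SIZE = 20
DB_LIMIT = 100

for servers in (2, 5, 10):
    needed = total_db_connections(servers, POOL_SIZE)
    status = "ok" if needed <= DB_LIMIT else "over the limit"
    print(f"{servers} Web servers need {needed} connections: {status}")
```

With these numbers, the database is saturated at five Web servers, so auto-scaling to ten would only add clients that cannot connect.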

Second, although it's nice to imagine infinite budgets for auto-scaling servers, it's probably a bit more realistic to think not just about increasing the number of servers, but also about making each individual server more efficient. If there are ways to improve the efficiency of your code, that's often a good place to work on scaling, before throwing (virtual) hardware at the problem.

Third, if you're in charge of a site's technical infrastructure, your answer to the "how many people can we serve simultaneously" question probably should not be "it's infinite, assuming an infinite budget". The technical staff might like that answer, but the company's CFO might have a bit of an issue with giving the IT department a blank check.

What Is Speed?

Many non-technical people will say "I want to have a fast Web site." From a technical perspective, however, that's not a very useful statement. It doesn't differentiate between the different types of speed, it doesn't consider the multiple layers involved in a modern Web application, and it doesn't take into account the crunch that comes when many people arrive at once because of a sudden surge of interest in the site.

So, let's consider the many different parts of a Web application and how each of them might affect the speed.

Bandwidth

It's true that networks can have different speeds. In general, people describe this in terms of bandwidth, which doesn't really mean that the electrons (or photons) are moving through the wires (or fibers or air) any faster, but rather that more of them are pushing through, in parallel, at a time. You can think of bandwidth as a straw through which you're trying to drink your favorite cold beverage. Two straws will allow you to drink twice as much in the same amount of time and, thus, drink more quickly, even if the speed with which liquid flows through each straw is the same.

One of the potential problems with using shared servers, and with using a virtual machine on shared hardware, is that the network capacity is being divided among the many users. Think of what would happen if several people were to share your drinking straw from the previous example. Sure, the overall straw might be the same size, but each individual gets less than the full bandwidth. You don't need a virtual machine to see such effects either—just try to run several network-intensive applications on the same computer, and you'll quickly find that they are competing for network resources.
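The shared-straw effect can be put into numbers with a simple sketch. It assumes the link is split evenly among everyone using it, which real networks only approximate, and the page size and link speed are illustrative figures rather than measurements.

```python
def transfer_seconds(size_megabytes, link_megabits_per_second, users_sharing=1):
    """Time to move a payload when the link is divided evenly among its users."""
    per_user_mbps = link_megabits_per_second / users_sharing
    size_megabits = size_megabytes * 8  # 8 bits per byte
    return size_megabits / per_user_mbps


# Assumed figures: a 2 MB page sent over a 100 Mbit/s link.
alone = transfer_seconds(2, 100)
shared = transfer_seconds(2, 100, users_sharing=10)
print(f"Link to yourself:   {alone:.2f} seconds")
print(f"Shared by 10 users: {shared:.2f} seconds")
```

The link itself is no slower in the second case; each user simply gets a tenth of it, so the same transfer takes ten times as long.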

The upshot here is that you want to maximize the bandwidth available to your server. This means having your own server—even if it's a VM, you probably don't want it sharing resources with other VMs—and putting different services on different computers.

Latency

This term also has to do with speed, but in a different way from pure bandwidth. Let's say you want to transfer data between two huge networks, so you put a huge, high-speed wire between them. You would say that such a connection has both high bandwidth and low latency, since signals travel between the two networks with very little delay.

Let's now replace that high-speed wire with a satellite link. Suddenly, because it takes time to send the data from one network to another, you have increased your latency while keeping your bandwidth identical. The amount of data the link can carry hasn't changed, but loading each page now will take significantly longer.

One of the major considerations of a Web application is latency: that of the networks on which the server is running, but also that of the application itself. If it takes several seconds for a server to reply, you can say that the application has high latency. This not only frustrates users (who have to wait for a response from the server), but it also means that a larger number of processes are running on the server at the same time, consuming resources. Thus, reducing latency in a Web application is in the best interest of users and of the company.
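A rough model shows both effects: how latency adds to load time even when bandwidth is unchanged, and how slow responses pile up work on the server. The round-trip times, the number of round trips, and the request rate below are assumed values for illustration, and both function names are hypothetical.

```python
def fetch_seconds(size_megabytes, bandwidth_mbps, round_trip_seconds, round_trips=1):
    """Time spent waiting on round trips, plus time spent moving the bytes."""
    waiting = round_trips * round_trip_seconds
    transferring = size_megabytes * 8 / bandwidth_mbps
    return waiting + transferring


def requests_in_flight(requests_per_second, seconds_per_response):
    """Average number of requests being handled at once (Little's law)."""
    return requests_per_second * seconds_per_response


# Assumed figures: a 2 MB page, a 100 Mbit/s link, and 4 round trips per page.
wire = fetch_seconds(2, 100, round_trip_seconds=0.001, round_trips=4)
satellite = fetch_seconds(2, 100, round_trip_seconds=0.6, round_trips=4)
print(f"High-speed wire: {wire:.3f} seconds")
print(f"Satellite link:  {satellite:.3f} seconds")

# Assumed figure: 50 requests per second arriving at the application.
print(f"0.2 s responses: {requests_in_flight(50, 0.2):.0f} requests in flight")
print(f"3.0 s responses: {requests_in_flight(50, 3.0):.0f} requests in flight")
```

In this model, the bandwidth term is identical for both links; only the waiting term grows. The second calculation is why a slow application needs many more server processes to handle the same traffic.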

Client-Side Wait Time

Many people, even those who have been using the Web for years, don't understand that a single Web page is usually the result of dozens, and sometimes even hundreds, of different files, often from different servers. Of course, there's the HTTP response from the Web server, but that response almost always refers to JavaScript, CSS and static files that might reside in a variety of places. JavaScript is a particularly well known culprit in this arena, in that sites increasingly are downloading JavaScript from such services as Google Analytics, Optimizely, Facebook and the like.

The problem is that in order to display the complete Web page, your browser needs to retrieve all of those individual pieces. Thus, one delayed image or one delayed CSS file can cause the wait on the user's side to be frustratingly long. Note that this has only partly to do with the bandwidth and latency on the server. If your Web app responds lightning-fast, but tells the user's browser to download a JavaScript file from a very slow server, then from the user's perspective, things might take a very long time.
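The point about one slow file can be sketched in a few lines. The sketch assumes that every resource downloads in parallel, so the page is complete only when the slowest one arrives. The file names and timings are invented for illustration.

```python
def page_ready_seconds(resource_seconds):
    """With parallel downloads, the page is complete when the slowest resource finishes."""
    return max(resource_seconds.values())


def slowest_resource(resource_seconds):
    """Name of the resource that determines the total wait."""
    return max(resource_seconds, key=resource_seconds.get)


# Hypothetical download times, in seconds, for the pieces of one page.
resources = {
    "index.html": 0.05,
    "site.css": 0.08,
    "logo.png": 0.12,
    "third-party-analytics.js": 3.5,
}

print(f"Page complete after {page_ready_seconds(resources)} seconds")
print(f"Held up by: {slowest_resource(resources)}")
```

Here the server's own files all arrive within about a tenth of a second, yet the user waits several seconds because of a single file hosted elsewhere.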

This means you need to think about performance in new and different ways from what you might have before. It's not enough to push all of the files to the user's browser or to indicate from which sites the user's browser can retrieve them. You also need to think about where in the page they are loaded. A <script> tag at the top of the page can have very different performance characteristics from one at the bottom of the page, since browsers interpret and render tags from top to bottom, and an ordinary script (one without the async or defer attribute) pauses that process until it has been downloaded and run.
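The effect of script placement can be illustrated with a deliberately simplified model. It assumes a script that blocks the page until it has been fetched and run, and it ignores optimizations that real browsers apply, such as fetching resources ahead of time. The timings and the function name are hypothetical.

```python
def seconds_until_content_visible(items):
    """Walk the page top to bottom and return the time at which content first appears.

    `items` is an ordered list of (kind, seconds) pairs, where kind is
    "script" (a blocking fetch and run) or "content" (something the user sees).
    """
    elapsed = 0.0
    for kind, seconds in items:
        elapsed += seconds
        if kind == "content":
            return elapsed
    return elapsed  # the page had no visible content at all


# Assumed timings: a 2-second script and content that renders in 0.1 seconds.
script_at_top = [("script", 2.0), ("content", 0.1)]
script_at_bottom = [("content", 0.1), ("script", 2.0)]

print(f"Script at top:    {seconds_until_content_visible(script_at_top):.1f} seconds")
print(f"Script at bottom: {seconds_until_content_visible(script_at_bottom):.1f} seconds")
```

The total work is the same in both orderings. What changes is how long the user stares at a blank page before anything appears.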

Client-Side Performance

As if all of that weren't enough, it is increasingly becoming an era of rich client-side Web applications. Regardless of whether you're using something as simple as Backbone or as complex as Ember.js, you are writing software that will be executing inside the user's browser. For its first two decades, the Web was highly biased in favor of the server, which also made it easier to scale, profile, debug and improve programs. But, now that programs are running inside different browsers, on users' computers, there is much more to think and worry about.

A very small JavaScript program might allocate lots of memory and/or take a long time to run. Or, a very large JavaScript program might be straightforward, affecting the in-browser performance very little. I've increasingly found my browser to be consuming a huge proportion of my computer's CPU—not because I'm doing so much, but rather because lots of JavaScript is executing there.

What Does This All Mean?

Web development used to seem so simple. You get a domain, set up a server, slap together some software, and you're in business. And indeed, you often still can do that today. But, if you're expecting to get lots of visitors at once, you need to understand the different types of "fast" that you'll need to consider, measure and then optimize.

Next time, I'll dig deeper into each of these types of speed, looking at specific parts of your software that can affect each of them. I'll give some specific suggestions for how you can identify such issues, as well as solve them, particularly if you have some low-hanging fruit.

Reuven M. Lerner, a longtime Web developer, offers training and consulting services in Python, Git, PostgreSQL and data science. He has written two programming ebooks (Practice Makes Python and Practice Makes Regexp) and publishes a free weekly newsletter for programmers, at https://lerner.co.il/newsletter. Reuven tweets at @reuvenmlerner and lives in Modi’in, Israel, with his wife and three children.
