The Sysadmin's Toolbox: sar

After working as a system administrator for a number of years, I find it easy to take the tools I've used for a long time for granted and to assume everyone has heard of them. Of course, new sysadmins get into the field every day, and even seasoned sysadmins don't all use the same tools. With that in mind, I decided to write a few columns where I highlight some common-but-easy-to-overlook tools that make life as a sysadmin (and really, any Linux user) easier. I start the series with a classic troubleshooting tool: sar.

There's an old saying: "When the cat's away the mice will play." The same is true for servers. It's as if servers wait until you aren't logged in (and usually in the middle of REM sleep) before they have problems. Logs can go a long way to help you isolate problems that happened in the past on a machine, but if the problem is due to high load, logs often don't tell the full story. In my March 2010 column "Linux Troubleshooting, Part I: High Load" (https://www.linuxjournal.com/article/10688), I discussed how to troubleshoot a system with high load using tools such as uptime and top. Those tools are great as long as the system still has high load when you are logged in, but if the system had high load while you were at lunch or asleep, you need some way to pull the same statistics top gives you, only from the past. That is where sar comes in.

Enable sar Logging

sar is a classic Linux tool that is part of the sysstat package and should be available in just about any major distribution through your regular package manager. Once the package is installed, data collection is enabled automatically on a Red Hat-based system, but on a Debian-based system (like Ubuntu), you might have to edit /etc/default/sysstat and make sure that ENABLED is set to true. On a Red Hat-based system, sar will log seven days of statistics by default. If you want to log more than that, you can edit /etc/sysconfig/sysstat and change the HISTORY option.
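
As a minimal sketch of that edit, the function below rewrites a KEY=value line in a config file with sed. The name set_sysstat_option is hypothetical, the paths are the ones mentioned above, and the quoting of the value (ENABLED="true") is an assumption about how the Debian file is written, so check your own file first and run it as root:

```shell
# Hypothetical helper: set KEY to VALUE in a sysstat config file.
# Usage: set_sysstat_option FILE KEY VALUE
set_sysstat_option() {
    file="$1"; key="$2"; value="$3"
    # Keep a .bak copy, then rewrite the whole KEY=... line from that copy.
    cp "$file" "$file.bak" &&
    sed "s|^$key=.*|$key=$value|" "$file.bak" > "$file"
}

# Assumed invocations (run as root):
#   set_sysstat_option /etc/default/sysstat ENABLED '"true"'   # Debian/Ubuntu
#   set_sysstat_option /etc/sysconfig/sysstat HISTORY 28       # Red Hat
```

The function only touches a line that already starts with KEY=, so a file that lacks the option is left unchanged.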

Once sysstat is configured and enabled, it will collect statistics about your system every ten minutes and store them in a logfile under either /var/log/sysstat or /var/log/sa via a cron job in /etc/cron.d/sysstat. There is also a daily cron job that will run right before midnight and rotate out the day's statistics. By default, the logfiles will be date-stamped with the current day of the month, so the logs will rotate automatically and overwrite the log from a month ago.
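
Because the files are named after the day of the month, the path for a given date can be computed rather than guessed. Here is a small sketch, assuming GNU date (for the -d option) and the default saDD file naming; sa_file_for is a hypothetical name:

```shell
# Hypothetical helper: print the daily data file for a date.
# Usage: sa_file_for DIR DATE   (DATE is anything GNU date -d accepts)
sa_file_for() {
    dir="$1"; when="$2"
    # %d is the zero-padded day of the month, matching the saDD file names.
    printf '%s/sa%s\n' "$dir" "$(date -d "$when" +%d)"
}

# Example: sa_file_for /var/log/sysstat "3 days ago"
```

For instance, sa_file_for /var/log/sa 2012-02-01 prints /var/log/sa/sa01.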

CPU Statistics

After your system has had some time to collect statistics, you can use the sar tool to retrieve them. When run with no other arguments, sar displays the current day's CPU statistics:


$ sar
. . .
07:05:01 PM  CPU  %user  %nice  %system  %iowait %steal  %idle
. . .
08:45:01 PM  all   4.62   0.00     1.82     0.44   0.00   93.12
08:55:01 PM  all   3.80   0.00     1.74     0.47   0.00   93.99
09:05:01 PM  all   5.85   0.00     2.01     0.66   0.00   91.48
09:15:01 PM  all   3.64   0.00     1.75     0.35   0.00   94.26
Average:     all   7.82   0.00     1.82     1.14   0.00   89.21

If you are familiar with the command-line tool top, the above CPU statistics should look familiar, as they are the same as you would get in real time from top. You can use these statistics just like you would with top, only in this case, you are able to see the state of the system back in time, along with an overall average at the bottom of the statistics, so you can get a sense of what is normal. Because I devoted an entire previous column to using these statistics to troubleshoot high load, I won't rehash all of that here, but essentially, sar provides you with all of the same statistics, just at ten-minute intervals in the past.
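
When a day's worth of samples is too much to eyeball, a short awk filter can pick out the busy intervals. This is only a sketch: low_idle is a hypothetical name, and it assumes %idle is the last column (as in the output above) and that numbers use a decimal point:

```shell
# Hypothetical filter: read sar CPU output on stdin and print the samples
# whose %idle (the last column) is below the given threshold.
# Usage: sar | low_idle 90
low_idle() {
    awk -v limit="$1" '
        $1 == "Average:" { next }    # skip the summary row
        # Only rows whose last field is a plain number are samples.
        $NF ~ /^[0-9]+(\.[0-9]+)?$/ && $NF + 0 < limit + 0 { print }
    '
}
```

Header rows and blank lines fall through untouched because their last field is not a number.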

RAM Statistics

sar also supports a large number of different options you can use to pull out other statistics. For instance, with the -r option, you can see RAM statistics:


$ sar -r
. . .
07:05:01 PM kbmemfree kbmemused %memused kbbuffers  kbcached  kbcommit  %commit
. . .
08:45:01 PM    881280   2652840    75.06    355284   1028636   8336664   183.87
08:55:01 PM    881412   2652708    75.06    355872   1029024   8337908   183.89
09:05:01 PM    879164   2654956    75.12    356480   1029428   8337040   183.87
09:15:01 PM    886724   2647396    74.91    356960   1029592   8332344   183.77
Average:       851787   2682333    75.90    338612   1081838   8341742   183.98

Just like with the CPU statistics, here I can see RAM statistics from the past similar to what I could find in top.
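
In the output above, kbmemfree plus kbmemused adds up to the machine's total RAM, so kbmemused here appears to count buffers and cache as used memory. The sketch below subtracts them to estimate what applications are actually holding. It assumes the column names shown above (newer sysstat releases may report different columns), and mem_without_cache is a hypothetical name:

```shell
# Hypothetical filter: read "sar -r" output on stdin and print, for each
# sample, the time and kbmemused minus kbbuffers and kbcached (kilobytes).
# Columns are located by header name rather than by fixed position.
# Usage: sar -r | mem_without_cache
mem_without_cache() {
    awk '
        /kbmemused/ {                    # header row: remember positions
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $1 == "Average:" { next }        # its columns may be shifted; skip
        ("kbmemused" in col) && $(col["kbmemused"]) ~ /^[0-9]+$/ {
            used = $(col["kbmemused"]) - $(col["kbbuffers"]) - $(col["kbcached"])
            print $1, used
        }
    '
}
```

The printed time is the first field only, so on a 12-hour locale the AM/PM marker is dropped.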

Disk Statistics

Back in my load troubleshooting column, I referenced sysstat as the source for a great disk I/O troubleshooting tool called iostat. Although iostat provides real-time disk I/O statistics, you also can pass sar the -b option to get disk I/O data from the past:


$ sar -b
. . .
07:05:01 PM    tps    rtps    wtps   bread/s   bwrtn/s
. . .
08:45:01 PM   2.03    0.33    1.70      9.90     31.30
08:55:01 PM   1.93    0.03    1.90      1.04     31.95
09:05:01 PM   2.71    0.02    2.69      0.69     48.67
09:15:01 PM   1.52    0.02    1.50      0.20     27.08
Average:      5.92    3.42    2.50     77.41     49.97

I figure these columns need a little explanation:

  • tps: transactions per second.

  • rtps: read transactions per second.

  • wtps: write transactions per second.

  • bread/s: blocks read per second.

  • bwrtn/s: blocks written per second.
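
Block counts are easier to reason about as throughput. The sketch below converts a blocks-per-second figure to KiB per second, assuming 512-byte blocks (the size the sar man page gives for these columns, but verify it for your version); blocks_to_kib is a hypothetical name:

```shell
# Hypothetical helper: convert blocks per second to KiB per second,
# assuming each block is 512 bytes.
# Usage: blocks_to_kib 31.30
blocks_to_kib() {
    awk -v blocks="$1" 'BEGIN { printf "%.2f\n", blocks * 512 / 1024 }'
}
```

Under that assumption, the 31.30 bwrtn/s in the 08:45:01 PM sample works out to 15.65 KiB written per second.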

sar can return many other statistics beyond what I've mentioned. If you want to see everything it has to offer, simply pass the -A option, which returns a complete dump of all the statistics it has for the day (or just browse its man page).

Turn Back Time

So by default, sar returns statistics for the current day, but often you'll want to get information a few days in the past. This is especially useful if you want to see whether today's numbers are normal by comparing them to days in the past, or if you are troubleshooting a server that misbehaved over the weekend. For instance, say you noticed a problem on a server today between 5PM and 5:30PM. First, use the -s and -e options to tell sar to display data only between the start (-s) and end (-e) times you specify:


$ sar -s 17:00:00 -e 17:30:00
Linux 2.6.32-29-server (www.example.net)  02/06/2012   _x86_64_  (2 CPU)

05:05:01 PM  CPU  %user  %nice %system %iowait  %steal  %idle
05:15:01 PM  all   4.39   0.00    1.83    0.39    0.00   93.39
05:25:01 PM  all   5.76   0.00    2.23    0.41    0.00   91.60
Average:     all   5.08   0.00    2.03    0.40    0.00   92.50

To compare that data with the same time period from a different day, just use the -f option and point sar to the logfile under /var/log/sysstat or /var/log/sa that corresponds to that day. For instance, to pull statistics from the first of the month:


$ sar -s 17:00:00 -e 17:30:00 -f /var/log/sysstat/sa01 
Linux 2.6.32-29-server (www.example.net)  02/01/2012   _x86_64_  (2 CPU)

05:05:01 PM  CPU  %user  %nice  %system  %iowait %steal  %idle
05:15:01 PM  all   9.85   0.00     3.95     0.56   0.00   85.64
05:25:01 PM  all   5.32   0.00     1.81     0.44   0.00   92.43
Average:     all   7.59   0.00     2.88     0.50   0.00   89.04
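
To line up the same window across several days, the comparison can be wrapped in a loop. This is a sketch only: window_averages, SA_DIR and SAR_BIN are hypothetical names, and the directory defaults to /var/log/sysstat, which you would change to /var/log/sa on a Red Hat-based system:

```shell
# Assumed defaults; override SA_DIR for Red Hat-style systems (/var/log/sa).
SA_DIR="${SA_DIR:-/var/log/sysstat}"
SAR_BIN="${SAR_BIN:-sar}"

# Hypothetical helper: print the "Average:" row of a time window for each
# listed day of the month.
# Usage: window_averages START END DAY...
window_averages() {
    start="$1"; end="$2"; shift 2
    for day in "$@"; do
        printf 'sa%s: ' "$day"
        # Errors (such as a missing file) are hidden and reported as "no data".
        "$SAR_BIN" -s "$start" -e "$end" -f "$SA_DIR/sa$day" 2>/dev/null |
            grep '^Average:' || echo "no data"
    done
}

# Example: window_averages 17:00:00 17:30:00 01 02 03
```

Each output line is prefixed with the file name, so a day that stands out from the others is easy to spot.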

You also can add all of the normal sar options when pulling from past logfiles, so you could run the same command and add the -r argument to get RAM statistics:


$ sar -s 17:00:00 -e 17:30:00 -f /var/log/sysstat/sa01 -r
Linux 2.6.32-29-server (www.example.net)  02/01/2012   _x86_64_  (2 CPU)

05:05:01 PM kbmemfree kbmemused %memused kbbuffers  kbcached  kbcommit  %commit
05:15:01 PM    766452   2767668    78.31    361964   1117696   8343936   184.03
05:25:01 PM    813744   2720376    76.97    362524   1118808   8329568   183.71
Average:       790098   2744022    77.64    362244   1118252   8336752   183.87

As you can see, sar is a relatively simple but very useful troubleshooting tool. Although plenty of other programs exist that can pull trending data from your servers and graph them (and I use them myself), sar is great in that it doesn't require a network connection, so if your server gets so heavily loaded it doesn't respond over the network anymore, there's still a chance you could get valuable troubleshooting data with sar.

Kyle Rankin is a Tech Editor and columnist at Linux Journal and the Chief Security Officer at Purism. He is the author of Linux Hardening in Hostile Networks, DevOps Troubleshooting, The Official Ubuntu Server Book, Knoppix Hacks, Knoppix Pocket Reference, Linux Multimedia Hacks and Ubuntu Hacks, and also a contributor to a number of other O'Reilly books. Rankin speaks frequently on security and open-source software including at BsidesLV, O'Reilly Security Conference, OSCON, SCALE, CactusCon, Linux World Expo and Penguicon. You can follow him at @kylerankin.
