Beyond Your First Shell Script

by Brian Rice

So here it is—your first shell script:

lpr weekly.report
Mail boss < weekly.report
cp weekly.report /floppy/report.txt
rm weekly.report

You found yourself repeating the same few commands over and over: print out your weekly report, mail a copy to the boss, copy the report onto a floppy disk, and delete the original. So it was a big time saver when someone showed you that you can place those commands into a text file (“dealwithit”, for instance), mark the file as executable with chmod +x dealwithit, and then run it just by typing its filename.
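
In case it's been a while, here is the whole ritual (the ./ prefix just spells out that the script lives in the current directory, for those whose PATH doesn't include “.”):

chmod +x dealwithit     # mark the script executable
./dealwithit            # print, mail, copy, delete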

But you'd like to know more. This script you've written is not very robust; if you run it in the wrong directory, you get a cascade of ugly error messages. And the script is not very flexible either—if you'd like to print, mail, back up, and delete some other file, you'll have to create another version of the script. Finally, if someone asks you what kind of shell script you've written—Bourne? Korn? C shell?—you can't say. Sound familiar? Then read on.

Last question first: what kind of shell script is this? Actually, the script above is quite generic. It uses only features common to all the shells. Lucky you. As your shell scripts get more complex, you'll need to put a directive at the beginning to tell the operating system what sort of shell script this is.

#!/bin/bash

The #! should be the first two characters of the file, and the rest should be the complete pathname of the shell program you intend this script to be run by. Astute readers will note that this line looks like a comment, and, since it begins with a # character, it is one, syntactically. It's also magic.

When the operating system tries to run a file as a program, it reads the first few bytes of the file (its “magic number”) to learn what kind of file it is. The byte pattern #! means that this is a script, and that the next several bytes, up to a newline character, make up the name of the binary that the OS should really run, feeding it this script file.

Paranoid programmers will make sure that no spaces are placed after the executable name on the #! line. You are paranoid, aren't you? Good. Also, notice that running a shell script requires that it be read first; this is why you must have both read and execute permission to run a shell script file, while you need only execute permission to run a binary file.
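
If you'd like to see this for yourself, here is a quick experiment with our dealwithit script:

chmod a-r dealwithit    # take away read permission
./dealwithit            # the shell refuses: Permission denied
chmod a+r dealwithit    # give it back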

In this article, we will focus on writing programs for the Bourne shell and its descendants. A Bourne shell script will run flawlessly not only under the Bourne shell, but also under the Korn shell, which adds a variety of features for efficiency and ease of use. The Korn shell itself has two descendants of its own: the POSIX standard shell, which is virtually identical to the Korn shell; and a big Linux favorite, the Bourne Again shell. The Bourne Again shell (“bash”) adds mostly interactive features from the descendants of the C shell, Bill Joy's attempt to introduce a shell that would use C-like control structures. What a great idea! For several good reasons, though, most shell programming has followed the Bourne shell side of the genealogical tree. But people love the C shell's interactive features, which is why they too were incorporated into the Bourne Again shell.

Let me rephrase: do not write C shell scripts. Continue to use the C shell, or its descendant tcsh, interactively if you care to; your author does. But learn and use the Bourne/Korn/bash shells for scripting.

So here is our shell script now:

#!/bin/bash
lpr weekly.report
Mail boss < weekly.report
cp weekly.report /floppy/report.txt
rm weekly.report

If we run this script in the wrong directory, or if we accidentally name our file something other than “weekly.report”, here's what happens:

lpr: weekly.report: No such file or directory
./dealwithit: weekly.report: No such file or directory
cp: weekly.report: No such file or directory
rm: weekly.report: No such file or directory

and we get a bunch of “Permission denied” messages if we run the script when the permissions on the file are wrong. Bleah. Couldn't we do a check at the beginning of the program, so that if something is wrong we can avert all these ugly messages? Indeed we can, using (surprise!) an if.

It occurs to us that if cat weekly.report works, so will most of the things our script wants to do. The shell's if statement works just as this thought suggests: you give the if statement a command to try, and if the command runs successfully, it will run the other commands for you too. You also can specify some commands to run if the first command—called the “control command”—fails. Let's give it a try:

#!/bin/bash
if
    cat weekly.report
then
    lpr weekly.report
    Mail boss < weekly.report
    cp weekly.report /floppy/report.txt
    rm weekly.report
else
    echo I see no weekly.report file here.
fi

The indentation is not mandatory, but does make your shell scripts easier to read. You can put the control command on the same line as the if keyword itself.
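
For example, both of these layouts are equivalent to the one above (in the second, a semicolon lets then share the line too):

if cat weekly.report
then
    echo The file is here.
fi

if cat weekly.report; then
    echo The file is here.
fi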

This new version works great when there's an error. We get only one “No such file or directory” message, an improvement over four, and then our helpful personalized error message appears. But the script isn't so hot when it works: now we get the contents of weekly.report dumped to the screen as a preliminary. This is, after all, what cat does. Couldn't it just shut up?

You may know something about redirecting input and output in the world of Unix: the > character sends a command's output to a file, and the < character arranges for a command to get its input from a file, as in our Mail command. So if only we could send the cat command's output to the trash can instead of to a file... Wait! Maybe there's a trash can file somewhere. There is: /dev/null. Any output sent to /dev/null dribbles out the back of the computer. So let's change our cat command to:

cat weekly.report >/dev/null

Because you are paranoid, you may be wondering whether sending output to the trash can will affect whether this command succeeds or fails. Since /dev/null always exists and is writable by anyone, it will not fail.

Now our script is much quieter. But when cat fails, we still see the

cat: weekly.report: No such file or directory

error message. Why didn't this go into the trash can too? Because error messages flow separately from output, even though they usually share a common destination: the screen. We redirected standard output, but said nothing about errors. To redirect the errors, we can:

cat weekly.report >/dev/null 2>/dev/null

Just as > means “Send output here,” 2> means “Send errors here.” In fact, > is really just a synonym for 1>. Another, terser way to say the above command is this:

cat weekly.report >/dev/null 2>&1

The incantation 2>&1 means “Send errors (output stream number 2) to the same place ordinary output (output stream number 1) is going to.” By the way, this 2> jazz only works in the Bourne shell and its descendants. The C shell makes it annoying to separate errors from output, which is one of the reasons people avoid programming in it.
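
One more caution for the paranoid: the shell processes redirections from left to right, so the order matters. Compare:

cat weekly.report >/dev/null 2>&1    # output and errors both discarded
cat weekly.report 2>&1 >/dev/null    # errors still reach the screen!

In the second version, 2>&1 points the errors at wherever output is going at that moment (still the screen), and only then is the output itself redirected.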

You may be saying to yourself: “This cat trick is fun, but isn't there some way I can just give a true-or-false expression? Like, either the file exists and is readable, or not?” Yes, you can. There is a command whose whole job is to succeed or fail depending on whether the expression you give is true or not: test. This is why your test programs called “test” never work, by the way. Here is our program, rewritten to use test:

#!/bin/bash
if
    test -r weekly.report
then
    lpr weekly.report
    Mail boss < weekly.report
    cp weekly.report /floppy/report.txt
    rm weekly.report
else
    echo I see no weekly.report file here.
fi

The test command's -r operator means, “Does this file exist, and can I read it?” test is quiet regardless of whether it succeeds or fails, so there's no need for anything to get sent to /dev/null.
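
test knows many operators besides -r; here are a few of the most common (the filenames here are just placeholders):

test -f notes.txt    # does notes.txt exist as a regular file?
test -d /docs        # does /docs exist as a directory?
test -w notes.txt    # does notes.txt exist, and can I write to it?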

Test also has an alternative syntax: you can use a [ character instead of the word test, so long as you have a ] at the end of the line. Be sure to put a space between any other characters and the [ and the ] characters! We can make our if look like this now:

if [ -r weekly.report ]

Hey, now that looks like a program! Even though we're using brackets, this is still the test command. There are lots of other things test can do for you; see its man page for the complete list. For example, we seem to recall that what lets you delete a file is not whether you can read it, but whether the directory it sits in gives you write permission. So we can re-write our script like this:

#!/bin/bash
if [ ! -r weekly.report ]
then
    echo I see no weekly.report file here.
    exit 1
fi
if [ ! -w . ]
then
    echo I will not be able to delete
    echo weekly.report for you, so I give up.
    exit 2
fi
# Real work omitted...

Each test now has a ! character in it, which means “not”. So the first test succeeds if the weekly.report is not readable, and the second succeeds if the current directory (“.”) is not writable. In each case, the script prints an error message and exits. Notice that there's a different number fed to exit each time. This is how Unix commands (including if itself!) tell whether other commands succeed: if they exit with any exit code other than 0, they didn't. What each non-zero number (up to 255) means, other than “Something bad happened,” is up to you. But 0 always means success.

If this seems backwards to you, give yourself a cookie. It is backwards. But there's a good design reason for it, and it's a universal Unix-command convention, so get used to it.
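
You can peek at these exit codes yourself: the shell saves the most recent command's code in the special variable $?. For instance:

cat weekly.report >/dev/null 2>&1
echo $?    # prints 0 if cat succeeded, something non-zero if not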

Notice also that our real work no longer has an if wrapped around it. Our script will only get that far if none of our error conditions are detected. So we can just assume that all those error conditions are not in fact present! Real shell scripts exploit this property ruthlessly, often beginning with screenfuls of tests before any real work is done.

Now that we've made our script more robust, let's work on making it more general. Most Unix commands can take an argument from their command lines that tells them what to do; why can't our script? Because it has “weekly.report” littered all through it, that's why. We need to replace weekly.report with something that means “the thing on the command line.” Meet $1.

#!/bin/bash
if [ ! -r $1 ]
then
    echo I see no $1 file here.
    exit 1
fi
if [ ! -w . ]
then
    echo I will not be able to delete $1 for you.
    echo So I give up.
    exit 2
fi
lpr $1
Mail boss < $1
# and so forth...
exit 0

$1 means the first argument on the command line. Yes, $2 means the second, $3 means the third, and so on. What's $0? The name of the command itself. So we can change our error messages so that they look like this:

echo $0: I see no $1 file here.

Ever noticed that Unix error messages introduce themselves? That's how.

Unfortunately, now there's a new threat to our program: what if the user forgets to put an argument on the command line? Then the right thing for $1 to have in it would be nothing at all. We might be back to our cascade of error messages, since a lot of commands, such as rm, complain at you if you put nothing at all on their command lines. In this program's case, it's even worse, since the first use of $1 is as an argument to test -r, and test will give you a syntax error if you ask it to test -r nothing at all. And what does lpr do if you put nothing at all on its command line? Try it! But be prepared; you could end up with a mess.

Fortunately, test can help. Let's put this as the very first test in our program, right after the #!/bin/bash:

if [ -z "$1" ]
then
    echo $0: usage: $0 filename
    exit 3
fi

Now if the user puts nothing on the command line, we print a usage message and quit. The -z operator means “is this an empty string?”. Notice the double quotes around the $1: they are mandatory here. If you leave them out, test will give an error message in just the situation we are trying to detect. The quotes protect the nothing-at-all stored in $1 from causing a syntax error.

This if clause appears at the very top of many, many shell scripts. Among its other benefits, it relieves us from having to wrap $1 in quote marks later in our program, since if $1 were empty we would have exited at the start. In fact, the only time quotes would still be necessary would be if $1 could contain characters with a special meaning to the shell, such as a space or a question mark. Filenames don't, usually.

What if we want our script to be able to take a variable number of arguments? Most Unix commands can, after all. One way is clear: we could just cut and paste all the stuff in our shell script, so we'd have a bunch of commands that dealt with $1, then a bunch of commands that dealt with $2, and so forth. Sound like a good idea? No? Good for you; it's a terrible idea.

First of all, there would be some fixed upper limit on the number of arguments we could handle, determined by when we got tired of cutting, pasting, and editing our script. Second, any time you have many copies of the same code, you have a quality problem waiting to happen. You'll forget to make a change, or fix a bug, all of the many places necessary. Third, we often hand wildcards, like *, to Unix commands on their command lines. These wildcards are expanded into a list of filenames before the command runs! So it's very easy to get a command line with more arguments than some arbitrary, low limit.
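
You can watch the shell do this expansion with echo. In a directory holding a few .txt files, try:

echo *.txt    # echo never sees the *; the shell hands it the list of matching names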

Maybe we could use some kind of arithmetic trick to count through our arguments, like $i or something. This won't work either. The expression $i means “the contents of the variable called i”, not “the i'th thing on the command line.” Furthermore, not all shells let you refer to command-line words after $9 at all, and those that do make you use ${10}, ${11}, and so forth.

So what do we do? This:

while [ ! -z "$1" ]
do
    # do stuff to $1
    shift
done

Here's how we read that script: “While there's something in $1, we mess with it. Immediately after we finish messing with it, we do the shift command, which moves the contents of $2 into $1, the contents of $3 into $2, and so forth, regardless of how many of these command-line arguments there are. Then we go back and do it all again. We know we've finished when there's nothing at all in $1.”
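
If you'd like to watch shift at work from the interactive prompt, the set command will fake up a command line for you:

set one two three
echo $1    # prints: one
shift
echo $1    # prints: two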

This technique allows us to write a script that can handle any number of arguments, while only dealing with $1 at a time. So now our script looks like this:

#!/bin/bash
while [ ! -z "$1" ]
do
    # do stuff to $1
    if [ ! -r $1 ]
    then
        echo $0: I see no $1 file here.
        exit 1
    fi
    # omitted test...
    lpr $1
    Mail boss < $1
    # and so forth...
    shift
done
exit 0

Notice that we nested if inside while. We can do that all we like. Also notice that this program quits the instant it finds something wrong. If you would like it to continue on to the next argument instead of bombing out, just replace an exit with:

shift
continue

The continue command just means “Go back up to the top of the loop right now, and try the control command again.” Thought question: why did we have to put a shift right before the continue?

Here's a potential problem: we've made it easy for someone to use this program on files that live in different directories. But we're only testing the current directory for writability. Instead, we should do this:

if [ ! -w `dirname $1` ]
then
    echo $0: I will not be able to delete $1 for you.
    # ...

The dirname command prints out what directory a file is in, judging from its pathname. If you give dirname a filename that doesn't start with a directory, it will print “.”—the current directory. And those backquotes? Unlike all other kinds of quotation marks, they don't mean “this is really all one piece; ignore the spaces.” Instead, backquotes—also called “grave accents”—mean “Run the command inside the backquotes before you run the whole command line. Capture all of the backquoted command's output, and pretend that was what appeared on the larger command line instead of the junk in backquotes.” In other words, we are substituting a command's output into another command line.
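
A couple of examples (the pathnames are invented):

dirname /docs/reports/weekly.report    # prints: /docs/reports
dirname weekly.report                  # prints: .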

So here is the final version of our shell script:

#!/bin/bash
while [ ! -z "$1" ]
do
    if [ ! -r $1 ]
    then
        echo $0: I see no $1 file here.
        shift
        continue
    fi
    if [ ! -w `dirname $1` ]
    then
        echo $0: I will not be able to delete $1 for you.
        shift
        continue
    fi
    lpr $1
    Mail boss < $1
    cp $1 /floppy/`basename $1`
    rm $1
    shift
done
exit 0

An exercise for the reader: what does `basename $1` do?

Now there are only two other techniques you need to know to meet the vast majority of your scripting needs. First, suppose you really do need to count. How do we do the equivalent of a C for loop? Here's the traditional Bourne shell way:

i=0
upperlim=10
while [ $i -lt $upperlim ]
do
    # mess with $i
    i=`expr $i + 1`
done

Notice that we did not use the for keyword. for is for something else entirely. Instead, here we initialize a variable i to 0, then we enter and remain in the loop as long as the value in i is less than 10. (Fortran programmers will recognize -lt as the less-than operator; guess why > is not used in this context.) The rather mysterious line

i=`expr $i + 1`

calls the expr command, which evaluates arithmetic expressions. We stuff expr's output back into i using backquotes.

Ugly, isn't it? And not especially fast either, since we are running a command every time we want to add 1 to i. Can't the shell just do the arithmetic itself? If the shell is the Bourne shell, no, it can't. But the Korn shell can:

((i=i+1))

Use that syntax if it works, and if you don't need portability. The bash shell uses something similar:

i=$(($i+1))

which is a bit more portable (it even works in the Korn shell), since it is specified by POSIX, but still won't work in some older, non-POSIX Bourne shells.
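
Putting the pieces together, here is our counting loop again in the POSIX style, with no expr and no extra process per iteration:

i=0
upperlim=10
while [ $i -lt $upperlim ]
do
    # mess with $i
    i=$(($i+1))
done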

So what does for do? It allows you to wade through a list of items, assigning each element of the list to a variable in turn. Here's a trivial example:

for a in Larry Moe Curly
do
    echo $a
done

which would print

Larry
Moe
Curly

Less trivially, we can use this to handle the case where we want to do something for each word in a variable:

mylist="apple banana cheese rutabaga"
for w in $mylist
do
    # mess with $w
done

or for each file matched by a shell wildcard pattern:

for f in /docs/reports/*.txt
do
    pr -h $f $f | lpr
done

or for each word in the output of a command:

for a in `cat people.txt`
do
    banner $a
done

Here's how you can use for to simulate the C for you know and love:

for i in 0 1 2 3 4 5 6 7 8 9 10 11
do
    # mess with $i
done

Of course, it'd be very difficult to have a variable upper limit with this syntax, which is why we usually use the while loop shown above.
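
One escape hatch, for the curious: most Linux systems ship a seq command that prints out a run of numbers, and backquotes will happily feed its output to for. It's convenient, but not portable to every Unix:

for i in `seq 1 $upperlim`
do
    # mess with $i
done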

Congratulations! You've now seen what's at work in the vast bulk of practical shell scripts. Go forth and save time!

Brian Rice (rice@kcomputing.com) is a Member of Technical Staff with K Computing, a nationwide Unix and Internet training firm.
