What's GNU: Bash - The GNU Shell

by Chet Ramey

Bash is the shell, or command language interpreter, that will appear in the GNU operating system. The name is an acronym for the “Bourne-Again SHell”, a pun on Steve Bourne, the author of the direct ancestor of the current Unix shell /bin/sh, which appeared in the Seventh Edition Bell Labs Research version of Unix.

Bash is an sh-compatible shell that incorporates useful features from the Korn shell (ksh) and the C shell (csh), described later in this article. It is ultimately intended to be a faithful implementation of the IEEE POSIX Shell and Tools specification (IEEE Working Group 1003.2). It offers functional improvements over sh for both interactive and programming use.

While the GNU operating system will most likely include a version of the Berkeley shell csh, Bash will be the default shell. Like other GNU software, Bash is quite portable. It currently runs on nearly every version of Unix and a few other operating systems-an independently-supported port exists for OS/2, and there are rumors of ports to DOS and Windows NT. Ports to Unix- like systems such as QNX and Minix are part of the distribution.

What's POSIX, anyway?

POSIX is a name originally coined by Richard Stallman for a family of open system standards based on Unix. There are a number of aspects of Unix under consideration for standardization, from the basic system services at the system call and C library level to applications and tools to system administration and management. Each area of standardization is assigned to a working group in the 1003 series.

The POSIX Shell and Tools standard has been developed by IEEE Working Group 1003.2 (POSIX.2). It concentrates on the command interpreter interface and utility programs commonly executed from the command line or by other programs. An initial version of the standard has been approved and published by the IEEE, and work is currently underway to update it. There are four primary areas of work in the 1003.2 standard:

  • Aspects of the shell's syntax and command language. A number of special builtins such as cd and exec are being specified as part of the shell, since their functionality usually cannot be implemented by a separate executable;

  • A set of utilities to be called by shell scripts and applications. Examples are programs like sed, tr and awk. Utilities commonly implemented as shell builtins are described in this section, such as test and kill. An expansion of this section's scope, termed the User Portability Extension, or UPE, has standardized interactive programs such as vi and mailx;

  • A group of functional interfaces to services provided by the shell, such as the traditional system() C library function. There are functions to perform shell word expansions, perform filename expansion (globbing), obtain values of POSIX.2 system configuration variables, retrieve values of environment variables (getenv()), and other services;

  • A suite of “development” utilities such as c89 (the POSIX.2 version of cc), and yacc.

Bash is concerned with the aspects of the shell's behavior defined by POSIX.2. The shell command language has of course been standardized, including the basic flow control and program execution constructs, I/O redirection and pipelining, argument handling, variable expansion, and quoting. The special builtins, which must be implemented as part of the shell to provide the desired functionality, are specified as being part of the shell; examples of these are eval and export. Other utilities appear in the sections of POSIX.2 not devoted to the shell which are commonly (and in some cases must be) implemented as builtin commands, such as read and test. POSIX.2 also specifies aspects of the shell's interactive behavior as part of the UPE, including job control and command line editing. Interestingly enough, only vi-style line editing commands have been standardized; emacs editing commands were left out due to objections.

While POSIX.2 includes much of what the shell has traditionally provided, some important things have been omitted as being “beyond its scope”. There is, for instance, no mention of a difference between a login shell and any other interactive shell (since POSIX.2 does not specify a login program). No fixed startup files are defined, either-the standard does not mention .profile.

Basic Bash features

Since the Bourne shell provides Bash with most of its philosophical underpinnings, Bash inherits most of its features and functionality from sh. Bash implements all of the traditional sh flow control constructs (for, if, while, etc.). All of the Bourne shell builtins, including those not specified in the POSIX.2 standard, appear in Bash. Shell functions, introduced in the SVR2 version of the Bourne shell, are similar to shell scripts, but are defined using a special syntax and are executed in the same process as the calling shell. Bash has shell functions which behave in a fashion upward-compatible with sh functions. There are certain shell variables that Bash interprets in the same way as sh, such as PS1, IFS and PATH. Bash implements essentially the same grammar, parameter and variable expansion semantics, redirection, and quoting as the Bourne shell. Where differences appear between the POSIX.2 standard and traditional sh behavior, Bash follows POSIX.

The Korn Shell (ksh) is a descendent of the Bourne shell written at AT&T Bell Laboratories by David Korn. It provides a number of useful features that POSIX and Bash have adopted. Many of the interactive facilities in POSIX.2 have their roots in the ksh. For example, the POSIX and ksh job control facilities are nearly identical. Bash includes features from the Korn Shell for both interactive use and shell programming.

For programming, Bash provides variables such as RANDOM and REPLY, the typeset builtin, the ability to remove substrings from variables based on patterns, and shell arithmetic.

  • RANDOM expands to a random number each time it is referenced. Assigning a value to RANDOM seeds the random number generator.

  • REPLY is the default variable used by the read builtin when no variable names are supplied as arguments.

  • The typeset builtin is used to define variables and give them attributes such as readonly.

Bash arithmetic allows the evaluation of an expression and the substitution of the result. Shell variables may be used as operands, and the result of an expression may be assigned to a variable. Nearly all of the operators from the C language are available, with the same precedence rules:

   $ echo $((3 + 5 * 32)) 163

For interactive use, Bash implements ksh-style aliases and builtins such as fc (discussed below) and jobs. Bash aliases allow a string to be substituted for a command name. They can be used to create a mnemonic for a Unix command name (e.g., alias del=rm), to expand a single word to a complex command (e.g., alias news='xterm -g 80x45 -title trn -e trn -e -S1 -N &), or to ensure that a command is invoked with a basic set of options (e.g., alias ls="/bin/ls -F").

The C shell (csh) was originally written by Bill Joy while at the University of California at Berkeley. It is widely used and quite popular for its interactive facilities. Bash includes a csh-compatible history expansion mechanism (“! history”), brace expansion, access to a stack of directories via the pushd, popd and dirs builtins, and tilde expansion, to generate users' home directories. Tilde expansion has also been adopted by both the Korn Shell and POSIX.2.

There were certain areas in which POSIX.2 felt standardization was necessary, but no existing implementation provided the proper behavior. The working group invented and standardized functionality in these areas, which Bash implements. The command builtin was invented so that shell functions could be written to replace builtins; it makes the capabilities of the builtin available to the function. The reserved word “!” was added to negate the return value of a command or pipeline; it was nearly impossible to express “if not x” cleanly using the sh language.

There exist multiple incompatible implementations of the test builtin, which tests files for type and other attributes and performs arithmetic and string comparisons. POSIX considered none of these correct, so the standard behavior was specified in terms of the number of arguments to the command. POSIX.2 dictates exactly what will happen when four or fewer arguments are given to test, and leaves

the behavior undefined when more arguments are supplied. Bash uses the POSIX.2 algorithm, which was conceived by David Korn.

Features not in the Bourne Shell

There are a number of minor differences between Bash and the version of sh present on most other versions of Unix. The majority of these are due to the POSIX standard, but some are the result of Bash adopting features from other shells. For instance, Bash includes the new “!” reserved word, the command builtin, the ability of the read builtin to correctly return a line ending with a backslash, symbolic arguments to the umask builtin, variable substring removal, a way to get the length of a variable, and the new algorithm for the test builtin from the POSIX.2 standard, none of which appear in sh.

Bash also implements the “$(...)” command substitution syntax, which replaces the sh `...` construct. The “$(...)” construct expands to the output of the command contained within the parentheses, with trailing newlines removed. The sh syntax is accepted for backwards compatibility, but the “$(...)” form is preferred because its quoting rules are much simpler and it is easier to nest.

The Bourne shell does not have such features as brace expansion, the ability to have a variable and a function with the same name, local variables in shell functions, the ability to enable and disable individual builtins or write a function to replace a builtin, or a means to export a shell function to a child process.

Bash has closed a long-standing shell security hole by not using the $IFS variable to split each word read by the shell, but splitting only the results of expansion (ksh and the 4.4 BSD sh have fixed this as well). Useful behavior such as a means to abort execution of a script read with the “.” command or automatically exporting variables in the shell's environment to children is also not present in the Bourne shell. Bash provides a much more powerful environment for both interactive use and programming.

Bash-specific Features

This section details a few of the features which make Bash unique. Most of them provide improved interactive use, but a few programming improvements are present as well. Full descriptions of these features can be found in the Bash documentation.

Startup Files

Bash executes startup files differently than other shells. The Bash behavior is a compromise between the csh principle of startup files with fixed names executed for each shell and the sh “minimalist” behavior. An interactive instance of Bash started as a login shell reads and executes ~/.bash_profile (the file .bash_profile in the user's home directory), if it exists. An interactive non-login shell reads and executes ~/.bashrc. A non-interactive shell (one begun to execute a shell script, for example) reads no fixed startup file, but uses the value of the variable $ENV, if set, as the name of a startup file. The ksh practice of reading $ENV for every shell, with the accompanying difficulty of defining the proper variables and functions for interactive and non-interactive shells or having the file read only for interactive shells, was considered too complex. Ease of use won out here.

New Builtin Commands

There are a few builtins which are new or have been extended in Bash.

  • The enable builtin allows builtin commands to be turned on and off arbitrarily. To use the version of echo found in a user's search path rather than the Bash builtin, “enable -n echo” suffices.

  • The help builtin provides quick synopses of the shell facilities without requiring access to a manual page.

  • Builtin is similar to command in that it bypasses shell functions and directly executes builtin commands. Access to a csh-style stack of directories is provided via the pushd, popd and dirs builtins.

  • Pushd and popd insert and remove directories from the stack, respectively, and dirs lists the stack contents.

  • On systems that allow fine-grained control of resources, the ulimit builtin can be used to tune these settings. Ulimit allows a user to control, among other things, whether core dumps are to be generated, how much memory the shell or a child process is allowed to allocate, and how large a file created by a child process can grow.

  • The suspend command will stop the shell process when job control is active; most other shells do not allow themselves to be stopped like that.

  • Type, the Bash answer to which and whence, shows what will happen when a word is typed as a command:

$ type export export is a shell builtin $ type -t export builtin $
type bash bash is /bin/bash $ type cd cd is a function cd () {
    builtin cd ${1+"$@"} && xtitle $HOST: $PWD
}

Various modes tell what a command word is (reserved word, alias, function, builtin, or file) or which version of a command will be executed based on a user's search path. Some of this functionality has been adopted by POSIX.2 and folded into the command utility.

Editing and Completion

One area in which Bash shines is command line editing. Bash uses the readline library to read and edit lines when interactive. Readline is a powerful and flexible input facility that a user can configure to their own tastes. It allows lines to be edited using either emacs or vi commands, where those commands are appropriate. The full capability of emacs is not present-there is no way to execute a named command with M-x, for instance-but the existing commands are more than adequate. The vi mode is compliant with the command line editing standardized by POSIX.2.

Readline is fully customizable. In addition to the basic commands and key bindings, the library allows users to define additional key bindings using a startup file. The inputrc file, which defaults to the file ~/.inputrc , is read each time readline initializes, permitting users to maintain a consistent interface across a set of programs. Readline includes an extensible interface, so each program using the library can add its own bindable commands and program-specific key bindings. Bash uses this facility to add bindings that perform history expansion or shell word expansions on the current input line.

Readline interprets a number of variables which further tune its behavior. Variables exist to control whether or not eight-bit characters are directly read as input or converted to meta- prefixed key sequences (a meta-prefixed key sequence consists of the character with the eighth bit zeroed, preceded by the meta-prefix character, usually escape, which selects an alternate keymap), to decide whether to output characters with the eighth bit set directly or as a meta-prefixed key sequence, whether or not to wrap to a new screen line when a line being edited is longer than the screen width, the keymap to which subsequent key bindings should apply, or even what happens when readline wants to ring the terminal's bell. All of these variables can be set in the inputrc file.

The startup file understands a set of C preprocessor-like conditional constructs which allow variables or key bindings to be assigned based on the application using readline, the terminal currently being used, or the editing mode. Users can add program-specific bindings to make their lives easier: I have bindings that let me edit the value of $PATH and double-quote the current or previous word:

# Macros that are convenient for shell interaction $if Bash # edit the
path "\C-xp": "PATH=${PATH}\\C-e\C-a\f\C-f" # prepare to type a quoted
word-insert open and close double quotes # and move to just after the
open quote "\C-x\"": "\"\"\C-b" # Quote the current or previous word
"\C-xq": "\b\"\f\"" $endif

There is a readline command to re-read the file, so users can edit the file, change some bindings, and begin to use them almost immediately.

Bash implements the bind builtin for more dyamic control of readline than the startup file permits. Bind is used in several ways. In list mode, it can display the current key bindings, list all the readline editing directives available for binding, list which keys invoke a given directive, or output the current set of key bindings in a format that can be incorporated directly into an inputrc file. In batch mode, it reads a series of key bindings directly from a file and passes them to readline. In its most common usage, bind takes a single string and passes it directly to readline, which interprets the line as if it had just been read from the inputrc file. Both key bindings and variable assignments can appear in the string given to bind.

The readline library also provides an interface for word completion. When the completion character (usually TAB) is typed, readline looks at the word currently being entered and computes the set of filenames of which the current word is a valid prefix. If there is only one possible completion, the rest of the characters are inserted directly, otherwise the common prefix of the set of filenames is added to the current word. A second TAB character entered immediately after a non-unique completion causes readline to list the possible completions; there is an option to have the list displayed immediately. Readline provides hooks so that applications can provide specific types of completion before the default filename completion is attempted. This is quite flexible, though it is not completely user-programmable. Bash, for example, can complete filenames, command names (including aliases, builtins, shell reserved words, shell functions, and executables found in the file system), shell variables, usernames, and hostnames. It uses a set of heuristics that, while not perfect, is generally quite good at determining what type of completion to attempt.

This article will be continued next month.

Load Disqus comments