Polishing the wegrep Wrapper Script
When last I discussed shell scripts, I was presenting a script that offered an alternative to the -C context flag in GNU grep. Although most modern Linux systems have the more capable grep command, older systems likely don't have this particular feature, and it's also a good excuse to dig into working with wrapper scripts.
"Wait. What's a wrapper script?" I can hear you ask, and some of you also are now trying to think of a famous rapper whose name you can reference for a punny response. I've already beat you there: "Can't touch that!"
A wrapper is a script that replaces a command on the Linux system but secretly calls the command, just offering more and better capabilities and features. When you have an alias set up so that every invocation of ls is really ls -F, that's the same basic idea.
Linux and its grizzled father UNIX are really powerful because they offer these sorts of capabilities; it's hard to write a wrapper for Microsoft Excel on a Windows 10 system, by contrast.
A command with multiple versions in the wild is a perfect example of where a wrapper can be beneficial. Imagine you're deploying a few hundred servers and want to run a bare-bones Linux on them to maximize available cycles. Problem is, your admin scripts rely on the very latest-and-greatest versions of sed, grep and find. Solution? Point the scripts at your wrapper versions of those commands, and make sure every flag you need is implemented, either in the base command (as would be the case on the newer systems) or through the wrapper code itself.
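The "base command or wrapper code" decision can be sketched as a small probe. This is a minimal sketch rather than part of wegrep: the function names supports_context and pick_strategy are hypothetical, and it assumes that a command lacking -C exits with a non-zero status when handed that flag.

```shell
#!/bin/sh
# Sketch of a wrapper's dispatch step. Function names are hypothetical.

# supports_context CMD: succeed if CMD accepts the -C context flag.
# Assumption: a command without -C exits non-zero when given it.
supports_context() {
  echo probe | "$1" -C 1 probe >/dev/null 2>&1
}

# pick_strategy CMD: report which code path a wrapper would take.
pick_strategy() {
  if supports_context "$1" ; then
    echo "native"     # the real command already implements the flag
  else
    echo "fallback"   # the wrapper has to supply the feature itself
  fi
}

pick_strategy grep
```

On a system with GNU grep this should report the native path; a real wrapper would replace the two echo statements with a hand-off to the real command or with its own context code.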
So, back to wegrep. When last I left this script, it offered up the base -C functionality of giving one or more lines of context before and after each match to a grep search. Left on the to-do list were to make it smarter about when to add the "- - - - - -" divider line, to add line numbers and to highlight the actual match.
Let's start with making the script smarter with the divider line, because that's by far the easiest. Like any script that tries to separate multiple blocks of output neatly, the key is really to count how many times the output has been sent. Here's the solution:
if [ $matches -eq 0 ] ; then
  echo "-----"
fi
matches=$(( $matches + 1 ))
This appears prior to each block of output. The very first time it produces the top divider line, and otherwise it's skipped. After the matching line or lines, however, there's another divider line that is included each and every time.
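To see the counting trick in isolation, here's a minimal sketch that prints each of its arguments as a "block" with the same divider logic. The function name print_blocks and the sample strings are made up for the demonstration.

```shell
#!/bin/sh
# print_blocks: print each argument as its own block with dividers.
# The function name and the sample data are illustrative only.
print_blocks() {
  matches=0
  for block in "$@"
  do
    if [ $matches -eq 0 ] ; then
      echo "-----"            # top divider, first block only
    fi
    echo "$block"
    echo "-----"              # closing divider, after every block
    matches=$(( matches + 1 ))
  done
}

print_blocks "first match" "second match"
```

Two blocks produce three divider lines rather than four, and no blocks at all produce no output.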
Adding line numbers can be accomplished a number of ways, but I'm going to exploit an interesting capability of the sed command itself, the "=" expression. Let me demonstrate with the wonderland.txt data file that contains the first couple paragraphs of Alice in Wonderland:
$ head -5 wonderland.txt | sed =
1
------------------------------------------------------
2

3
ALICE'S ADVENTURES IN WONDERLAND
4

5
Lewis Carroll
You can see what it does, I hope? It adds line numbers, but each number shows up on its own line, just before the line it belongs to. It's a bit funky, but a second sed invocation fixes the problem and gives output that makes a lot more sense:
$ head -5 wonderland.txt | sed = | sed 'N;s/\n/: /'
1: ------------------------------------------------
2:
3: ALICE'S ADVENTURES IN WONDERLAND
4:
5: Lewis Carroll
In the above, the replacement sequence is a colon followed by the Tab character itself, which can be entered by typing Ctrl-V followed by the Tab itself—easily done in scripts.
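If typing a literal Tab with Ctrl-V feels fragile, one alternative is to let printf produce the Tab and hand it to sed through a variable. This is a minimal sketch: the function name number_lines is hypothetical, and it reads from standard input rather than from a file.

```shell
#!/bin/sh
# number_lines: prefix each line of standard input with "N:<Tab>".
# The function name is illustrative only.
number_lines() {
  tab=$(printf '\t')                 # a real Tab character, no Ctrl-V needed
  sed = | sed "N;s/\n/:${tab}/"      # join each number with the line after it
}

printf 'alpha\nbeta\n' | number_lines
```

The double quotes let the shell expand ${tab} before sed sees the expression, while the \n is passed through to sed untouched.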
So, that's two down: a smarter divider line and the ability to number the output lines. Let's see how that works:
$ sh wegrep.sh '^Alice' wonderland.txt
-----
12:
13: ^Alice was beginning to get very tired of sitting by
14: her sister on the bank, and of having nothing to do:
-----
27: There was nothing so very remarkable in that; nor did
28: ^Alice think it so very much out of the way to hear the
29: Rabbit say to itself, 'Oh dear! Oh dear! I shall be
-----
The dividers work perfectly, showing up the minimum amount needed to denote each matching block of lines clearly, and the line numbers are neat and helpful.
The trickier part is still left to tackle. How do you actually highlight the match in each section?
ANSI Color Sequences
You may not realize it, but odds are incredibly high that your Terminal or xterm window, whether you're directly in a Linux system or connecting via a Windows or Mac computer, is emulating what's known as an ANSI terminal.
ANSI is the American National Standards Institute, but don't be misled; this is a global standard, particularly when it comes to colors, bold and other visual aspects to the terminal.
The problem is, the sequences to turn on and turn off bold or specific colors have to be fairly obscure to ensure that users don't accidentally end up invoking them. So "color:" would be a fail, as would "<color>". Instead, it's done through an escape sequence: Escape + [ + 3 + 2 + m causes all subsequent text to be rendered as green, for example.
The Escape + [ sequence prefix has a name of its own. It's a Control Sequence Introducer, although you probably don't need to know that! You can find a full table of ANSI color sequences on-line.
Once you're done with the highlighted text, you'll need to change the display back to regular text, and that's done with the sequence Escape + [ + 0 + m.
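Here's a quick way to try the two sequences on their own before wiring them into anything bigger. It's a minimal sketch, and the function name green is made up for the demonstration.

```shell
#!/bin/sh
# green: print the first argument in green, then reset the display.
# The function name is illustrative only.
green() {
  # \033[32m turns green on, \033[0m returns to normal text.
  # Passing the text through %s keeps any % in it from being
  # treated as a printf format directive.
  printf '\033[32m%s\033[0m\n' "$1"
}

green "Alice"
```

On an ANSI-compatible terminal the word should show up in green, and whatever you type next should be back to the normal color.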
Add them all up, and here's what you use to highlight whatever value is stored as $1 in a string:
\033[32m$1\033[0m
The \033 is a shorthand for Escape. Rather than make this an echo statement, it's a good use of printf, so here's the sequence:
sed "/$1/s//$(printf "\033[32m$1\033[0m")/" "$2"
This basically replaces the first occurrence of $1 on each line with itself, prefixed with the ANSI green sequence and suffixed with the sequence to return subsequent text to its normal display characteristics.
I'm being a bit lazy here by exploiting how the script works too. If it can show matching lines from a file, it also can show matching lines that have had the ANSI sequences slipped in. So here's the new flow, and it's a bit more complicated than my original stab at this script:
sed "/$1/s//$(printf "\033[32m$1\033[0m")/" "$2" | \
  sed = | sed 'N;s/\n/: /' | \
  sed -n "${before},${after}p"
Four invocations of sed in a row—ah, I love Linux!
In the above, the first sed invocation adds the ANSI sequences, the second and third work together to add the line number prefixes, and the fourth shows the lines in the stream from the range $before to $after.
To see how those are calculated, here's the full script:
#!/bin/sh
# wegrep - grep with context and regular expressions
grep=/usr/bin/grep
sed=/usr/bin/sed
context=1
matches=0
if [ $# -ne 2 ] ; then
  echo "Usage: wegrep [pattern] filename" ; exit 1
fi
# grep -n prefixes each matching line with its line number
for match in $($grep -n -E "$1" "$2" | cut -d: -f1)
do
  before=$(( $match - $context ))
  after=$(( $match + $context ))
  # sed has no line 0, so a match on line 1 starts the range at 1
  if [ $before -lt 1 ] ; then
    before=1
  fi
  # top divider, printed before the first block only
  if [ $matches -eq 0 ] ; then
    echo "-----"
  fi
  sed "/$1/s//$(printf "\033[32m$1\033[0m")/" "$2" | \
    sed = | sed 'N;s/\n/: /' | \
    sed -n "${before},${after}p"
  echo "-----"
  matches=$(( $matches + 1 ))
done
exit 0
It's surprisingly short given how useful this wrapper script is and how many new features have been added to an older, crude grep program.
And, here it is in use:
$ sh wegrep.sh 'Alice' wonderland.txt
-----
12:
13: Alice was beginning to get very tired of sitting by her
14: sister on the bank, and of having nothing to do: once
-----
16: reading, but it had no pictures or conversations in it,
17: 'and what is the use of a book,' thought Alice 'without
18: pictures or conversation?'
-----
27: There was nothing so very remarkable in that; nor did
28: Alice think it so very much out of the way to hear the
29: Rabbit say to itself, 'Oh dear! Oh dear! I shall be
-----
There's still a hiccup in the script, however. Because of the ANSI sequence sed invocation, the proper functionality of regular expressions is lost (try it, you'll see what I mean). Is it a huge problem? Maybe not, but I'm going to leave solving it as an exercise for you, the reader.
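If you'd like a nudge, one possible direction is sed's "&", which stands for whatever text the pattern actually matched, so the pattern itself never has to appear in the replacement. The following is only a sketch under some assumptions: the function name highlight_matches is hypothetical, it reads from standard input, it relies on the -E flag (supported by GNU and BSD sed) so that sed uses the same extended syntax as grep -E, and it will still break on patterns that contain a slash.

```shell
#!/bin/sh
# highlight_matches PATTERN: wrap every match on standard input in green.
# The function name is illustrative only.
# Assumptions: sed supports -E, and PATTERN contains no "/".
highlight_matches() {
  green=$(printf '\033[32m')
  reset=$(printf '\033[0m')
  # "&" is replaced by the matched text, not by the pattern,
  # so anchors such as ^ never leak into the output.
  sed -E "s/$1/${green}&${reset}/g"
}

printf 'Alice sat by\nher sister Alice\n' | highlight_matches '^Alice'
```

With the anchored pattern only the Alice at the start of the first line is colored, and no stray caret shows up in the output.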
As always, if you have suggestions, let me know via e-mail: dave@linuxjournal.com.