Roman Numerals and Bash
Fun with retro-coding a Roman numeral converter—I head back to my college years and solve me homework anew!
I earned a bachelor's degree in computer science back in the dawn of computing. Well, maybe it wasn't quite that long ago, but we did talk about Ada and FORTRAN in class. As a UCSD alumnus, however, it's no surprise that UCSD Pascal was the programming language of choice. Don't worry; no punch cards and no paper tape were involved in my educational endeavors.
As with modern computer science study, we spent a lot of time coding algorithms and solving problems and puzzles. I'm a board-gamer, so I was quite happy to try to solve the "dining philosophers problem", the "four color problem" or the "traveling salesman problem". You might well have tried to solve the same darn problems.
One coding problem that has stuck with me is a Roman numeral conversion program. As part of my first programming class, I recall it being a pretty tricky problem, but we didn't have the internet or GitHub to scrounge around for smart solutions or inspiration.
So, in the spirit of retro-coding, let's build a script that can convert Roman numerals into regular decimal equivalent values.
Roman Numerals
I know, you're saying "um, remind me, what are Roman numerals?", even though you see them all the time in movies and books. You just ignore the MCMLXIII that shows up as a copyright notice. What's funny is that the general industry consensus is that studios use those Roman numerals instead of the more understandable "Copyright 1963" to obfuscate the age of the film (by the way, MCMLXIII = 1963).
It turns out that Roman numerals are interesting because they are essentially grouped into logical segments. At its most basic, each letter has a specific decimal value, so let's start there (see Table 1 for the values).
Symbol | I | V | X | L | C | D | M |
Value | 1 | 5 | 10 | 50 | 100 | 500 | 1000 |
If you wanted to write the value 60 as Roman numerals, that's easy: LX. Reverse the two values, however, and it's a completely different value: XL = 40. Why? Because when a lower value symbol appears before a higher value symbol, it's considered a reduction of that value. The fancy name for this is subtractive notation.
In other words, LX = 50 + 10, but XL = L – X = 50 – 10.
Now you can see how the earlier value breaks into clusters of values based on whether a subsequent value is higher or lower than the current value. Here's the logic:
MCMLXIII = M + CM + L + X + III = 1000 + 900 + 50 + 10 + 3
The general rule involves a single character look-ahead. That is, the value 4 is represented as IV (literally 5 – 1), so while 8 is VIII, the value 9 is IX. Fortunately, 8 will never be IIX because that violates the single value subtraction, and the values always are adjacent so you won't see IM or VC.
Got it? It's pretty easy, really, once you know the letter values and the subtractive notation.
Writing a Roman Numeral Converter
Let's get to some coding.
The first challenge is to figure out how to convert individual Roman numeral values to their decimal equivalent. We humans can just glance at the table presented earlier and remember that L = 50, but we've got to teach the shell that trick too.
This would be a perfect use for a two-dimensional array where the primary index would be a character value, and the equivalent secondary value would be the corresponding decimal value. Alas, one of the weaknesses of Bash is that it doesn't support multi-dimensional arrays.
I'm going to be lazy and just utilize a function with a case statement. This also lets me support both uppercase and lowercase values:
mapit() {
case $1 in
I|i) equiv=1 ;;
V|v) equiv=5 ;;
X|x) equiv=10 ;;
L|l) equiv=50 ;;
C|c) equiv=100 ;;
D|d) equiv=500 ;;
M|m) equiv=1000 ;;
* ) echo "Error: Value $1 unknown" >&2 ; exit 2 ;;
esac
}
Recall that in Bash you can't have a function return a value, but you can
send parameters to the function. They are then accessible within the function
as positional parameters $1, $2, $3 and so on. Ergo $1
is the
letter being mapped to a decimal value. The return value is the global
variable $equiv
. It's a bit clumsy versus a more elegant language,
but...so it goes.
Let's put that snippet of code aside for a moment and look at the common
question of how to parse a string one character at a time. There are a ton of
different ways to figure out exactly how that should be done, but I'm
going to tap the nifty Bash function seq
.
The seq
command generates a sequence of values from the starting to ending
value—most easily at the command line:
$ seq 3 7
3
4
5
6
7
This is where the Bash string functions are going to be really helpful!
Let's utilize two of them. First, ${#string}
returns the number
of characters in the string. Then the more complex reference of
${string:start:num}
returns the substring starting at
start
for num
characters from the variable string
.
Put these three things together, and you have an elegant way to step through a
string, character by character. If the user-specified value is known as
romanvalue
(so as to be maximally mnemonic, of course), this
simple for
loop
breaks down the string:
for index in $(seq 1 ${#romanvalue}) ; do
echo "Letter $index is ${romanvalue:index-1:1}"
done
Now you can integrate that with the function mapit
presented earlier.
Combined, you get this script:
for index in $(seq 1 ${#romanvalue}) ; do
mapit ${romanvalue:index-1:1}
echo "${romanvalue:index-1:1} = $equiv"
done
The result? Let's decompose the earlier sequence:
$ sh roman.sh MCMLXIII
converting MCMLXIII
M = 1000
C = 100
M = 1000
L = 50
X = 10
I = 1
I = 1
It's pretty easy to sum it up:
sum=$(( $sum + $equiv ))
Without compensating for the subtractive notation, you'll find that—incorrectly!—MCMLXII sums up to 2163. It makes sense: the CM ended up as 1100 instead of 900, so it's exactly 200 off the correct value.
Obviously, there's a whole 'nother section of smarts needed in this program—an algorithm that basically says, "If the next value is greater than the current, subtract the current value from the next. If not, add the current to the next value and add that to the sum total."
The problem is that this means there's a fairly substantial change in the code, because you can't just say "look up value, add it to sum", but you need to implement a look-ahead concept.
And, that's exactly what I plan to cover in my next article, when I finish this script and look at the reverse—one that lets you get the Roman numeral equivalent of a decimal value. The latter is particularly interesting, because there's no Roman numeral greater than 1000, so given the notational conventions, the max value is going to be 3*1000+999.
Or is it? You tell me.