Bash Shell Script: Building a Better March Madness Bracket
Last year, I wrote an article for Linux Journal titled "Building Your March Madness Bracket." My article was timely, arriving just in time for the "March Madness" college basketball tournament. You see, I don't follow college basketball (or really, any sports at all), but I do like to participate in office pools. And every year, it seems my office likes to fill out the March Madness brackets to see who can best predict the outcomes.
Since I don't follow college basketball, I am not a good judge of which teams might perform better than others. But fortunately, the NCAA ranks the teams for you, so I wrote a Bash script that filled out my March Madness bracket for me. Since teams were ranked 1–16, I used a "D16" method borrowed from tabletop gaming. I thought this was an elegant method to predict the outcomes.
But there's a bug in my script. Specifically, a key assumption in the D16 algorithm is wrong, so I'd like to correct that with an improved March Madness script here.
Let's Review What Went Wrong
My Bash script predicted the outcome of a match by comparing the ranking of each team. So, you can throw a D16 "die" to determine if team A wins and another D16 "die" to determine if team B loses, or vice versa. If the two throws agree, you know the outcome of the game: team A wins and team B loses, or team A loses and team B wins.
I asserted that a #1 team should be a strong team, so I assumed the #1 team had 15 out of 16 "chances" to win, and one out of 16 "chances" to lose. Without any other inputs, the #1 ranked team would win if its D16 throw is two or greater, and the #1 team could lose only if the D16 value was one. With that assumption, I wrote this function:
function guesswinner {
    rankA=$1
    rankB=$2
    d16A=$(( ( $RANDOM % 16 ) + 1 ))
    d16B=$(( ( $RANDOM % 16 ) + 1 ))
    if [ $d16A -gt $rankA -a $d16B -le $rankB ] ; then
        # team A wins and team B loses
        return $rankA
    elif [ $d16A -le $rankA -a $d16B -gt $rankB ] ; then
        # team A loses and team B wins
        return $rankB
    else
        # no winner
        return 0
    fi
}
In the guesswinner function, each D16 roll generates a random number 1–16. If the rank of team A is "rankA" and the rank of team B is "rankB," and the D16 roll for team A is "A" and the roll for team B is "B," the function tests two D16 rolls like this:
If A greater than rankA (team A wins) and B less than or equal to rankB (team B loses), then team A wins.
If A less than or equal to rankA (team A loses) and B greater than rankB (team B wins), then team B wins.
But look at what happens if team A is ranked #1 and team B is ranked #16. Team A will always win:
A roll 1–16 will have a 15 out of 16 chance to be greater than 1 (team A wins), and a 1–16 roll will always be less than or equal to 16 (team B loses).
A roll 1–16 will have a 1 out of 16 chance to be less than or equal to 1 (team A loses) but a 1–16 roll will never be greater than 16 (team B wins).
There's no scenario in which a rank #16 team B can win over a rank #1 team A. It's a foregone conclusion that in any match of a rank 1 team versus a rank 16 team, the rank 1 team will always win. That's not right. There should be a slim chance for the rank 16 team to win over the rank 1 team.
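One way to check this claim is to enumerate all 256 possible pairs of D16 rolls instead of relying on random trials. The following is a minimal sketch: count_outcomes is a hypothetical helper, not part of the original script, that applies the same two comparisons as the guesswinner function above and counts how each pair of rolls is decided.

```shell
#!/bin/bash
# count_outcomes is a hypothetical helper for illustration only.
# It walks every (d16A, d16B) pair and applies the original rule.
count_outcomes() {
    # $1 = team A rank, $2 = team B rank
    rankA=$1
    rankB=$2
    winsA=0
    winsB=0
    none=0
    for d16A in $(seq 1 16) ; do
        for d16B in $(seq 1 16) ; do
            if [ "$d16A" -gt "$rankA" ] && [ "$d16B" -le "$rankB" ] ; then
                # team A wins and team B loses
                winsA=$(( winsA + 1 ))
            elif [ "$d16A" -le "$rankA" ] && [ "$d16B" -gt "$rankB" ] ; then
                # team A loses and team B wins
                winsB=$(( winsB + 1 ))
            else
                # the two rolls disagree: no winner
                none=$(( none + 1 ))
            fi
        done
    done
    echo "$rankA vs $rankB : A wins $winsA, B wins $winsB, no winner $none"
}

count_outcomes 1 16
count_outcomes 8 9
```

For rank 1 versus rank 16, this reports 240 pairs where team A wins, 16 pairs with no winner, and none where team B wins.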
A Better Algorithm
Instead of a "static" D16 die, we need a custom "die" that has faces relative to the chance of each team to win. Let's consider this simple algorithm to generate a custom die:
Team A gets a=16-rankA+1 sides.
Team B gets b=16-rankB+1 sides.
Under this assumption, a rank 1 team versus a rank 16 team would generate a die with a=16-1+1=16 "team A" sides and b=16-16+1=1 "team B" sides, resulting in a 17-sided die. Similarly, a more even match, such as a rank 8 team versus a rank 9 team, would create a die with a=16-8+1=9 "team A" sides and b=16-9+1=8 "team B" sides, resulting in another 17-sided die.
It's not always a 17-sided die, however. A rank 1 team against a rank 9 team would generate a die with a=16-1+1=16 "team A" sides and b=16-9+1=8 "team B" sides, or a 24-sided die.
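The arithmetic for these three matchups can be reproduced with a few lines of shell. This is only a sketch, and die_odds is a hypothetical helper name; it uses the same formulas for a and b given above and prints the size of the resulting die.

```shell
#!/bin/bash
# die_odds is a hypothetical helper for illustration only.
die_odds() {
    # $1 = team A rank, $2 = team B rank
    a=$(( 16 - $1 + 1 ))    # "team A" sides
    b=$(( 16 - $2 + 1 ))    # "team B" sides
    echo "rank $1 vs rank $2 : $a + $b = $(( a + b )) sides, team A wins $a in $(( a + b ))"
}

die_odds 1 16
die_odds 8 9
die_odds 1 9
```

The rank 1 team gets 16 of 17 sides against a rank 16 team, while the rank 8 team gets only 9 of 17 sides against a rank 9 team.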
In Bash, you can simulate a virtual custom "die" through a file. It's simple enough to generate a file with the correct number of "team A" sides and "team B" sides. If you already have calculated a and b as above, you can write a file like this:
( for teamA in $(seq 1 $a) ; do echo $1 ; done
  for teamB in $(seq 1 $b) ; do echo $2 ; done ) > die.file
Picking a random value from this file is as easy as randomizing or "shuffling" the file, then selecting the first line. On Linux systems, you can use the shuf(1) program from GNU coreutils to generate a random permutation of lines from a file. This randomizes whatever data you feed into shuf. Once shuffled, you easily can select the first line of the randomized output using head:
( for teamA in $(seq 1 $a) ; do echo $1 ; done
  for teamB in $(seq 1 $b) ; do echo $2 ; done ) | shuf | head -1
That simple expression becomes the heart of the improved March Madness script. It operates the way I want it to: a rank 1 team almost always (but not always) will win over a rank 16 team, yet more closely matched games, such as a rank 8 team versus a rank 9 team or a rank 2 team against a rank 3 team, will present more even odds.
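To see those odds in practice, you can roll the custom die many times and count the results. The following is a rough sketch, assuming GNU shuf is installed; roll_die and tally are hypothetical helper names, and the die is built with exactly a "team A" sides and b "team B" sides. The counts vary from run to run because every roll is random.

```shell
#!/bin/bash
# roll_die and tally are hypothetical helpers for illustration only.
roll_die() {
    # $1 = team A rank, $2 = team B rank
    a=$(( 16 - $1 + 1 ))
    b=$(( 16 - $2 + 1 ))
    ( for teamA in $(seq 1 $a) ; do echo $1 ; done
      for teamB in $(seq 1 $b) ; do echo $2 ; done ) | shuf | head -1
}

tally() {
    # $1 = team A rank, $2 = team B rank, $3 = number of rolls
    winsA=0
    winsB=0
    for roll in $(seq 1 $3) ; do
        win=$(roll_die $1 $2)
        if [ "$win" -eq "$1" ] ; then
            winsA=$(( winsA + 1 ))
        else
            winsB=$(( winsB + 1 ))
        fi
    done
    echo "$1 vs $2 over $3 rolls : rank $1 won $winsA, rank $2 won $winsB"
}

tally 1 16 100
tally 8 9 100
```

Over enough rolls, the rank 16 team should pick up an occasional win against the rank 1 team, and the 8 versus 9 matchup should split almost evenly.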
Building a Better March Madness Script
The above can be wrapped into a new guesswinner function to predict a contest between two teams, whose ranks are passed as arguments. The function generates the virtual "die" and uses that to guess a winner:
function guesswinner {
    # $1 = team A rank
    # $2 = team B rank
    a=$(( 16 - $1 + 1 ))
    b=$(( 16 - $2 + 1 ))
    win=$( ( for teamA in $(seq 1 $a) ; do echo $1 ; done
             for teamB in $(seq 1 $b) ; do echo $2 ; done ) | shuf | head -1 )
    echo "$1 vs $2 : $win"
    return $win
}
Since the March Madness brackets are always played in order, you can write a playbracket function to run through the different iterations of the bracket. Winners from round one are carried into rounds two and three to select an ultimate winner for the bracket in round four:
function playbracket {
    # $1 = name of bracket
    echo -e "\n___ $1 ___"

    echo -e '\nround 1\n'
    guesswinner 1 16
    round1A=$?
    guesswinner 8 9
    round1B=$?
    guesswinner 5 12
    round1C=$?
    guesswinner 4 13
    round1D=$?
    guesswinner 6 11
    round1E=$?
    guesswinner 3 14
    round1F=$?
    guesswinner 7 10
    round1G=$?
    guesswinner 2 15
    round1H=$?

    echo -e '\nround 2\n'
    guesswinner $round1A $round1B
    round2A=$?
    guesswinner $round1C $round1D
    round2B=$?
    guesswinner $round1E $round1F
    round2C=$?
    guesswinner $round1G $round1H
    round2D=$?

    echo -e '\nround 3\n'
    guesswinner $round2A $round2B
    round3A=$?
    guesswinner $round2C $round2D
    round3B=$?

    echo -e '\nround 4\n'
    guesswinner $round3A $round3B
    return $?
}
Finally, you need only call the playbracket function for each of the four regions. You are left with the "Final Four" with the winners of each bracket, but I'll leave the final determination of those contests for you to resolve on your own:
#!/bin/bash
# improved basketball March Madness prediction
function guesswinner {
...
}
function playbracket {
...
}
playbracket 'Midwest'
playbracket 'East'
playbracket 'West'
playbracket 'South'
Every time you run the script, you will generate a fresh NCAA March Madness basketball bracket. The picks are random, weighted by rank, so each run will most likely produce a different bracket. Here's one sample run:
$ ./basketball2.sh
___ Midwest ___
round 1
1 vs 16 : 1
8 vs 9 : 9
5 vs 12 : 12
4 vs 13 : 4
6 vs 11 : 11
3 vs 14 : 3
7 vs 10 : 7
2 vs 15 : 2
round 2
1 vs 9 : 1
12 vs 4 : 4
11 vs 3 : 3
7 vs 2 : 7
round 3
1 vs 4 : 1
3 vs 7 : 7
round 4
1 vs 7 : 1
___ East ___
round 1
1 vs 16 : 16
8 vs 9 : 9
5 vs 12 : 5
4 vs 13 : 13
6 vs 11 : 6
3 vs 14 : 3
7 vs 10 : 10
2 vs 15 : 2
round 2
16 vs 9 : 9
5 vs 13 : 5
6 vs 3 : 3
10 vs 2 : 2
round 3
9 vs 5 : 5
3 vs 2 : 2
round 4
5 vs 2 : 2
___ West ___
round 1
1 vs 16 : 1
8 vs 9 : 8
5 vs 12 : 5
4 vs 13 : 4
6 vs 11 : 6
3 vs 14 : 3
7 vs 10 : 10
2 vs 15 : 15
round 2
1 vs 8 : 8
5 vs 4 : 5
6 vs 3 : 6
10 vs 15 : 10
round 3
8 vs 5 : 8
6 vs 10 : 10
round 4
8 vs 10 : 8
___ South ___
round 1
1 vs 16 : 1
8 vs 9 : 8
5 vs 12 : 5
4 vs 13 : 4
6 vs 11 : 6
3 vs 14 : 3
7 vs 10 : 7
2 vs 15 : 2
round 2
1 vs 8 : 1
5 vs 4 : 4
6 vs 3 : 6
7 vs 2 : 7
round 3
1 vs 4 : 4
6 vs 7 : 6
round 4
4 vs 6 : 4
In this sample run, my script selects team 1 in the Midwest, team 2 in the East, team 8 in the West, and team 4 in the South. More important, note that the rank 16 team won the first round against the rank 1 team in the East bracket. This could not happen in the script I posted last year. My bug is fixed!
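If you would rather let the same die settle the Final Four too, one possible approach is sketched below. It reuses the guesswinner function from this article and plugs in the four region winners from the sample run (1, 2, 8 and 4). In the full script, you could instead save $? after each playbracket call. The semifinal pairings shown are an assumption for illustration only, since the actual pairings depend on the tournament. Note also that the output reports only ranks, so two region winners with the same rank cannot be told apart.

```shell
#!/bin/bash
# guesswinner as defined in this article
guesswinner() {
    # $1 = team A rank
    # $2 = team B rank
    a=$(( 16 - $1 + 1 ))
    b=$(( 16 - $2 + 1 ))
    win=$( ( for teamA in $(seq 1 $a) ; do echo $1 ; done
             for teamB in $(seq 1 $b) ; do echo $2 ; done ) | shuf | head -1 )
    echo "$1 vs $2 : $win"
    return $win
}

# Region winners taken from the sample run; these variable names
# and the pairings below are assumptions for illustration.
midwest=1
east=2
west=8
south=4

echo 'semifinals'
guesswinner $midwest $east
semi1=$?
guesswinner $west $south
semi2=$?

echo 'final'
guesswinner $semi1 $semi2
```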
The point of using a script to build your NCAA March Madness basketball bracket isn't to take away the fun of the game. On the contrary, since I don't have much familiarity with basketball, building my bracket programmatically allows me to participate in the office basketball pool. It's entertaining without requiring much familiarity with sports statistics. My script gives me a reason to follow the games, but without the emotional investment if my bracket doesn't perform well, and that's good enough for me.