Calculating Day of the Week

For those of you playing along at home, you'll recall that our intrepid hero is working on a shell script that can tell you the most recent year that a specific date occurred on a specified day of the week—for example, the most recent year when Christmas occurred on a Thursday.

There are, as usual, nuances and edge cases that make this calculation a bit tricky, including the need to recognize when the specified date has already passed during the current year. If it's July and we're searching for the most recent May 1st that was on a Sunday, we'd miss 2011 if we just started in the previous year.

In fact, as any software developer knows, the core logic of your program is often quite easy to assemble. It's all those darn corner cases, those odd, improbable situations that the program needs to recognize and respond to properly, that make programming a detail-oriented challenge. It can be fun, but then again, it can be exhausting and take weeks of debugging to ensure excellent coverage.

That's where we are with this script too. In months where the first day of the month is a Sunday, we're already set. Give me a numeric date, and I can tell you very quickly what day of the week it is. Unfortunately, that's only 1/7th of the possible month configurations.

What DOW Is That DOM?

For purposes of this discussion, let's introduce two acronyms: DOM is Day Of Month, and DOW is Day Of Week. May 3, 2011, has DOM=3 and DOW=3, as it's a Tuesday.

The cal utility shows this month like this:


      May 2011
Su Mo Tu We Th Fr Sa
 1  2  3  4  5  6  7
 8  9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31

Look! A perfectly formed month, so it's easy to figure out the DOW for once. But, that's not really fair for our testing, so let's move forward a month to June and look at June 3 instead. That's DOM=3, DOW=6 (Friday):


     June 2011
Su Mo Tu We Th Fr Sa
          1  2  3  4
 5  6  7  8  9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30

The solution I'm going to work with is likely more complicated than necessary, but it's mine and I'm sticking with it.

Here's the idea. As awk goes through the lines, it easily can ascertain NF (number of fields). If NF < 7, we have a month where the first day starts on a DOW other than Sunday. Any matching date for the first week of June 2011, for example, would have NF = 4.

Look back at June though, because it's important to recognize that the last week of the month has a problem too. It has NF=5. Because any match in that line must have DOM > 7, however, we can address this nuance later. Stay tuned, as they say.
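To see those NF values for yourself, here's a minimal sketch. The june2011 function is a made-up helper, not part of the script; it just replays the June 2011 calendar shown above so the result doesn't depend on which version of cal you have installed:

```shell
# Hypothetical helper: replays the June 2011 calendar from above.
june2011() {
cat <<'EOF'
     June 2011
Su Mo Tu We Th Fr Sa
          1  2  3  4
 5  6  7  8  9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30
EOF
}

# Skip the title and weekday header (lines 1-2), then print each
# week's line number and field count.
june2011 | awk 'NR > 2 { print "line " NR ": NF=" NF }'
```

The first week (line 3) should report NF=4 and the last week (line 7) NF=5, with the full weeks in between at NF=7.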

Given all this, the formula for the day of week in the first week of a month, where i is the day of the month, is DOW=i+(7-NF). A few test cases verify that it works:


June 3 = i=3, NF=4     DOW=(7-4)+3 = 6
July 1 = i=1, NF=2     DOW=(7-2)+1 = 6
May 2 = i=2, NF=7      DOW=(7-7)+2 = 2
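The same arithmetic can be checked with shell integer math. This is only a sketch; first_week_dow is a hypothetical helper name, and it takes the NF value as an argument rather than reading it from cal:

```shell
# Hypothetical helper, not from the original script.
# Usage: first_week_dow DAY NF
#   DAY = day of month, NF = number of fields on the first week's line
first_week_dow() {
  echo $(( $1 + (7 - $2) ))
}

first_week_dow 3 4    # June 3, 2011
first_week_dow 1 2    # July 1, 2011
first_week_dow 2 7    # May 2, 2011
```

These three calls should print 6, 6 and 2, matching the table above.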

For any date that doesn't fall in that first week, however, we can skip the formula entirely: every later line of cal output starts on Sunday, so the field position of the matching date is its day of the week.

How do you tell if it's in the first week? Another test. Search for the matching DOM and then look at the matching line number. If it's not the first week's line (line 3 of the cal output, after the title and the weekday header), the day of week is simply the position of the matching field on that line:


awk "/$expr/ { for (i=1;i<=NF;i++)
   { if (\$i~/${day}/) { print i }}}"
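To try that loop in isolation, here's a small sketch that feeds it a single week line. field_position is a hypothetical helper, and it differs from the snippet above in two assumed ways: the day is passed with awk's -v option instead of being interpolated into the program, and fields are compared with == rather than a regex match:

```shell
# Hypothetical helper: field_position DAY "WEEK LINE"
# Prints the 1-based position of DAY on a week line that starts on Sunday.
field_position() {
  echo "$2" | awk -v day="$1" '{ for (i = 1; i <= NF; i++) if ($i == day) print i }'
}

field_position 16 "15 16 17 18 19 20 21"    # May 16, 2011
```

May 16, 2011, sits in the second field of its week, so this should print 2 (Monday).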

In my previous columns, I was creating this overly complicated regular expression to match all the edge cases (literally, the cases when the match was the first or last day of a week). Instead, here's a new plan that's faster and less complicated. We'll use sed to pad each line of the calendar output with a leading and a trailing space:


cal june 2011 | sed 's/^/ /;s/$/ /'

Now our regular expression to match just the specified date and no others is easy:


[^0-9]DATEINQUESTION[^0-9]
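Here's a rough sketch of the padding and the pattern working together. Both function names are hypothetical, and the calendar is hard-coded sample data so the result is the same whatever cal you have:

```shell
# Hypothetical sample data: June 2011 as cal prints it.
june2011() {
cat <<'EOF'
     June 2011
Su Mo Tu We Th Fr Sa
          1  2  3  4
 5  6  7  8  9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30
EOF
}

# Hypothetical helper: count_matches DAY
# Pads every line read from stdin, then counts the lines that contain
# DAY as a whole number (not as part of 14, 24, 2011 and so on).
count_matches() {
  sed 's/^/ /;s/$/ /' | grep -c "[^0-9]$1[^0-9]"
}

june2011 | count_matches 4    # the 4th ends its line; padding lets it match
june2011 | count_matches 1    # only the 1st, not 10-19, 21 or the year
```

Each call should print 1, because exactly one line of the calendar contains the requested date on its own.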

Further, awk easily can give us that NF value too, so here's a rough skeleton of the DOW function for a given day of the month, month and year:


figureDOW()
{
  day=$1;  caldate="$2 $3"
  expr="[^0-9]${day}[^0-9]"
  NFval=$(cal $caldate | sed 's/^/ /;s/$/ /' | \
     awk "/$expr/ { print NF }")
  DOW="$(( $day + ( 7 - $NFval ) ))"
}

That works if we search only for matches that are in the first week of the month, but that, of course, is unrealistic, so here's a better, more robust script:


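figureDOW()
{
  # Sets DOW (1=Sunday ... 7=Saturday) for day $1 of month $2, year $3.
  # $temp must already name a writable scratch file.
  day=$1;  caldate="$2 $3"
  expr="[^0-9]${day}[^0-9]"
  cal $caldate | sed 's/^/ /;s/$/ /' > $temp
  NRval=$(awk "/$expr/ { print NR }" $temp)
  NFval=$(awk "/$expr/ { print NF }" $temp)
  if [ $NRval -eq 3 ] ; then    # line 3 of cal output is the first week
    DOW="$(( $day + ( 7 - $NFval ) ))"
  else
    # keep the pattern and its action on one line, or awk treats them
    # as two separate rules
    DOW=$(awk "/$expr/ { for (i=1;i<=NF;i++) { if (\$i~/${day}/) { print i }}}" $temp)
  fi
  /bin/rm -f $temp
}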

A few quick tests:


DOW of 3 june 2011 = 6
DOW of 1 july 2011 = 6
DOW of 2 may 2011 = 2
DOW of 16 may 2011 = 2

Looks good!
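If you'd like to double-check those results without a temp file, here's one possible variation that does both cases in a single awk pass. It's a sketch under stated assumptions, not the function from this column: dow_from_cal and july2011 are hypothetical names, the calendar arrives on standard input in the Sunday-first layout shown earlier, and the day is handed to awk with -v:

```shell
# Hypothetical sample data: July 2011 as cal prints it (Sunday first).
july2011() {
cat <<'EOF'
     July 2011
Su Mo Tu We Th Fr Sa
                1  2
 3  4  5  6  7  8  9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
31
EOF
}

# dow_from_cal DAY: reads cal-style output on stdin and prints the DOW
# (1=Sunday ... 7=Saturday). Illustrative only.
dow_from_cal() {
  sed 's/^/ /;s/$/ /' |
  awk -v day="$1" '
    NR > 2 && $0 ~ ("[^0-9]" day "[^0-9]") {
      if (NR == 3) {
        # first week: shift by the number of missing fields
        print day + (7 - NF)
      } else {
        # later weeks start on Sunday, so position equals DOW
        for (i = 1; i <= NF; i++)
          if ($i == day) print i
      }
    }'
}

july2011 | dow_from_cal 1     # Friday
july2011 | dow_from_cal 16    # Saturday
july2011 | dow_from_cal 31    # Sunday
```

These should print 6, 7 and 1. With a real calendar, the equivalent call would be along the lines of cal july 2011 | dow_from_cal 1, assuming your cal uses the same Sunday-first layout.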

Next time, we'll tie this all together. We have a function that calculates day of week for a given date, we already have figured out how to parse user input to get a desired day of week for a specified month/day pair, and we know how to figure out if the starting point for our backward date search is the current year (that is, whether we're already past that date in the year).

Calendar image via Shutterstock.com.

Dave Taylor has been hacking shell scripts on UNIX and Linux systems for a really long time. He's the author of Learning Unix for Mac OS X and Wicked Cool Shell Scripts. You can find him on Twitter as @DaveTaylor, and you can reach him through his tech Q&A site: Ask Dave Taylor.
