Unboxing Day
As much as I love working with Linux and configuring software, one major part of being a sysadmin that always has appealed to me is working with actual hardware. There's something about working with tangible, physical servers that gives my job an extra dimension and keeps it from becoming a completely abstract job even further disconnected from reality. On top of all that, when you get a large shipment of servers, and you view the servers at your company as your servers, opening a server box brings the same anticipation and excitement as opening Christmas presents at home.
This story just so happens to start during the Christmas season. We had just received our first shipment of a completely new blade infrastructure that we were really excited to try out. As the resident server monkey and general minion working under Bill's iron fist, I was to meet up with an engineer from our vendor at the data center and assist with the installation in any way I could. It was a big job: two completely populated blade chassis comprising 32 blade servers, integrated SAN switches and all the assorted power supplies and network pass-throughs that went along with them. We budgeted a full day of the engineer's time to rack the new chassis, slot the blades and make sure all hardware was functional and up to date.
[Bill: Iron fist? I like the sound of that. Reminds me of a mid-1970s Marvel Superhero...but I digress. I remember this occasion. We'd just finished piloting a VMware/Blade infrastructure at our corporate office, and we were about to roll it out to our production data center, on next-generation hardware. It was an exciting time!]
I arrived at the data center a few hours before the engineer so I could get all the boxes from shipping and receiving and move them into our cage. If you ever have ordered a blade chassis, you know that everything arrives in these gigantic cardboard boxes that incidentally were about the size of our shared cubicle space back at the office. These boxes open up to smaller boxes for the blade servers, chassis, power supplies and the rest. At first things moved smoothly. I broke down the first set of boxes, and after a number of trips, the empty blade chassis and the blades themselves were stacked neatly near our cage.
The Jack in the Box
It wasn't until I opened the last box that I realized I was in trouble. Instead of containing a dozen large boxes and mostly empty space, this box looked like a failed game of Tetris. It was filled to the brim with hundreds of tiny boxes of all shapes and sizes. The engineer was going to be there soon, so I tried to organize the boxes into different piles and then filled my pushcart with swaying stacks of tiny boxes and made trip after trip to the colocation cage until all of its walls looked like the inside of a brown-brick house.
[Bill: I seem to remember you sent a couple of e-mails to me along the lines of "Wow, this new stuff sure has a lot of boxes compared to the old stuff."]
This is probably a good point in the story to tell you that up to this time, we normally had taken advantage of our vendor's integration service. We standardized on servers with a certain amount of RAM, CPU revision, storage and network configuration that deviated from the base model, so our vendor would take the base order model and do the work to add CPUs, RAM and the extra parts we wanted so that when we got a server, we could just rack it and turn it on.
In this case, for some reason, we failed to request this integration service, so not only was I looking at the boxes for blades, chassis and power supplies, I had hard drives, CPUs, RAM, fiber-channel HBAs, extra NICs and even battery-backed write caches all individually wrapped in their own boxes. Instead of unboxing a blade and sliding it into its slot to install it, every single blade would need to be opened, and then each and every component would have to be opened, removed from its static wrap, and installed into the blade one by one.
[Bill: I have to say here that up until this point, the vendor always had "thrown in" the integration service for us, and it's something we (and by we, I mean me) had taken for granted...until the day my boss called the vendor and deleted this "superfluous service" from the quote, without telling me.]
Drowning in Cardboard
When the engineer arrived, I explained the situation, and we both realized we had a long day ahead of us. At the beginning, we made great progress. He opened up and racked the chassis and power supplies until we were ready to install the first blade server. At that point, we agreed on an assembly-line system where he would open up a blade, and I, like a surgical assistant, would unwrap and hand him each component in a certain order so he could install it. Then, while he finished up the blade, I would fill up the cart with empty wrappers and boxes and roll it to the trash area, so we didn't drown in anti-static wrap and cardboard. After a full day's work, we were able to integrate 20 out of our 32 blades successfully.
Unfortunately, we had booked the engineer only for one day, but he was able to shuffle appointments around and return on Friday morning of that week to finish up. Halfway through Friday morning, we were able to finish with the blade servers so that they all were racked. We were ready to be done at that point, but we were only halfway there. We still had to install all the hard drives, integrated network pass-throughs, fiber-channel switches and finally, upgrade the firmware.
[Bill: I had forgotten how long that job took. Now that you mention it though, there was an amazing amount of cardboard generated from that. It didn't help that the data center didn't allow cardboard on the data center floor, and you had to shuttle all that stuff back and forth.]
Late for the Party
Once we powered on the blades, it looked like we were close to the finish line. I started packing up all my things so I could head home early and get dressed up for our company's big Christmas party later that evening. Naturally, it was at this point that a few of the blades wouldn't power on. After minimal troubleshooting, we were left with just one misbehaving blade. The engineer started the hardware troubleshooting process as I watched the minutes tick by. I realized I had to somehow power through Bay Area traffic, get home, put on my suit and drive back through the traffic to the party, and rush hour was rapidly approaching. Ultimately, we had to open up the server, remove all of the hardware we had added only hours before, and insert the hardware one piece at a time until we identified a faulty DIMM slot. Finally, we were done and I was able to get to the party fashionably late.
I think the moral of this story is pretty clear. If we had only gotten all of our servers integrated ahead of time, the entire install would have taken a fraction of the time, and any hardware problems in the system would have been identified before anything was shipped to me. When you have the option, especially when it comes to large orders of servers, get all your components integrated ahead of time.
[Bill: The moral for me, as a manager, is always to double-check the quote for services, and make sure that all of those are understood so they don't get labeled as non-essential and cut by people higher up the food chain. I'm usually not a fan of too many vendor services, but getting the entire system integrated by a vendor will accelerate deployment time by at least a couple of days.]