Great Expectations

July 9th, 2010, by Scott Kantner

What could possibly be more fun than standing outside in the 100+ degree heat here in southeastern PA?  Standing outside in 90+ degree heat while waiting in line at Disney World.  At least there’s the promise of something fun, and possibly something cool and wet at the end of the wait.

While recently wading through the sea of humanity and waiting in some of the infernal lines that define Disney at this time of year, I was struck by an interesting IT analogy early in the week (yes, I really did need a vacation, and by the end of the week I wasn’t thinking about IT at all).

In late June and early July, the number of baby-strollers per square foot in the Magic Kingdom increases to approximately 10x the normal rate.  This gives one many opportunities to observe other people’s children under extreme conditions. It’s amazing to watch parents expect their 5-year-olds to behave like perfect angels in subtropical queue lines for upwards of 45 minutes, or sprint from one end of a park to another on tiny, tired little legs to score a Toy Story Fast-Pass before they’re all gone. It’s amazing because you can tell these kids are perfect hellions under ideal conditions as well. Putting them under stress only intensifies the problems that already exist.

Likewise, if you’ve got poorly designed or neglected infrastructure, simply moving it to a colo facility isn’t going to improve uptime or performance significantly, if at all. Certainly you can improve environmentals, save capex, and get lower network latency with a colo move, but if application response time and reliability are sucking wind before the move because of bad design or sysadmin neglect, not much is going to change.

My point isn’t that you should avoid putting your infrastructure in a better home if you need to, but that you shouldn’t expect it to behave any differently just because you moved it. Moreover, move time is not the time to make drastic changes to your production systems. It’s not a “free” outage window.  The more changes you make during a move, the higher the risk of a failed, or at minimum a very stressful, move.

On the other hand, a move can be an ideal time to upgrade to better hardware and legitimately raise your expectations. For example, you can set up new hardware next to your old, cluster it, and then move the new half of the cluster to a better home while the old half continues to run the business. After you complete the move and let the two halves resynchronize, you can turn down the old half and all activity will automatically switch over to the new hardware. Your users will never feel a thing. Very little pain, but very much gain.

Of course that all sounds good, and there are a lot of details involved in making it happen, but that’s what we do best. If you’re interested in smoothly moving your critical IT gear to a new home and need some experienced help to get it done, give us a call. Hardware prone to temper tantrums is one of our specialties.

//spk


Keep The Change

June 23rd, 2010, by Scott Kantner

Does this sound like your IT shop?  Reports from the Uptime Institute consistently show that the majority of reliability and uptime woes aren’t caused by hardware, facilities, or utility failure. They’re caused by humans, and what, pray tell, are those humans doing?  They’re changing things, and too often the change isn’t planned, approved, or documented.  Or, there is simply too much change going on at one time.

Much like a bomb is meant to explode, technicians are meant to be technical, so it’s a bit unrealistic to assume they’re giving a lot of thought to managing change, much less that they’re fond of doing so. They just want to git ‘er done, and in large part, we pay them well to not only do that, but to do it right the first time.  Hard core techies, the ones that really know how to make things work, typically aren’t also wired for sitting in management meetings. The problem with managing change is that it’s boring. It’s not technical. And explaining highly technical things to non-technical folks in a change management meeting is not always the average techie’s strong suit, nor perhaps the best use of their time. To the contrary, it can be a very frustrating experience for them, which can lead them to the Dark Side of making changes beneath the radar. Effective change management therefore becomes a bit of a balancing act. We need to know what’s going on, but we don’t want to bog everyone down in the process.

In our data center, controlling change is not optional. Reliability demands it, as do the SAS 70 auditors. But we’ve found a way to manage it without terribly burdening our technical staff. Change requests may be formally entered in the system by any authorized individual whether or not they are technical;  they are simply the person requesting the change. The request is then routed to a technician who assesses what needs to be done, adds those details to the request, and suggests when it might be done. It’s then passed on to someone in management who can assess the risk and approve or disapprove it. If a change is of major significance, the request comes before a Change Advisory Board (CAB) for final approval. Technicians, while welcome, are not required to attend CAB meetings.  When requests are properly documented, the CAB is almost always able to make a good decision without further involving the technical staff.  When the CAB does need more information or defers a request for some reason (e.g. too many changes on one night), the technician in question is notified and it’s handled outside of a meeting.  This saves time, money, and mental fatigue. Since the pain threshold is relatively low, this method also encourages all change activity to actually be run through the proper channels.
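For the technically inclined, the routing described above can be sketched as a small state machine. This is only an illustration of the flow, not our actual change system: the class, state, and method names are all made up for the example, and the "major" flag stands in for whatever judgment decides that a change needs to go before the CAB.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    REQUESTED = auto()   # entered by any authorized individual
    ASSESSED = auto()    # a technician has added details and a suggested date
    CAB_REVIEW = auto()  # major change waiting on the Change Advisory Board
    APPROVED = auto()
    REJECTED = auto()
    DEFERRED = auto()    # CAB wants more information or a different night


@dataclass
class ChangeRequest:
    # All field names here are hypothetical, chosen for the illustration.
    requester: str
    summary: str
    major: bool = False
    details: str = ""
    suggested_date: str = ""
    state: State = State.REQUESTED
    history: list = field(default_factory=list)

    def _move(self, new_state, actor, note=""):
        # Record every transition so the request documents itself.
        self.history.append((self.state, new_state, actor, note))
        self.state = new_state

    def assess(self, technician, details, suggested_date):
        if self.state is not State.REQUESTED:
            raise ValueError("only a new request can be assessed")
        self.details = details
        self.suggested_date = suggested_date
        self._move(State.ASSESSED, technician)

    def management_review(self, manager, approve):
        if self.state is not State.ASSESSED:
            raise ValueError("request has not been assessed yet")
        if not approve:
            self._move(State.REJECTED, manager)
        elif self.major:
            # Major changes need final approval from the CAB.
            self._move(State.CAB_REVIEW, manager)
        else:
            self._move(State.APPROVED, manager)

    def cab_decide(self, decision, note=""):
        if self.state is not State.CAB_REVIEW:
            raise ValueError("request is not before the CAB")
        if decision not in (State.APPROVED, State.REJECTED, State.DEFERRED):
            raise ValueError("CAB can only approve, reject, or defer")
        self._move(decision, "CAB", note)


if __name__ == "__main__":
    cr = ChangeRequest("alice", "Replace core switch fan tray", major=True)
    cr.assess("bob", "Swap fan tray 2 during maintenance", "2010-07-01")
    cr.management_review("carol", approve=True)
    cr.cab_decide(State.DEFERRED, "too many changes on one night")
    print(cr.state.name)
```

Note that the technician appears in only one step. Everything after the assessment works from what was written into the request, which is why good documentation lets the CAB decide without pulling the technician into the meeting.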

Our process is capable of handling very high rates of change, but that doesn’t mean that we do so.  On the contrary, we try to minimize the rate of change, batching things together when it makes sense to minimize outages, and spreading them out when the risk is high to maximize uptime.

Managing change is not fun, and you may be justifiably weary of it.  Let us take that burden off of your shoulders.

//spk


The Quiet Before The Move

May 21st, 2010, by Scott Kantner

In the eyes of a fifth-grader, here is what we do:

Indeed, we are designed to always stay running. In less than 6 hours we will begin moving a live production environment of nearly 120 servers from their current home approximately 60 miles away into new racks in our facility. Those servers are coming here in part because keeping up with the always-on demands of today’s infrastructures is increasingly difficult and expensive to do.  It has come to the point where if IT infrastructure is not your core business, it’s awfully difficult to justify the ever-rising costs of running an always-on environment yourself.

We’ve spent months getting ready for this move, and now in these hours before “go time”, it is eerily quiet in that part of the data center. Other than the hum of CRAC units and core switch fans, not a creature is stirring.  The patch cables are sorted in boxes with care, in hopes that people with servers soon will be there (sorry, couldn’t resist).  But once completed, the large, precisely choreographed effort required to move our new customer into a better home will result in always-on uptime at a fraction of what it would cost to build from scratch or remodel.

Since all of the data center details for this move are already cared for, it’s now time to make sure the most important item is covered:  three shifts-worth of fine coffee.

//spk
