IT people spend a lot of time and effort avoiding outages. To do so we buy redundant everything, run backups, test Disaster Recovery (DR) plans, and follow change control processes.
I was once in a meeting with an IT infrastructure manager when he explained why he wanted to avoid a repeat of a recent outage. He had been called into a meeting with the company CEO and CFO and asked straight out: does this outage mean we have a Sarbanes-Oxley compliance problem? For a large company with a significant US presence, this was a very serious issue.
What was the outage that attracted the attention of these senior executives?
To put it one way, a file server had crashed.
To put it another, neither the office of the CEO nor that of the CFO could access any of their files, including the company’s annual report and the CEO’s market update speech. This was on the morning of the market announcement.
The incident review had revealed that …
- Backups were kept for only a few months
- Capacity management processes were poor, or not followed
- The DR plan hadn’t been tested for 18 months
- The DR system couldn’t cope with the production workload
Right now many of you are thinking that the company’s IT staff had been doing a bad job, but the strange thing is that this was a company that took disaster recovery seriously and conducted annual DR tests.
So what happened?
In my opinion this is a case of misalignment between IT and its users. File shares and home directories were simply not considered critical. After all, in a large company, if one person can’t work, there is someone else who can cover. But what if an entire team can’t work? What if your business lost an entire function, like payroll or sales? The department budget is almost always calculated in a spreadsheet before being entered into the company ERP. Those legal documents are written in a word processor before being uploaded to the company website. And that presentation for your company’s biggest customer?
It’s easy to say in hindsight that the IT department should have known better, but ultimately there was a gap between the users’ expectations and what IT was trying to deliver.
A case of IT doing a good job delivering the wrong thing.