Quick question for UK call centre professionals: who remembers where they were on the morning of 29th March 2004? No? Ok, we’ll come back to that one in a little while…
3rd October 2008: Quintana Roo, Mexico. It’s now 100 minutes into a planned 150-minute deep penetration cave dive at the Grand Cenote. I’m on the return leg, 50 minutes from being able to surface, when there’s a ‘pop’ from behind me and then the sound of gas leaving the tanks at an expeditious rate!
What do the two dates and incidents above have in common? Well, as far as incident management goes they both fall into ‘emergency response management’, rather than ‘business resilience’, ‘disaster recovery’, ‘business continuity’, or whatever your department happens to be called.
In this short article I propose to examine not the large-scale preparations, but rather what to do when it all hits the proverbial fan.
So, does anyone in the UK contact centre industry find the 29th March 2004 strangely familiar?
Especially anyone in the North West?
No? Are you sure?
That was the day Manchester and the surrounding area decided, with the assistance of a fire in a fibre optic tunnel in Manchester city centre, to isolate itself from the rest of the world. In a lot of cases this meant normal DR (Disaster Recovery) procedures didn’t work, given that most DR sites are geared toward one, maybe two, clients at once, not their entire client base descending on the site at the same time. This is the kind of incident where you need to be ‘creative’.
Incidents can be graded for severity and criticality, so let’s assume for the purposes of this article that the building is not on fire, you don’t have ninety per cent of your staff suffering from Otter flu and the coffee machine is still working. My gauging of criticality and severity may well differ from yours; opinions vary!
System failure is common
System failure is common; it happens far more regularly than we like to admit, even to ourselves. At the basic level the response is obviously to take a number and promise a call-back, but how simple is it to move quickly to a temporary paper-based process?
Can you offer the caller the option for you to take a few details and call back with an answer, rather than trying to complete the entire call? Chances are your call length will be shorter in this situation anyway, and there’s no doubt that the extra few seconds can foster the impression that, even while having problems, you are going that little bit further to assist.
Sorry our systems are down
“Sorry, our systems are down, can I take a number and someone will call you back?” is easier. Great for reducing AHT (Average Handling Time); of no practical use to man nor beast. Use the downtime constructively, advising customers of the situation and arranging call-backs, to make the transition back to normal easier, rather than just treating it as a problem.
Paper processes have their uses
Paper-based processes have their uses – there I’ve said it!
I now fully expect an angry mob of contact centre technology suppliers and consultants to break down the door with pitchforks and flaming torches, screaming, “Luddite!”. But if it is the difference between offering some service and no service, which would you prefer? And which would your customers?
It doesn’t take much to appear incompetent. However, the converse, seeming to go above and beyond in the face of adversity, is also fairly easy to achieve.
“I’m sorry our systems are down. However, can I take some details of the issue/complaint/order, I’ll deal with it, then get someone to call you back later to discuss it/take your credit card details… when we’re back up and running.”
Design your plans in advance
Something of this ilk, run on a paper sheet designed in advance rather than just agent notes, can be the difference between a happy customer and, whilst maybe not a complaint, at least a ‘fail’ as far as the customer is concerned. I could at this point give some appropriate psychobabble regarding cumulative client impressions and propensity to switch suppliers when faced with multiple minor fails over time. Aren’t you glad I’m not going to!
DR seems to consist of planning for absolute worst-case scenarios: we have ‘grab bags’ and ‘on-site DR boxes’ containing copies of everything and anything that may be required, most of which is never required. It is, prudently, geared to total failure.
A quick checklist
On the other hand, in your case have you ever…
- Tested the system fully under operational conditions? A stress test out of hours, including switching the phone systems over, is adequate, but you cannot test individual bits one at a time: it’s all or nothing!
- Given any thought to partial failures? Loss of just one critical component, as in the case of the Manchester comms fire?
- Looked at the cost of instigating full DR for a partial failure? Hint: be sitting down when you work out the price for this one!
- Given any thought to what happens if your DR site is unavailable? It was the case during the Manchester fire that DR sites filled on a first-come, first-served basis, leaving people who had paid for DR availability out in the cold.
Partial failures are common
Partial failures and the ubiquitous ‘system problems’ are the root of over ninety per cent of our day-to-day problems. However, our recovery procedures all address the other, far less likely, alternative of full failure.
Not to make light of, or denigrate, the full ‘Recovery’ process, but surely the more likely minor failures deserve some kind of attention too? So, what can you do?
Planning activities to consider
I’m going to make the assumption that you’re not in the business of throwing money away and that any kind of planning activities need to be minimal in cost. So, where to start?
- Initially look at the existing plan and isolate the different areas covered.
- Within these areas again isolate the key customer-facing areas; the ones where a disruption of service is going to be noticeable to the client.
- Now look at the individual elements within these groupings and see which parts can have a temporary solution placed against them.
- Brown-paper these processes and look at the bare minimum data or technology you need to provide a basic level of service.
- Give this to the staff concerned with running the process on a day-to-day basis and let them tell you why it won’t work, what you’ve missed and how to do it properly.
I’ve recently had what in hindsight is a perfect example of what I’m trying to say…
A few weeks ago I was in hospital for a very minor operation. I’m lucky enough to have private medical insurance provided, so this was all dealt with quickly, bills being settled by my insurer, with one exception: a small bill, which I’d paid straight away but for which I am now on my third written demand for payment. Now, I know I’ve paid, and apparently they know I’ve paid, but they have attributed it to the wrong account and it will be sorted.
The problem is, I phoned on the 1st of the month, only to get the obviously standard line: “Can you call back tomorrow? All our systems are down; it always does this at the beginning of the month…”. This is a prime example of what I’m talking about: a known issue in a customer-facing process that could, with minor changes, be turned into something practical and, if not improve the customer experience, at least not ruin it.
In the example above my first thought was ‘amateurs’. They know it’s happening; the staff expect it. Why are they announcing to the world that their system doesn’t work and, more importantly, why are they telling customers that they know it’s broken, they know when it’s going to break, and they are not going to do anything about it? Every month they are effectively saying, “We will deliberately inconvenience our customers; we can’t even be bothered to take your number.”
Don’t let a failure impact a customer
We are, after all, supposed to be a customer-centric trade; there is no reason why a failure on our part should impact the customer.
In the event of having to invoke the full DR procedure we can expect the customer to understand that we may not be able to deliver the full service they expect. However, in the case of minor issues can we not review and adapt to deliver some service? In the case of the telephony failure in Manchester we were fortunate enough to be able to quickly transfer the lines to our Guildford office, and, with the assistance of the company on the floor below, who used a different comms supplier, were able to throw a long extension lead out of the window, down a floor, to give us one working phone line. A MacGyver, I know, but it worked.
When training kicks in
On the other hand, in the case of the dive in Mexico, training kicked in: the DR/ER plan was mentally invoked. Drills are automatic, and on this occasion I had the system shut down, the fault isolated and the air supply back in about 25 seconds. I will say my instructor wasn’t the most popular person in the world when I found out he’d set it up as a drill, but…
Isn’t that why we have plans, procedures and training?
DR – Disaster Recovery
ER – Emergency Response
MacGyver (verb) – To use ingenuity to fix or remedy a problem using only the tools available at hand.
MacGyver (noun) – Someone who can regularly cobble together solutions to problems using only the tools available at hand.
Dave Appleby has been working as a planner, forecaster and analyst in the contact centre industry for the last 11 years, having been a chef in a previous life. Starting off working on the phones for the launch of a Grocery Home Shopping service, he has worked for a variety of in-house and outsource operations including Disneyland Paris, Seeboard, Giftaid, GM Finance and the Daily Telegraph. A keen diver (both instructor and cave diver), Dave is currently a senior analyst for a large UK insurance company.