Contact centres have given a big thumbs-down to speech recognition, according to an IVR survey carried out by Call Centre Helper.
Of the contact centres that fronted their calls with an Interactive Voice Response (IVR) system, only 18% used it in combination with speech recognition.
The IVR survey was carried out on the Call Centre Helper website in February 2012 and had a total of 425 respondents.
This seems to tie in with figures kindly provided by Steve Morrell of ContactBabel from the latest Contact Centre Decision-Makers’ Guide, which show an average of 10% of contact centres using speech recognition.
|Contact centre size||Touchtone IVR||Speech recognition|
The big problem seems to be the overall accuracy – particularly with regional accents.
We asked the users how well they thought the speech recognition software performed.
Responses ranged from “90% accurate but it does struggle with some accents” down to as low as 30%. The average seemed to be around 70%.
Recognition of different regional accents is a real problem cited by many of the people who responded. It appears to be particularly acute in countries like South Africa, where the variety of accents complicates the recognition process and people can be less comfortable with using voice-based technology.
Most speech recognition vendors tend to shy away from publishing speech recognition accuracy rates, but in discussion figures between 90% and 98% are routinely bandied around. Occasionally real-life figures are published. In an article on tmcnet, Jeff Foley, a former senior marketing manager at Nuance, is reported as saying that “the perceived accuracy of those systems—the accuracy that your callers experience—is only 70 percent.”
The challenge is that there is a wide discrepancy between the recognition rate for a single word in a controlled environment and that obtained in normal day-to-day life – particularly when there is background noise.
One contact centre consultant told me that “out-of-the-box accuracy” was typically only around 50%, but that with careful tuning you could get it up to around 90%.
One of the biggest accuracy problems seems to be not misrecognition itself, but callers saying a word that is not in the system’s list. These “out of grammar” errors can be in the 15% range.
This accuracy problem has been particularly brought to life by Richard Wilson’s documentary “On Hold”.
Perhaps with real-life accuracy of around 70% it’s not surprising that speech recognition has been slow to take off.
So what does the future hold?
It seems likely that the use of smartphone apps will become more widespread for contact centre transactions. Routine transactions could be handled on the smartphone and telephone calls could be routed directly to the best team to handle the enquiry.
Jonty Pearce is Editor of Call Centre Helper.
Though I no longer work at Nuance, I felt compelled to say something. You’ve quoted me out of context by implying that 70% accuracy is a real figure that “slipped out” somehow.
In the full article, you’ll see that my comment was clarifying the accuracy rate for a poorly tuned speech recognition system that has an abnormally high “out of grammar rate” of 30%, meaning that callers are saying things which the system hasn’t been programmed to recognize. Its accuracy rate can still be in the high 90’s, but if people are saying things not on the list of recognized phrases, then it doesn’t matter how accurate the system is.
The point I made, and that I still stand by, is that natural language systems make these systems much more accurate and more likely to get a caller to the right place.
Uninformed contact centers may continue to dismiss speech recognition, but they don’t offer an alternative. Many touchtone systems are cumbersome and frustrating, yet companies aren’t able or willing to pay for live agents to answer every call. Good automation is a requirement for today’s over the phone contact center systems, and the companies willing to invest in what it takes to get that good automation can easily pay back their investment in improved average handle times, higher customer satisfaction, and a better reputation that helps their overall business.
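The arithmetic behind the 70% figure discussed in this comment can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `perceived_accuracy` is hypothetical, every out-of-grammar utterance is counted as a failure from the caller's point of view, and the 97% in-grammar accuracy is a stand-in for "high 90's" rather than a published figure.

```python
def perceived_accuracy(in_grammar_accuracy: float, out_of_grammar_rate: float) -> float:
    """Estimate the accuracy a caller experiences.

    Assumption: an out-of-grammar utterance always counts as a failure,
    so only the in-grammar share of utterances can be recognised correctly.
    """
    for value in (in_grammar_accuracy, out_of_grammar_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("rates must be between 0 and 1")
    return in_grammar_accuracy * (1.0 - out_of_grammar_rate)


if __name__ == "__main__":
    # 0.97 is an illustrative value for "high 90's"; 0.30 and 0.15 are the
    # out-of-grammar rates mentioned in the comment and the article.
    for oog_rate in (0.30, 0.15):
        result = perceived_accuracy(0.97, oog_rate)
        print(f"out-of-grammar {oog_rate:.0%}: perceived accuracy {result:.1%}")
```

Under these assumptions, a recogniser that is 97% accurate on in-grammar phrases but sees a 30% out-of-grammar rate comes out at roughly 68% perceived accuracy, which is in line with the "only 70 percent" quoted in the article.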
Thanks for the comment. I was trying not to take things out of context, which is why we included the link back to the source article.
The trouble is that there is a big difference between the figures that are often bandied about by over-eager salespeople and the real-life figures that our readers have experienced.
The 70% perceived-accuracy figure does seem to correlate well with the average accuracy figure that our readers have experienced.
In the article we have highlighted two examples of where people have achieved 90% recognition rates, but this does not seem to be the norm.
A purely personal comment: while I have set up and managed a number of multilingual technical helpdesks, we have never employed a voice recognition system. In part we thought that these systems would never cope with the wide variety of accents that our customers have, coming as they do from as far afield as Sweden, Israel and Italy and, more recently, the Far East. However, I consistently find that when I call a company which is using this software (typically a credit card company, insurer or utility company) my calls always fail, even with simple “Yes” and “No” answers, and frequently there is no option to switch to a touchtone system. All very frustrating when I regard myself as speaking excellent Northern Standard English!
What is the accuracy of touch tone IVRs?
Is it around 90%? I sincerely doubt it.
The real evaluation is overall user opinion.
I can hardly imagine that most users would opt for a touch-tone IVR system compared to well engineered speech dialogues in case of a complex application.
The problem is that one should compare apples to apples not to peaches.
A touch-tone detector may be accurate up to 99.999% for 16 different characters at most. If that selection quantity is sufficient there is little room for speech recognition.
In more complex situations speech dialogues may help but only if proper user focus can be reached. In a large volume application even 50% recognition accuracy may be sufficient. The rest may be handled by fall-back to a human agent after one false speech reco. That is acceptable in most situations. The transaction rate may be improved by continuous fine tuning quite a lot.
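As a rough illustration of the fallback argument above, the sketch below estimates how many calls still reach a human agent when a call is handed over after the first failed recognition. The function names, the call volume of 10,000 and the assumption that each dialogue step succeeds independently with the same accuracy are all hypothetical simplifications, not figures from any vendor.

```python
def automated_fraction(step_accuracy: float, steps: int = 1) -> float:
    """Share of calls completed without an agent.

    Assumptions: every dialogue step is recognised independently with the
    same accuracy, and a single failed recognition sends the call to an agent.
    """
    if not 0.0 <= step_accuracy <= 1.0 or steps < 1:
        raise ValueError("step_accuracy must be in [0, 1] and steps >= 1")
    return step_accuracy ** steps


def agent_calls(total_calls: int, step_accuracy: float, steps: int = 1) -> int:
    """Number of calls that fall back to a human agent."""
    return round(total_calls * (1.0 - automated_fraction(step_accuracy, steps)))


if __name__ == "__main__":
    total = 10_000  # hypothetical call volume
    for steps in (1, 3):
        print(f"{steps} step(s) at 50% accuracy: "
              f"{agent_calls(total, 0.5, steps)} of {total} calls reach an agent")
```

With a single recognition step at 50% accuracy, half of the calls are handled without an agent. The same model also shows why longer dialogues are harder to automate: the automated share shrinks with every extra step.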
The benefit for the company is lower personnel cost and faster transactions for the end-users.
It is worthwhile to mention that outsourced human agents may be even more boring than automated speech dialogue systems. A friend of mine flatly refused phone interaction with a major company because of poor call center agent performance.
If you use speech technology at the level it can really perform it can be of major benefit for all.
The problem is that people like simplified things: marketing promises too much, app. developers want out-of-the-box solutions, operators do not want to spend on daily improvements, quality assurance. Customers want everything for (nearly) free. Very few are willing to take into account accent (or even language) dependence.
We should all learn to be more realistic and take all possible important parameters into account.
The big area of difference is how to use Speech within an IVR.
For call control to an agent, you must have an extremely complex business to need any more options than touch tone can provide.
For Self Service however, speech is a natural fit as you can capture more results than DTMF could ever provide. What I see is that businesses want to automate the more complex transactions rather than the short, simple transactions. The more complex the solution, the more difficult the Speech tuning becomes, and this leads to lower recognition rates. We all know this leads to lower customer satisfaction and feeds the concerns about using Speech technology.
I’m not surprised at the low overall usage of Speech (despite what many sales reps will have you believe) and I think this usage will continue to decline. Phone apps are the new direction and can handle more complex self service more gracefully, at a much lower cost, and they are easier to update.
I think most callers are still far more comfortable dialling options into the keypad. Voice recognition is coming on in leaps and bounds but it still can’t compete with a simple “dial 1 for sales, 2 for existing client” style.