Barriers to the deployment of speech analytics

To select the speech analytics solution that is right for your organisation, it is wise to learn from the problems that early adopters of first generation speech analytics technology experienced, as some or all of these solutions are still being promoted and sold by established vendors today. There are three key problem areas where customers are not realising the instant return they were expecting from their speech analytics investment:

Access to Call Recordings

Early speech analytics solutions relied upon having access to a contact centre’s voice recordings. They did not record the calls themselves, so integration with a call centre’s existing call recording equipment was a prerequisite. In some cases this proved to be something of an obstacle, as the voice recording vendors and customer IT departments were very protective of their equipment, their existing contractual relationships and the recordings themselves.

Security of voice recordings is a very hot issue, particularly in organisations where lack of compliance can lead to multi-million pound fines, so giving a third party access to them understandably made stakeholders nervous. What is more, the voice recording vendors were developing their own speech analytics solutions and were obviously keen to prevent their customers from buying speech analytics from elsewhere.

Extensive Professional Services

The skills required to both set up and operate first generation speech analytics solutions are vastly underestimated. Getting such a solution up and running requires a very substantial amount of bespoke ‘training’ of the system for each customer, so that it can search for particular key words, or combinations of key words, across the voice recording data.

To accommodate different regional accents as well as languages, this ‘training’ is understandably extensive and requires specialists who understand the underlying concepts of phonetics and language. Coupled with the system integration required with the customer’s voice recording equipment, this process can take weeks or months, and for many organisations a permanent internal or external skilled resource is required for the lifetime of the solution.

As a result, the initial cost an organisation was quoted for purchasing the solution soon paled in comparison with the long-term cost of running and updating it.

Accuracy and Search Rates

Such first generation speech analytics solutions can only perform analysis on calls which have been completed. They work on the basis of bulk processing of call recording data, and although this process has been speeded up with recent releases of next generation technology, this is still the case for the majority of solutions available today.

There is also a distinct difference in performance between speech-to-text solutions and phonetics based solutions. Speech-to-text based solutions take longer to deliver any results because the process requires the software to listen to the recording, understand every word and then convert the spoken word to text format before it can be analysed.

Currently only between 1 and 5% of all recorded calls are manually monitored and analysed, giving a customer organisation only a brief insight into what is actually happening within its call centre operation. First generation bulk analysis solutions can increase this percentage, but the analysis is still undertaken some time after the calls have been completed. As a result, the reporting cannot identify or address all the areas where non-compliance or customer attrition is occurring quickly enough for mistakes to be rectified.

Live Call Analysis

Being able to directly affect the outcome of each and every call as it is occurring is considered the ultimate aim of call analytics, because it can increase profit per call, increase first call resolution rates and improve customer satisfaction. Anything that can improve first call resolution has to be a winner!

The majority of next generation speech analytics vendors therefore claim to sell live call analytics. However, a word of caution is needed here: what the vast majority of vendors describe as ‘real-time’ or ‘live’ is quite often an exaggeration.

The question you need to ask any vendor to identify whether their solution actually analyses calls live is:
‘Does your solution analyse conversations during the actual call and before either the customer or agent puts the phone down, and can I see a real-time picture of how my call agents are performing at any given moment in time?’
Many solutions can only analyse a call some time after it has been completed, yet they still claim to be real time. That cannot truly be considered ‘live’.

A solution that is truly live can be extremely powerful because it applies the results of live speech analysis immediately to generate live agent prompting during a call. For example, if your agents have to say a particular phrase on every call and they have forgotten to say it, a prompt is delivered to their screen reminding them to say it before the call is completed. This enables the agent to return to their script and complete the call in full compliance with their legal or regulatory requirements, which in turn can significantly improve first call resolution rates.
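
The prompting logic described above can be illustrated with a minimal Python sketch. It assumes that a live transcript of the call so far is already available as text; the function names, the required phrases and the simple substring matching are all hypothetical and stand in for whatever recognition engine a real product uses.

```python
import re
from typing import List


def normalise(text: str) -> str:
    """Lower-case the text and strip punctuation so matching is lenient."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def missing_phrases(transcript_so_far: str, required_phrases: List[str]) -> List[str]:
    """Return the required phrases the agent has not yet said on this call."""
    spoken = f" {normalise(transcript_so_far)} "
    return [p for p in required_phrases if f" {normalise(p)} " not in spoken]


def agent_prompts(transcript_so_far: str, required_phrases: List[str]) -> List[str]:
    """Build the on-screen reminders for any phrase still outstanding."""
    return [f'Reminder: please say "{p}"'
            for p in missing_phrases(transcript_so_far, required_phrases)]


# Hypothetical compliance phrases, for illustration only.
REQUIRED = ["calls may be recorded", "terms and conditions"]

if __name__ == "__main__":
    live_transcript = "Hello, calls may be recorded for training purposes."
    for prompt in agent_prompts(live_transcript, REQUIRED):
        print(prompt)
```

In a live deployment a check of this kind would be re-run each time the transcript grows, so that a reminder disappears as soon as the agent says the phrase.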

The applications of this new technology are wide-ranging, from prompting agents with up-selling opportunities and ensuring 100% compliance on every call to correcting poor or negative language. Indeed, by identifying bad phrases as they are being said, the system can alert a supervisor when an agent is having a particularly difficult call, or a number of difficult calls on a particular day, so that the supervisor can intervene to take the agent off calls or find ways to rectify their behaviour and language.
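
As a rough illustration of the supervisor alert, the sketch below counts negative phrases per agent and raises a flag once a threshold is reached. The class name, the phrase list and the threshold are assumptions made for this example and are not taken from any particular product.

```python
from collections import Counter


class NegativeLanguageMonitor:
    """Counts negative phrases per agent and flags when a threshold is reached."""

    def __init__(self, phrases, threshold=3):
        self.phrases = [p.lower() for p in phrases]
        self.threshold = threshold      # hypothetical alert level
        self.hits = Counter()           # running total per agent

    def observe(self, agent_id, utterance):
        """Record one utterance; return True if a supervisor should be alerted."""
        text = utterance.lower()
        self.hits[agent_id] += sum(text.count(p) for p in self.phrases)
        return self.hits[agent_id] >= self.threshold


if __name__ == "__main__":
    # Hypothetical phrases, for illustration only.
    monitor = NegativeLanguageMonitor(["calm down", "not my problem"], threshold=2)
    for line in ["Please calm down.", "That is not my problem."]:
        if monitor.observe("agent-17", line):
            print("Alert supervisor: agent-17 may need support")
```

Resetting the counters at the start of each shift, or per call, is a design choice that depends on whether the aim is to catch a single difficult call or a difficult day.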

Accuracy Rates

Search and analysis accuracy rates still vary considerably across different speech analytics vendors. Speech-to-text solutions tend to have lower accuracy rates, of between 40 and 60%. Phonetics based solutions produce better results, generally around the 60 to 70% mark. The latest technology, which combines speech-to-text, key phrase and phonetics based analysis, can deliver consistent accuracy rates of between 90 and 97% on 100% of calls.