How does Speech Analytics work?

To obtain this greater understanding of the voice of the customer, you must first record the call and turn it into data; you then need to refine that data and, finally, analyse it.

Stage 1 – Process audio into data.

Many of us will be familiar with speech or voice recognition from programs on our PCs. The speech analysis technology behind this capability is based on the program’s ability to recognise the distinct phonemes of the language being spoken (roughly 40 in English; the number varies from language to language).

This kind of speech analysis is called phonetic-only processing and it’s been around since the early 1950s, although it’s improved a lot since then!

Its advantage is that it’s quite quick (once it’s been ‘trained’ to understand individual voices); but against that it doesn’t recognise context (so ‘cancel’ is essentially the same as ‘can sell’) and it doesn’t lend itself to quick and comprehensive searching.
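
The ‘cancel’ versus ‘can sell’ confusion can be illustrated with a small sketch. It assumes a hand-written pronunciation table in ARPAbet-style symbols and compares phoneme sequences with a plain edit distance; the table, the function names and the choice of edit distance are all assumptions made for illustration, not how any particular product works.

```python
# Toy pronunciation table (ARPAbet-style symbols, hand-written for illustration only).
PRONUNCIATIONS = {
    "cancel": ["K", "AE", "N", "S", "AH", "L"],
    "can": ["K", "AE", "N"],
    "sell": ["S", "EH", "L"],
}


def to_phonemes(phrase):
    """Flatten a phrase into one phoneme sequence, discarding word boundaries."""
    phonemes = []
    for word in phrase.lower().split():
        phonemes.extend(PRONUNCIATIONS[word])
    return phonemes


def phoneme_distance(a, b):
    """Levenshtein edit distance between two phoneme sequences."""
    previous = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        current = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


cancel = to_phonemes("cancel")
can_sell = to_phonemes("can sell")
distance = phoneme_distance(cancel, can_sell)
print(cancel, can_sell, distance)
```

With this toy table the two sequences differ by a single vowel, so a search for one will readily match the other unless something else supplies the context.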

There is another, more recent technology; it is based on phonetics but adds the dimension of matching every word to a 60,000-word dictionary. This is called a Large-Vocabulary Conversational Speech Recognition engine, or LVCSR for short. LVCSR’s generic dictionary can be supplemented by additional industry- or company-specific terms. It’s slower than phonetic-only in its ability to turn audio into text, but in terms of analytics and searching, it’s head and shoulders above the phonetic-only approach.

Most commercial Speech Analytics solutions use a combination of the two technologies.

Stage 2 – Refine the data.

This is the stage where you’ll find the most differentiation between Speech Analytics solutions.

Phonetics-only systems are limited to a list of terms pre-determined by the user. Most businesses will need more, and LVCSR’s ability to go deeper offers that.

Some systems will get deeper than others; for example, some will just count particular words and you’ll be able to analyse on the basis of frequency.
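
Frequency-based analysis of this kind can be sketched in a few lines, assuming the calls have already been turned into text. The function name, the sample transcripts and the list of terms below are invented for illustration.

```python
import re
from collections import Counter


def term_frequencies(transcripts, terms):
    """Count how often each term of interest appears across all transcripts."""
    wanted = {term.lower() for term in terms}
    counts = Counter()
    for text in transcripts:
        # Split into lowercase words, ignoring punctuation.
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in wanted:
                counts[word] += 1
    return counts


# Invented sample transcripts.
calls = [
    "I want to cancel my contract because the bill is wrong",
    "Please cancel it, I already asked you to cancel last week",
]
counts = term_frequencies(calls, ["cancel", "bill", "refund"])
print(counts)
```

Counting alone tells you how often a word comes up, not why, which is the limitation the next approach addresses.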

The best systems will use a mixture of LVCSR and phonetics, which means that you’ll be able to detect how often words are spoken and, most importantly, their relationship to other words.

Context is recognised. And context is all-important.

Using a Semantic Index will enable you to categorise calls and provide root-cause analysis for each category. You can compare subsets of calls against other calls. You can even look for frequency and correlation together – so that the word ‘cancel’ in proximity to a competitor’s name will be flagged, for example.
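
A proximity rule like the ‘cancel’ near a competitor’s name example might look something like the sketch below. It is a simple word-window check, not a Semantic Index; the function name, the window size of five words and the competitor name ‘Acme’ are all hypothetical.

```python
import re


def flag_proximity(transcript, trigger, companions, window=5):
    """Return (trigger, companion) pairs where a companion word occurs
    within `window` words either side of the trigger word."""
    words = re.findall(r"[a-z']+", transcript.lower())
    trigger = trigger.lower()
    companions = {c.lower() for c in companions}
    hits = []
    for i, word in enumerate(words):
        if word != trigger:
            continue
        start = max(0, i - window)
        nearby = words[start:i] + words[i + 1:i + 1 + window]
        for other in nearby:
            if other in companions:
                hits.append((word, other))
    return hits


# 'Acme' stands in for a competitor's name.
near = flag_proximity("I think I will cancel and move to Acme next month",
                      "cancel", ["Acme"])
far = flag_proximity("Acme called me yesterday, but anyway, I am really happy "
                     "here and would never dream of trying to cancel",
                     "cancel", ["Acme"])
print(near, far)
```

The first call is flagged because the competitor is named four words after ‘cancel’; the second mentions both words but too far apart to count as related.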

Stage 3 – Data analysis.

This is where the value of the previous two stages pays off.

At the bottom end of the scale is keyword spotting – and that can help you look for particular words or phrases.

More sophisticated solutions based on LVCSR make it possible to use call content to classify large volumes of calls into categories such as customer complaints, billing issues, product feedback and repeat calls.
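
As a rough idea of what content-based classification involves, the sketch below assigns transcripts to categories using hand-written term lists. This is a deliberately simple keyword rule standing in for what a commercial engine would do; the category names follow the examples above, while the term lists and function name are invented.

```python
import re

# Hypothetical term lists; a real deployment would tune these per business.
CATEGORY_TERMS = {
    "complaint": {"complaint", "unhappy", "disappointed"},
    "billing": {"bill", "invoice", "charge", "charged"},
    "product feedback": {"feature", "suggestion"},
}


def classify_call(transcript):
    """Return every category whose term list shares a word with the transcript."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    return sorted(category
                  for category, terms in CATEGORY_TERMS.items()
                  if words & terms)


labels = classify_call("I was charged twice on my last invoice and I am very unhappy")
print(labels)
```

A call can land in more than one category, which is useful when, say, a billing problem has also turned into a complaint.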

A Semantic Index uses advanced data mining to find out why customers are calling. It can automatically analyse categorised calls and uncover issues but, more importantly, user definition isn’t required. Call review is easier and quicker, and ongoing analysis is dynamic.


  • Debbie Hage of Verint
Author: Jonty Pearce

Published On: 14th Mar 2010 - Last modified: 19th Sep 2019