Audio Mining

Speech analytics solutions for contact centres use techniques commonly referred to as audio mining, where large volumes of audio are searched for occurrences of specific words or phrases. Mining recorded customer interactions can provide valuable insight into products, services and processes to help reduce costs and improve customer satisfaction.

Get Insight from Captured Information

Even in the smallest of contact centres, the sheer number of recorded interactions can make analysing captured conversations very difficult. Automated speech analytics systems can search, analyse, and report on trends in recorded calls to show what’s happening in your contact centre.

The analytics software can isolate the words and phrases used most frequently within a given time period, as well as indicate whether usage is trending up or down, making it easy for supervisors, analysts, and others in the organisation to spot potential problems and take actions that can reduce call volumes, drive down costs, and increase customer satisfaction.

For example, a sudden escalation in the number of times customers use words such as “bill,” “plan,” or “charge” might alert you to a potential problem in your billing process. You can then redirect your resources to isolate the problem and take corrective action quickly.
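The trend check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the calls are assumed to be already transcribed to text, and the function names, the sample transcripts and the doubling threshold are all hypothetical rather than taken from any product.

```python
from collections import Counter

def keyword_counts(transcripts, keywords):
    """Count how often each keyword appears across a list of transcripts."""
    counts = Counter()
    for text in transcripts:
        words = text.lower().split()
        for kw in keywords:
            counts[kw] += words.count(kw)
    return counts

def trending(previous, current, keywords, ratio=2.0):
    """Return keywords whose count grew by at least `ratio` between two periods.

    The threshold of 2.0 is an arbitrary assumption for illustration.
    """
    prev = keyword_counts(previous, keywords)
    curr = keyword_counts(current, keywords)
    # max(..., 1) avoids treating a single mention as a spike from zero
    return [kw for kw in keywords if curr[kw] >= ratio * max(prev[kw], 1)]

# Made-up transcripts for two consecutive weeks
last_week = ["i want to change my plan", "where is my order"]
this_week = ["my bill is wrong", "why is there a charge on my bill", "the bill doubled"]

print(trending(last_week, this_week, ["bill", "plan", "charge"]))  # ['bill']
```

Here “bill” is flagged because it rose from zero mentions to three, while “charge” appeared only once and stays below the threshold.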

The two most common approaches to audio mining are Large Vocabulary Continuous Speech Recognition (LVCSR) and phonetic recognition.

Large Vocabulary Continuous Speech Recognition

LVCSR depends upon a dictionary of words that it uses to understand what is being said. Pure LVCSR solutions usually need their generic dictionary expanded with industry- and company-specific words, but achieve high levels of accuracy. Using the dictionary, the software recogniser scans the conversations and creates an index. The index is searchable and contains information about the words it understood in the recorded conversations. The index can be quickly searched for key words and phrases, and only those conversations containing the key words and phrases are presented.

LVCSR solutions often provide higher relevant contact rates and present fewer false positives. But there’s a price for the higher accuracy: speed. By some estimates, LVCSR systems ingest recorded speech at only about two to three times real-time speed. However, the index in an LVCSR solution can be searched very quickly. Search speed is important, because finding and retrieving relevant recorded conversations is the essence of the solution.

Unfortunately, LVCSR solutions don’t easily accommodate new avenues of investigation. If, for example, you launched a new product and wanted to assess the impact of differing marketing messages on the buy and response rate, you would be faced with having to add that product name to the dictionary, and then having to reprocess the recorded calls across the study period. Moreover, if a particular word isn’t in the dictionary when LVCSR builds the index, it cannot be found in the recordings unless they are reprocessed.
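A toy model of a dictionary-based index makes both the fast lookup and the dictionary limitation concrete. This is a sketch under assumptions, not how any particular LVCSR engine is built: text strings stand in for the recogniser's output, and the vocabulary, the call IDs and the product name “zephyr” are all made up.

```python
from collections import defaultdict

# Hypothetical recogniser vocabulary; "zephyr" (an invented product name) is missing
DICTIONARY = {"bill", "charge", "plan", "refund", "cancel"}

def build_index(transcripts, dictionary):
    """Map each dictionary word to the set of call IDs in which it was recognised."""
    index = defaultdict(set)
    for call_id, text in transcripts.items():
        for word in text.lower().split():
            if word in dictionary:  # out-of-dictionary words are never indexed
                index[word].add(call_id)
    return index

def search(index, words):
    """Return the IDs of calls that contain every one of the given words."""
    sets = [index.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()

calls = {
    "call-1": "i want a refund for this charge",
    "call-2": "please cancel my plan",
    "call-3": "there is a charge on my bill for the zephyr handset",
}
index = build_index(calls, DICTIONARY)

print(sorted(search(index, ["charge"])))  # ['call-1', 'call-3']
print(sorted(search(index, ["zephyr"])))  # [] until the calls are re-indexed

# Adding the word to the dictionary only helps after reprocessing the recordings
index = build_index(calls, DICTIONARY | {"zephyr"})
print(sorted(search(index, ["zephyr"])))  # ['call-3']
```

Searching is a simple lookup, which is why it is fast, but a word that was not in the dictionary at indexing time leaves no trace in the index.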

Phonetic Recognition

In contrast, phonetic recognition software doesn’t understand any words at all. It understands phonemes, the individual sounds of speech that make up words and language. It is much faster than LVCSR at ingestion: phonetic recognition software can ingest recorded conversations at 10 to 15 times the speed of real time. On the other hand, searching a phonetic-based index is painfully slow.
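To put the ingestion figures quoted for the two approaches side by side, the arithmetic can be worked through for an archive of recordings. The sketch below assumes that “N times real time” means N hours of audio are processed per hour of computing, and the archive size of 1,000 hours is a hypothetical figure.

```python
def ingest_hours(audio_hours, speed_factor):
    """Hours needed to ingest `audio_hours` of audio at `speed_factor` x real time."""
    return audio_hours / speed_factor

audio = 1_000  # hypothetical archive size, in hours of recorded calls

# Speed ranges as quoted in the text: (slowest, fastest) multiples of real time
speeds = {"LVCSR": (2, 3), "phonetic": (10, 15)}

for name, (slow, fast) in speeds.items():
    best = ingest_hours(audio, fast)
    worst = ingest_hours(audio, slow)
    print(f"{name}: {best:.0f} to {worst:.0f} hours")
# LVCSR: 333 to 500 hours
# phonetic: 67 to 100 hours
```

On these assumptions, the same archive takes roughly five times longer to ingest with LVCSR than with phonetic recognition.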

Phonetic recognition solutions can readily search for new words and phrases, since they do not attach meaning to words. But therein lies an essential problem with phonetic solutions: recognition error. The problem has some ripple effects that amplify the impact. For example, the English language is replete with:

  • Homonyms – words that sound alike and are spelled alike but have different meanings, such as stalk, bear, or left.
  • Homographs – words that are spelled alike, but have different meanings and, sometimes, different pronunciations, such as abstract, address, or does.
  • Homophones – words that sound alike but have different meanings, such as buy, bye, and by.

From a phonetic viewpoint, words that sound alike are indistinguishable, and a homograph entered as a search term can correspond to more than one pronunciation. For these reasons, phonetic solutions, while continually improving, have higher recognition error rates. Consequently, they can exclude or include calls inappropriately in the search results, biasing the composition of the search results and creating further inaccuracies when the calls are categorised and analysed for potential root causes.
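The false-positive effect can be illustrated with a toy phoneme matcher. This is a sketch under heavy assumptions: a real phonetic engine derives phonemes from the audio itself, whereas here a small hand-written lexicon of ARPAbet-style pronunciations converts made-up transcripts to phoneme sequences, and all names and call IDs are hypothetical.

```python
# Hypothetical mini-lexicon using ARPAbet-style symbols (stress marks omitted)
PRONUNCIATIONS = {
    "buy": ["B", "AY"], "by": ["B", "AY"], "bye": ["B", "AY"],
    "i": ["AY"], "want": ["W", "AA", "N", "T"], "to": ["T", "UW"],
    "the": ["DH", "AH"], "plan": ["P", "L", "AE", "N"],
    "stand": ["S", "T", "AE", "N", "D"], "please": ["P", "L", "IY", "Z"],
    "pay": ["P", "EY"], "my": ["M", "AY"], "bill": ["B", "IH", "L"],
}

def to_phonemes(text):
    """Flatten a phrase into one phoneme sequence, discarding word boundaries."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(PRONUNCIATIONS[word])
    return phonemes

def contains(haystack, needle):
    """True if `needle` occurs as a contiguous run inside `haystack`."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))

def phonetic_search(calls, term):
    """Return IDs of calls whose phoneme stream contains the term's phonemes."""
    target = to_phonemes(term)
    return [cid for cid, text in calls.items() if contains(to_phonemes(text), target)]

calls = {
    "call-1": "i want to buy the plan",
    "call-2": "stand by please",
    "call-3": "pay my bill",
}

print(phonetic_search(calls, "buy"))  # ['call-1', 'call-2']
```

A search for “buy” also returns the call containing “stand by”, because the two words share the same phoneme sequence. No dictionary update or reprocessing is needed to search for a new term, which is the flexibility the approach trades for accuracy.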

Hybrid Solutions

Next-generation speech analytics solutions employ elements of both technologies, in effect leveraging the best characteristics of each and minimising the shortcomings. Available solutions achieve high ingestion rates with high recognition rates. For example, Verint Witness Actionable Solutions’ Impact 360 Speech Analytics solution uses a very large dictionary, approaching 100 kilobytes, and rapid indexing. Coupled with an assist from phonetic analysis, the solution avoids the need to reprocess recorded calls, as would be required in pure LVCSR. More accurate matching and better comprehension of what is being said yield higher-quality search results. Deeper comprehension of the words and phrases spoken permits sharper categorisation and better analysis of possible root causes.

How Does it Actually Work?

The process begins with the speech analytics engine ingesting a large number of recorded conversations. Using the audio mining and indexing technologies described above, the solution recognises words within this large volume of unstructured information and organises them into user-created and self-suggesting categories.

The software can accomplish this because it “understands” the content. For instance, it might sort the recordings into three categories: (1) customer complaint calls, (2) calls in which a new product offering is mentioned, and (3) calls in which a competitor is named during the conversation. The solution then drills deeper into each category and identifies clusters of calls with commonalities that suggest a root cause. It’s this part of the solution that makes the biggest business difference. Business impact depends chiefly on the quality and depth of the searchable index and the precision of the categorisation and root-cause identification routines.
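The three-way sort in this example can be approximated with simple keyword rules. The sketch below covers only user-created categories and makes no attempt at self-suggesting ones or at root-cause clustering. The rule set, the transcripts, the product name “zephyr” and the competitor name “acme” are all hypothetical.

```python
from collections import defaultdict

# Hypothetical user-created categories, each defined by a set of trigger words
CATEGORY_RULES = {
    "complaint": {"complaint", "unhappy", "unacceptable"},
    "new product": {"zephyr"},  # invented product name
    "competitor": {"acme"},     # invented competitor name
}

def categorise(transcript, rules):
    """Return every category whose trigger words appear in the transcript."""
    words = set(transcript.lower().split())
    return [name for name, terms in rules.items() if words & terms]

calls = {
    "call-1": "this is unacceptable i want to make a complaint",
    "call-2": "tell me about the zephyr handset",
    "call-3": "acme offered me a better price and i am unhappy",
}

# Group call IDs by category; one call may fall into several categories
by_category = defaultdict(list)
for call_id, text in calls.items():
    for category in categorise(text, CATEGORY_RULES):
        by_category[category].append(call_id)

for category, ids in by_category.items():
    print(category, ids)
# complaint ['call-1', 'call-3']
# new product ['call-2']
# competitor ['call-3']
```

Each group of calls would then be the starting point for the deeper analysis described above, such as looking for phrases that recur within the complaint calls.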

Case Study

The potential impact of the rigorous and continuous utilisation of speech analytics by the contact centre can be considerable. For example, an insurance company measured its first-call resolution rate at 60 percent, a number well below its expectations. The company used Impact 360 Speech Analytics from Verint Witness Actionable Solutions to mine a very large volume of recorded calls. Mining the 40 percent of calls in which no first-call resolution was achieved, the solution identified a high occurrence of calls that contained the phrases, “I don’t know”, “I need to check with my supervisor”, “…calling back about my claim”, and “…waiting for a claim form”.

Further analysis surfaced probable root causes for each. The “I don’t know” conversations revealed agent knowledge gaps that could be filled easily with coaching and eLearning using “learning clips” highlighting best-practice examples from recorded interactions. The “I need to check with my supervisor” conversations revealed a lack of agent empowerment embedded in the centre’s processes. The “…calling back about my claim” recordings revealed processing issues outside the contact centre, as did the “…waiting for a claim form” analysis.

With these powerful insights, the insurance company improved its first-call resolution by 25 percent and enjoyed a number of ancillary benefits, including a substantial reduction in average speed of answer, reduction in average handle time, better staff morale, and the avoidance of hiring 22 additional agents.

  • Rob Wint of Verint

Published On: 14th Mar 2010 - Last modified: 26th Feb 2019