3 Potential Pitfalls of DIY Speech Analytics

Rick Britt of CallMiner discusses three key problems that you will face if you decide to implement a DIY speech analytics system.

More and more organisations are looking to build in-house data science or AI teams to use emerging technology and techniques to harness the power of their data. With the growth of these internal data science teams, many companies are looking to gain greater control of all aspects of their data programs so that they can be more nimble and effective.

If done correctly, this also provides more opportunities for creativity and experimentation with internal and external data. We scientists can bring a new level of insight to organisations. Turns out having scientists around is kind of cool.

Please, don’t let the dazzle of our NASA space camp t-shirts, utter domination in quoting dystopian novels, and late-night sci-fi board game parties fool you. Organisations should be cautious when contemplating taking on data science projects that are not core to their business.

It is important to understand who we are. Over the past few years, organisations have shown a growing desire to deploy in-house data scientists on projects in areas outside their core competence.

There is an allure here. Building new metadata and customisable intelligence (such as name, area and product) internally on a company’s highly valuable data can provide new intellectual property and possibly a competitive advantage. With all this upside, what’s stopping you?

Before tackling any DIY speech analytics data project, organisations should endeavour to know the full scope of the project. When it comes to complicated data projects like a speech analytics program, many organisations don’t fully realise the complexity until they are heavily invested, and they are often left footing a large bill for a sub-optimal outcome.

Let’s look at why it’s attractive on the surface to undertake DIY speech analytics and the pitfalls companies we work with have encountered when they have tried.

Natural Language Processing (NLP) Is Cool

I am fortunate to work with a team of scientists in the cutting-edge field of AI, and very fortunate to do so for a leading company. I can tell you conversational AI and NLP are some of the coolest and most avant-garde AI research fields out there. I am talking self-driving car AI cool.

If we forget resources and cost for a moment, advances in NLP and the commoditisation of speech recognition transcribers, married with the power of current deep learning technologies, lower the cost of entry to a cutting-edge research field.

As a data scientist, it is cost efficient to try, easy to get prioritised internally, seemingly logical to a business, and cool AF. That’s a lot of wins.

So, What Is the Down Side?

Recently, a client’s data scientist ran a recorded call with their significant other through a free online transcriber. The recognition accuracy was in the low 90s (per cent), so the transcript was very clean. The client said their data scientist was sure he could build a better speech analytics system, and at a lower cost, than the one we have been perfecting over the last 15 years.

We have deeply analysed what it takes to arrive at parity with our technology. Let’s walk through a few of the foundational hurdles you face in building something that basically works, which is still not close to where our software is today.

Problem 1: Transcription Speed and Accuracy

Don’t be fooled by one-to-one audio transcription rates of high-quality audio. Speech recognition software has come a long way in just the past few years (see Moore’s Law) and there are lots of options, even free ones, that produce acceptable transcription – but not at scale.

Many technologies offer great results for one-for-one transcription: one call per CPU, processed in real time, so a five-minute call takes five minutes to process, and the next one starts only when it completes. There is a trade-off between speed and accuracy. Speech recognition software must manage that trade-off at scale, which means high processing speeds.
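To make the scale problem concrete, here is a rough back-of-envelope sketch in Python. The call volume, average call length and real-time factors are hypothetical figures chosen purely for illustration, not CallMiner numbers. A real-time factor of 1.0 stands for the one-for-one case described above.

```python
import math


def workers_needed(calls_per_day, avg_call_minutes, real_time_factor,
                   hours_available=24.0):
    """Estimate how many parallel transcription workers keep up with daily volume.

    real_time_factor is processing minutes per audio minute (1.0 = one-for-one).
    All inputs are illustrative assumptions.
    """
    audio_minutes = calls_per_day * avg_call_minutes
    processing_minutes = audio_minutes * real_time_factor
    capacity_per_worker = hours_available * 60  # minutes one worker can process per day
    return math.ceil(processing_minutes / capacity_per_worker)


# Hypothetical contact centre: 20,000 calls a day, 5 minutes each.
one_for_one = workers_needed(20_000, 5, real_time_factor=1.0)
ten_times_faster = workers_needed(20_000, 5, real_time_factor=0.1)
print(one_for_one, ten_times_faster)
```

Under these assumed figures, one-for-one transcription needs dozens of workers running around the clock just to avoid falling behind, and that is before any accuracy tuning.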

Speech analytics platforms like CallMiner have algorithms that use context to pick the next logical word, and do so quickly. Your data scientist will need to deal with that.

Problem 2: Finding Something Relevant in the Transcript

Once you find a good solution for transcription, the next step is to start finding the pieces of information in the transcript that may have a bearing on the business. You can build an algorithm to search for specific words, but this practice of “word spotting” does little more than show you singular instances of things. Data anecdotes. Good data scientists will learn very quickly two daunting truths; they are like natural laws of speech analytics.

Algorithms need to be built not only to spot a word or phrase but also to identify any of its aliases (e.g. loan/lone/alone) and where they fall in the conversation (e.g. before “payment”). Even at an exceptionally high 95% accuracy, a corpus of just a million words contains fifty thousand incorrect words that need to be dealt with.
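As a minimal sketch of the idea, the Python below treats loan/lone/alone as one term and only counts a hit when “payment” follows within a couple of words. The alias table, the window size and the sample sentence are assumptions made up for illustration; a real system would need far richer alias handling.

```python
import re

# Hypothetical alias table: words a transcriber may produce in place of the target.
ALIASES = {"loan": {"loan", "lone", "alone"}}


def spot_with_context(transcript, target, anchor, window=2):
    """Return word positions where target (or an alias) appears
    within `window` words before `anchor`."""
    words = re.findall(r"[a-z']+", transcript.lower())
    variants = ALIASES.get(target, {target})
    hits = []
    for i, word in enumerate(words):
        if word in variants and anchor in words[i + 1 : i + 1 + window]:
            hits.append(i)
    return hits


text = "I was home alone yesterday. I want to make my lone payment today."
hits = spot_with_context(text, "loan", "payment")

# The error arithmetic from the paragraph above.
word_count = 1_000_000
accuracy = 0.95
expected_errors = round(word_count * (1 - accuracy))
print(hits, expected_errors)
```

Here “alone yesterday” is ignored because no “payment” follows it, while the misrecognised “lone payment” is caught as a likely “loan payment”.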

You should also consider the ability to combine phrases into scores. As a basis for prediction, how relevant is the thing you found? A phrase by itself may not tell you much, but relevancy scores and counts of the same topic may indicate a significant change in customer behaviour on that type of call.
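A very simple way to picture this kind of scoring is a weighted count of phrases per call. The sketch below is an assumption-laden illustration: the phrases, weights and sample calls are invented, and plain substring counting stands in for whatever matching a production system would use.

```python
# Hypothetical topic definition: phrase -> weight (higher means more relevant).
TOPIC_PHRASES = {
    "cancel my account": 3.0,
    "not happy": 1.0,
    "speak to a manager": 2.0,
}


def topic_score(transcript, phrases=TOPIC_PHRASES):
    """Return (score, counts): the weighted sum of phrase counts and the raw counts."""
    text = transcript.lower()
    counts = {phrase: text.count(phrase) for phrase in phrases}
    score = sum(weight * counts[phrase] for phrase, weight in phrases.items())
    return score, counts


calls = [
    "I am not happy. I want to cancel my account.",
    "Thanks, that fixed it.",
    "I am not happy, not happy at all. Let me speak to a manager.",
]
scores = [topic_score(call)[0] for call in calls]
print(scores)
```

Tracking a score like this across many calls of the same type, rather than reading single hits, is what turns data anecdotes into a trend you can act on.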

Problem 3: Anomalies, Quirks and Tiny Black Holes

So much of a conversation lies not in what you hear or what makes sense, but in what you don’t hear and what doesn’t make sense. Without a deep set of experiences (relevant data), finding the anomalies or missing things is nearly impossible.

Let me share some examples that a data scientist who is new to this world needs to figure out. These are all real, and all, sadly, very common among clients.

Did you know that the candy “Tootsie Rolls”, among others, has a hotline, and that the hotline has no required prompts, so it is effectively an endless loop? This is important if an agent who is not really working hard wants to take an unscheduled break. Just dial that number and sit there looking busy.

That is something an organisation may want to find in a speech analytics system. The same goes for agents listening to a phone ring for 5 minutes, listening to 10 minutes of an answering machine, or calling an internal extension and listening to hold music for half an hour.

Even more diabolical is silence. Is silence good or bad? That depends. That tiny black hole in an audio recording is speech analytics gold: highly important, and not trivial data science either.
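Just locating silence is the easy part, and even that takes some care. The sketch below flags stretches where the frame energy (RMS) stays under a threshold for a minimum duration. The threshold, frame size, minimum duration and synthetic test signal are all assumptions for illustration; real call audio has line noise, hold music and cross-talk that a fixed threshold will not handle.

```python
import math


def find_silences(samples, sample_rate, frame_ms=100, threshold=0.01,
                  min_silence_s=2.0):
    """Return (start_s, end_s) spans where frame RMS stays below `threshold`
    for at least `min_silence_s` seconds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    spans, start = [], None
    for f in range(n_frames):
        frame = samples[f * frame_len : (f + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        t = f * frame_len / sample_rate  # start time of this frame in seconds
        if rms < threshold:
            if start is None:
                start = t
        elif start is not None:
            if t - start >= min_silence_s:
                spans.append((start, t))
            start = None
    end = n_frames * frame_len / sample_rate
    if start is not None and end - start >= min_silence_s:
        spans.append((start, end))
    return spans


# Synthetic signal at a low sample rate: 1 s of tone, 3 s of silence, 1 s of tone.
rate = 1000
tone = [0.5 * math.sin(2 * math.pi * 50 * n / rate) for n in range(rate)]
signal = tone + [0.0] * (3 * rate) + tone
silences = find_silences(signal, rate)
print(silences)
```

Deciding whether a given silence is an agent looking something up, a customer reading a card number, or someone sitting on a dead line is the part that needs the deep set of relevant data.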

Go/No-Go Time

As we all know, anything “data and analytics” related will take dedicated time and effort by some very specialised resources, typically in high demand.

Certainly, a big data project like speech analytics should not be expected to be completed as a side project in any reasonable amount of time.

Companies need to be willing to commit specialised FTE hours on an ongoing basis to ensure program success and should weigh the opportunity cost of such a venture.

This blog post has been re-published by kind permission of CallMiner.

About CallMiner

CallMiner is the leading cloud-based customer interaction analytics solution for extracting business intelligence and improving agent performance across all contact channels.

Call Centre Helper is not responsible for the content of these guest blog posts. The opinions expressed in this article are those of the author, and do not necessarily reflect those of Call Centre Helper.

Author: CallMiner

Published On: 17th May 2019 - Last modified: 21st May 2019