Key Implementation Considerations for a Speech Solution


Wayne Ramprashad of Voci Technologies shares insights about considerations when introducing a speech solution.

Introducing a speech analytics solution can give your contact centre a competitive edge. When you’re ready to plan that solution, automated speech recognition (ASR) technology is the most common place to start.

However, ASR is far from the only component of a complete speech solution. Audio, call orchestration, call metadata, and voice analysis are also vital pieces of the puzzle.

To help you prepare for implementing speech analytics in your contact centre, let’s take a closer look at what will require your attention.


Audio

To access audio, contact centres can either get call recordings out of a call recorder or tap directly into the telephony system. While getting recordings involves less development, it limits you to post-call insights (rather than real-time).

Additionally, call recorder companies can charge high fees for audio access, and the recording you get is often low quality, which limits the insights you can draw from it.

On the flip side, tapping into a telephony system is a higher-effort undertaking. However, you can gain higher-quality audio for more advanced analytics, avoid recording vendor fees, and enable direct-to-transcript (DTT) solutions that give agents and managers real-time guidance and insights as calls occur.

DTT solutions can also simplify your entire speech infrastructure – streamlining call access, call capture, call orchestration, and speech-to-text into one united process and then sending the transcripts straight to your analytics application.


Call Orchestration

Call orchestration is the process of queuing and ingesting audio into an ASR engine. This process is unique for every system, and any post-call transcription will require creating custom code to send audio to the ASR engine via an API.

One common method for post-call audio ingestion is to write audio files to a directory and create software that uploads new files added to that directory to the ASR engine. Alternatively, you may connect to the call recording system API to extract and send the audio to the ASR engine.
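As a rough illustration of the directory-based approach, the sketch below polls a folder and hands each new audio file to an upload callback exactly once. It is a minimal example under stated assumptions, not any vendor's integration: the `upload` callback stands in for whatever API call your ASR engine requires, and the file extensions and polling interval are hypothetical choices.

```python
import time
from pathlib import Path
from typing import Callable, List, Set

# Assumed recording formats; adjust to whatever your recorder writes.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac"}


def find_new_audio(watch_dir: Path, seen: Set[str]) -> List[Path]:
    """Return audio files in watch_dir that have not been uploaded yet."""
    return sorted(
        p for p in watch_dir.iterdir()
        if p.is_file()
        and p.suffix.lower() in AUDIO_SUFFIXES
        and p.name not in seen
    )


def ingest_once(watch_dir: Path, seen: Set[str],
                upload: Callable[[Path], None]) -> int:
    """Upload each new file once and return how many were uploaded."""
    uploaded = 0
    for path in find_new_audio(watch_dir, seen):
        try:
            upload(path)  # hypothetical hook: send the audio to the ASR engine
        except OSError:
            continue  # leave the file unmarked so a later pass retries it
        seen.add(path.name)
        uploaded += 1
    return uploaded


def watch(watch_dir: Path, upload: Callable[[Path], None],
          poll_seconds: float = 5.0) -> None:
    """Poll the directory until the process is stopped."""
    seen: Set[str] = set()
    while True:
        ingest_once(watch_dir, seen, upload)
        time.sleep(poll_seconds)
```

In practice you would also want to confirm that a recording has finished being written before uploading it, and to persist the set of processed files so that a restart does not resend them.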

Automated Speech Recognition

Contact centre ASR technology has come a long way in recent years. Today, best-in-class ASR technology combines low latency and fast transcription speeds with robust features and flexible deployment and integration options.

For product completeness, ASR technology should offer real-time and post-call transcription with punctuation, capitalization, number formatting, metadata, security features, multi-language support, and custom tuning.

Tuning should be used to increase word recognition accuracy, particularly for product, brand, and industry terms. Tuning can also help you better assess call and agent sentiment and adjust what is redacted for compliance and security.

To reduce your time spent on custom tuning, some ASR engines offer built-in language models for specific industries and applications.

When comparing ASR engines, look for a vendor with a record of improving speech-to-text accuracy, lowering costs, adding new language models, and enhancing ease of use over time.

Call Metadata

Creating a voice-of-the-customer speech analytics tool requires using call metadata such as agent name, agent team, and caller number. (This metadata is distinct from any metadata that your ASR engine provides on things such as gender, sentiment, and emotion.)

To link call metadata with call transcripts, contact centres must first determine how to extract call metadata from the computer telephony integration (CTI) system.

One possibility is that your ASR engine or analytics software will offer a solution that meets your needs and works with your CTI system.

If not, you will have to build a custom extract, transform, load (ETL) system, using an internal database to relate each call recording or transcript with its appropriate metadata.
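A custom ETL of this kind might look something like the following sketch, which uses SQLite as the internal database and a shared call ID to relate transcripts to metadata. The CTI field names (`CallID`, `Agent`, `Team`, `ANI`) and the table layout are assumptions made for illustration; a real CTI export will have its own schema.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple


def build_store() -> sqlite3.Connection:
    """Create an in-memory database with one table per data source."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE call_metadata (
            call_id TEXT PRIMARY KEY,
            agent_name TEXT,
            agent_team TEXT,
            caller_number TEXT
        );
        CREATE TABLE transcripts (
            call_id TEXT PRIMARY KEY,
            transcript TEXT
        );
    """)
    return conn


def transform(cti_row: Dict[str, str]) -> Dict[str, str]:
    """Map a hypothetical CTI export row onto the metadata columns."""
    return {
        "call_id": cti_row["CallID"].strip(),
        "agent_name": cti_row["Agent"].strip(),
        "agent_team": cti_row["Team"].strip(),
        "caller_number": cti_row["ANI"].strip(),
    }


def load_metadata(conn: sqlite3.Connection,
                  rows: Iterable[Dict[str, str]]) -> None:
    """Insert or update transformed metadata rows."""
    conn.executemany(
        "INSERT OR REPLACE INTO call_metadata "
        "VALUES (:call_id, :agent_name, :agent_team, :caller_number)",
        rows,
    )


def load_transcript(conn: sqlite3.Connection, call_id: str, text: str) -> None:
    """Insert or update the transcript for one call."""
    conn.execute("INSERT OR REPLACE INTO transcripts VALUES (?, ?)",
                 (call_id, text))


def transcripts_with_metadata(conn: sqlite3.Connection) -> List[Tuple]:
    """Join each transcript to its metadata; unmatched calls get NULLs."""
    return conn.execute("""
        SELECT t.call_id, m.agent_name, m.agent_team,
               m.caller_number, t.transcript
        FROM transcripts AS t
        LEFT JOIN call_metadata AS m ON m.call_id = t.call_id
        ORDER BY t.call_id
    """).fetchall()
```

The left join is a deliberate choice here: a transcript whose metadata has not arrived yet still appears in the result, which makes missing links easy to spot.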

Voice Analysis

Some ASR engines provide a call analyzer for organizing, filtering, searching, classifying, visualizing, and reporting on call data. Using a call analyzer, you can search transcripts for specific words or phrases as well as automatically tag calls based on designated criteria.
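To make phrase search and automatic tagging concrete, here is a minimal sketch that works on plain transcript text. It is not how any particular call analyzer is implemented; the rule names and phrases in `TAG_RULES` are invented examples, and matching is a simple case-insensitive whole-phrase search.

```python
import re
from typing import Dict, List

# Hypothetical tagging criteria: tag name -> phrases that trigger it.
TAG_RULES: Dict[str, List[str]] = {
    "cancellation": ["cancel my account", "close my account"],
    "billing": ["refund", "overcharged"],
}


def phrase_pattern(phrase: str) -> "re.Pattern[str]":
    """Compile a case-insensitive, whole-phrase pattern."""
    return re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)


def search_transcripts(transcripts: Dict[str, str], phrase: str) -> List[str]:
    """Return the IDs of calls whose transcript contains the phrase."""
    pattern = phrase_pattern(phrase)
    return sorted(call_id for call_id, text in transcripts.items()
                  if pattern.search(text))


def tag_call(text: str, rules: Dict[str, List[str]]) -> List[str]:
    """Return every tag whose rule has at least one matching phrase."""
    return [
        tag for tag, phrases in rules.items()
        if any(phrase_pattern(p).search(text) for p in phrases)
    ]
```

Because the match is on whole phrases, "refund" will not match "refunds"; a production rule set would likely need stemming or wildcard support.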

You can also view statistics and trends for metrics such as call volume, duration, in-call silence, and agent or client emotions.
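Metrics such as call volume, duration, and in-call silence can be derived from transcripts when word-level timestamps are available. The sketch below assumes that each call carries a total duration and a list of `(start, end)` times in seconds for every recognised word; that data shape, the `Call` class, and the one-second silence threshold are illustrative assumptions rather than any engine's actual output format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Call:
    call_id: str
    duration_s: float
    words: List[Tuple[float, float]]  # assumed (start, end) per word, in seconds


def silence_seconds(call: Call, min_gap_s: float = 1.0) -> float:
    """Sum the gaps between words (and at the call's edges) of min_gap_s or more."""
    total = 0.0
    previous_end = 0.0
    for start, end in sorted(call.words):
        gap = start - previous_end
        if gap >= min_gap_s:
            total += gap
        previous_end = max(previous_end, end)
    tail = call.duration_s - previous_end
    if tail >= min_gap_s:
        total += tail
    return total


def summarize(calls: List[Call]) -> Dict[str, float]:
    """Compute call volume, mean duration, and mean share of each call spent silent."""
    volume = len(calls)
    if volume == 0:
        return {"call_volume": 0, "avg_duration_s": 0.0, "avg_silence_ratio": 0.0}
    ratios = [
        silence_seconds(c) / c.duration_s if c.duration_s > 0 else 0.0
        for c in calls
    ]
    return {
        "call_volume": volume,
        "avg_duration_s": sum(c.duration_s for c in calls) / volume,
        "avg_silence_ratio": sum(ratios) / volume,
    }
```

Emotion metrics are left out of this sketch because they depend on whatever sentiment or emotion metadata the ASR engine supplies.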

Whether or not your ASR system features a call analyzer, you will need to connect the ASR engine to your analytics or business intelligence application via an API. As long as your ASR engine is designed for ease of integration, this process should be straightforward.

A Complete Solution for Delivering Actionable Insights

While a call analyzer is useful, leading contact centres do more than review dashboards and metrics. The greatest opportunity lies in using call transcripts for predictive and prescriptive analytics.

These advanced analytics applications allow you to anticipate customer needs and recommend next best actions to drive better customer experiences, smarter business decisions, and significant cost savings.

Parts of this article were originally featured in Call Centre Helper’s article: A Checklist for Implementing… Speech Analytics

Author: Guest Author

Published On: 10th Apr 2021