Does emotion detection really work?

There has been a lot of hype recently about emotion detection. Many people have been unsure of how well it works, or if it even exists!

We asked our panel of experts for their opinion.

The business view

According to Wikipedia, the word ‘emotion’ refers to a subjective experience associated with mood, temperament, personality and disposition. So surely the first question must be: is it even possible to determine a customer’s mood and emotion solely from the sound of their voice, particularly when emotion is open to so much interpretation?

Granted, you are likely to be able to detect ‘stress’ in a voice by setting parameters for tone, pitch, pace and volume (or the lack of it), but getting to the root of the emotion behind it may prove more difficult, particularly when the judgement is made by an automated software process rather than a human.

Keeping cool under pressure

Some customers are very cool, calm and collected when conversing, although deep down they may be angry. They may clearly outline within the call what will happen if their problem is not rectified, and these calls may go undetected by emotion detection.

However, they would quite probably be picked up when employing phonetic indexing to search for key terminology traditionally used in an ‘unhappy customer’ scenario.
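To illustrate the keyword alternative, here is a minimal sketch of flagging calls by key terminology. It is a simplification: phonetic indexing searches the audio itself for sound sequences, whereas this sketch assumes text transcripts are already available. The phrase list and the function name `flag_unhappy_calls` are hypothetical.

```python
import re

# Hypothetical phrases for an 'unhappy customer' scenario; a real deployment
# would tune this list against its own calls.
UNHAPPY_PHRASES = [
    "cancel my account",
    "speak to a manager",
    "make a complaint",
    "not happy",
]


def flag_unhappy_calls(transcripts, phrases=UNHAPPY_PHRASES):
    """Return {call_id: [matched phrases]} for calls containing any phrase.

    Matching is case-insensitive and anchored on word boundaries.
    """
    patterns = {p: re.compile(r"\b" + re.escape(p) + r"\b", re.IGNORECASE) for p in phrases}
    flagged = {}
    for call_id, text in transcripts.items():
        hits = [p for p, pattern in patterns.items() if pattern.search(text)]
        if hits:
            flagged[call_id] = hits
    return flagged


calls = {
    "c1": "I'm not happy with this and I want to cancel my account.",
    "c2": "Thanks, that's all sorted now.",
    "c3": "Can I speak to a manager please?",
}
print(flag_unhappy_calls(calls))
```

Unlike an emotion score, each flag comes with the words that triggered it, which already hints at why the caller is unhappy.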

The real question is not “does emotion detection really work?” but “do the benefits associated with enabling emotion detection justify the costs?” And to that question the answer at the moment is “no”.

The prerequisite for enabling emotion detection is stereo (dual-channel) recording, which may not be justified at the present cost/benefit trade-off.

Emotion detection doesn’t necessarily give you all the answers

The other key point is that knowing whether or not someone is agitated is not enough. So what if you search across 1,000 calls and 73 of them are flagged for emotion? Further analysis is still required to find out why, and what led to that situation.

Similarly, there may be a number of false alarms, where a lot of “stress” is detected in a call but the outcome remains reasonably positive, the call was handled relatively well, and it offers little further insight into the business.

The lie detector?

In the early years, emotion detection was touted as a possible tool to help combat fraud, particularly fraudulent insurance claims, but the reality is that it is no substitute for a lie detector. Moreover, it is generally understood that electronic telephony carries only around 20 percent of the frequency range of the human voice.

This being said, a significant number of our customers still ask us “can you supply emotion detection?”, predominantly because it is perceived as “flashy and exciting” and often used descriptively when explaining the concept of speech analytics for the first time. The answer is yes, but we ask them why or how they want to use it, and invariably another tool can provide the same result without the expense of emotion detection.

To conclude, it’s here, it works but there are probably more useful methods which can be deployed to deliver better returns.

David Mason, Major Accounts Manager, Business Systems (UK) Ltd (bslgroup.com)

The scientific view

Emotion lies behind much of the richness of human life and is one of the main drivers behind our choices and decisions. Accordingly it is a frequent customer request that we should provide tools for “emotion recognition” by analogy with the “speech recognition” systems increasingly widely available.

Emotion is a fluid and rather slippery concept

Some people, through disability or upbringing, find it difficult or impossible to understand and/or express how they feel. Likewise, interpreting someone else’s feelings is an art rather than a science.

In order to avoid attributing hard categories, many researchers prefer to use a continuous space such as that in Figure 1 rather than discrete labels to describe emotion.

Figure 1 A two-dimensional representation of emotion, derived from [1]

The advantage of this representation is that it is possible to express as numbers the continuous scale from “mildly irritated” to “incandescent with rage” and also to capture the shades of grey between related pairs of emotions.
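As a rough illustration of expressing emotion as numbers, the sketch below places emotional states in a two-dimensional plane. Because the figure is not reproduced here, the axes are assumed to be the commonly used valence (negative to positive) and arousal (calm to activated), and all coordinates are invented for illustration. Distance from the neutral origin stands in for intensity, and interpolating between two points gives the shades of grey in between.

```python
import math
from dataclasses import dataclass


@dataclass
class EmotionPoint:
    # Assumed axes, both on a -1..+1 scale:
    #   valence: very negative (-1) to very positive (+1)
    #   arousal: very calm (-1) to very activated (+1)
    valence: float
    arousal: float

    def intensity(self) -> float:
        """Distance from the neutral origin: larger means a stronger emotion."""
        return math.hypot(self.valence, self.arousal)


def blend(a: EmotionPoint, b: EmotionPoint, t: float) -> EmotionPoint:
    """Linear interpolation: t=0 gives a, t=1 gives b, values between give mixtures."""
    return EmotionPoint(
        valence=a.valence + t * (b.valence - a.valence),
        arousal=a.arousal + t * (b.arousal - a.arousal),
    )


# Illustrative coordinates only.
mildly_irritated = EmotionPoint(valence=-0.3, arousal=0.2)
incandescent = EmotionPoint(valence=-0.9, arousal=0.9)
halfway = blend(mildly_irritated, incandescent, 0.5)

for name, point in [("mildly irritated", mildly_irritated),
                    ("halfway", halfway),
                    ("incandescent", incandescent)]:
    print(f"{name}: intensity {point.intensity():.2f}")
```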

For many years, speech recognition developments have benefitted from the availability of common databases, allowing relatively easy performance comparisons among different approaches developed by different laboratories. It is only within the past year that the first such competitive evaluation has been undertaken in the field of “emotion recognition” [3].


The task of recognising emotions from audio alone is even more difficult than that of speech recognition. Much of our perception of emotions is visual: facial expressions and body language contribute as much as half the emotional information [2].

Therefore, the task of “emotion recognition” today is roughly where “speech recognition” was 20 years ago. Some applications are possible, but only in strictly limited areas or under highly controlled conditions. By acknowledging those limitations and adapting to them, some commercial applications may be feasible, but it is likely to be some considerable time before an effective system for general purpose interpretation of real emotions from audio is available.

Dr. Keith Ponting, principal consultant (Research) at Aurix

We don’t do emotion detection!

Instead we do emotional analysis. We start from the point of view that no machine will be able to outperform the human brain when it comes to detecting emotion. So attempting to write software to act like a human brain by creating a reference point for a given emotion and then analysing calls to identify something that is similar is not the way to get consistently reliable results.

There is little commercial value in detecting emotions

Research has shown that there is little commercial value in detecting strong and obvious emotions such as anger, great joy, etc. Firstly, compared to the overall volume of calls in a contact centre they seldom occur, and secondly, when they do occur their cause is known or easily identified, so the benefit in detecting them on their own is low.

Better to look for specific words and phrases

Of far greater benefit is detecting the dozens of subtle speech components that make up an emotion and analysing these and their changing relationships to each other as the call progresses. In parallel the triggers for these changes such as the use of specific words/phrases, the time of day or the profile of staff used can be identified. These can then be assessed against the desired call outcome to drive better results more efficiently in the future.

Another fundamental challenge of emotional analysis is that the same indicators in two different people can mean two different things. For example on similar campaigns an elderly person might speak slowly and softly while a male teenager may talk quickly and loudly. This does not mean that the teenager is angry or the elderly person content, they simply have different ways of expressing their emotions.


It is vital to assess the harmony between the caller’s and agent’s style of speech. This helps answer the question of whether you have the right agents with the right soft skills on the right campaigns, so that they can easily tune in to the caller’s style of speech and mirror their behaviour, making them feel at ease to achieve the desired outcome.

On a single call, if the indicators of emotion fall outside norms established by the algorithm and/or there is a lack of harmony between caller and agent, the call can be escalated as it happens for immediate intervention by a more experienced member of staff. Equally, multiple instances would suggest that the campaign itself needs attention.
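The escalation logic described above could be sketched roughly as follows. This assumes a single numeric stress indicator per call, a history of that indicator from previous calls to define the norm, and speech rates for caller and agent. Speech-rate mismatch is only a crude, hypothetical proxy for harmony, and the thresholds are illustrative rather than taken from any product.

```python
import statistics


def zscore(value: float, history: list[float]) -> float:
    """Standard deviations between `value` and the historical mean (needs >= 2 values)."""
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    return (value - mean) / sd if sd > 0 else 0.0


def should_escalate(caller_stress: float, stress_history: list[float],
                    caller_rate: float, agent_rate: float,
                    z_limit: float = 2.5, mismatch_limit: float = 0.3) -> bool:
    """Hypothetical rule: escalate if the caller's stress indicator is far outside
    the norm, or if caller and agent speech rates differ by more than 30%."""
    outside_norm = abs(zscore(caller_stress, stress_history)) > z_limit
    mismatch = abs(caller_rate - agent_rate) / max(caller_rate, agent_rate)
    return outside_norm or mismatch > mismatch_limit


history = [0.40, 0.45, 0.50, 0.42, 0.48, 0.44]  # illustrative past values
print(should_escalate(0.46, history, caller_rate=4.0, agent_rate=4.2))  # -> False (typical call)
print(should_escalate(0.70, history, caller_rate=4.0, agent_rate=4.2))  # -> True (stress far above norm)
print(should_escalate(0.46, history, caller_rate=5.5, agent_rate=3.5))  # -> True (rates out of step)
```

Counting how often calls on one campaign trip either condition would give the campaign-level signal mentioned in the text.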

So to answer the question, emotion analysis most definitely works and its ability to achieve progressively more sophisticated analysis will only improve over time.

Jonathan Slobom, Sales and Marketing Manager, IT campus (www.itcampus.eu)

There is much debate in the contact centre industry, supported by many promissory statements, about the capabilities of emotion detection. The debate covers issues like how it will allow organisations to detect frustrated or angry callers and provide insight into a customer’s feelings about an organisation, product, or services and thus allow companies to develop early corrective measures to improve customer relationship management.


All of these questions are about insight that is critical to businesses, but can the technology deliver?

I believe that the answer is a qualified yes, within reason. An interesting exercise is to ask your friends and colleagues first to define emotion, then to list emotions, and then to explain the different ways we express them. Typically the result is a complex set of answers, and classifying them automatically and accurately is a challenge.

The acoustic approach

The acoustic approach relies on measuring specific features of the audio, such as tone of voice, pitch, volume, intensity and rate of speech. The speech of a surprised speaker tends to be faster, louder and higher in pitch, while that of a sad or depressed speaker tends to be slower, softer and lower in pitch. An angry caller may speak much faster and louder, and raise the pitch of stressed vowels.
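A minimal sketch of how these cues might be turned into a rule, assuming per-utterance measurements of mean pitch, loudness and speech rate are already available. Each cue is compared with the same speaker's own baseline, since people differ in how they normally speak. The tolerances and the `Features` and `describe_change` names are hypothetical, and the fact that one pattern fits both surprise and anger shows why such rules are weak on their own.

```python
from dataclasses import dataclass


@dataclass
class Features:
    pitch_hz: float     # mean fundamental frequency of the utterance
    loudness_db: float  # mean level in dB
    rate_sps: float     # speech rate in syllables per second


def _direction(change: float, tolerance: float) -> int:
    """+1 if change exceeds tolerance, -1 if below -tolerance, else 0."""
    if change > tolerance:
        return 1
    if change < -tolerance:
        return -1
    return 0


def describe_change(sample: Features, baseline: Features,
                    pitch_tol: float = 0.10,   # relative change (10%)
                    loud_tol_db: float = 3.0,  # absolute change in dB
                    rate_tol: float = 0.10) -> str:  # relative change (10%)
    """Compare one utterance with the same speaker's baseline (hypothetical rule)."""
    pitch = _direction((sample.pitch_hz - baseline.pitch_hz) / baseline.pitch_hz, pitch_tol)
    loud = _direction(sample.loudness_db - baseline.loudness_db, loud_tol_db)
    rate = _direction((sample.rate_sps - baseline.rate_sps) / baseline.rate_sps, rate_tol)

    if pitch == loud == rate == 1:
        # The same pattern fits both surprise and anger, so the rule cannot tell them apart.
        return "faster, louder, higher: surprise or anger"
    if pitch == loud == rate == -1:
        return "slower, softer, lower: possibly sadness"
    return "no clear pattern"


baseline = Features(pitch_hz=120, loudness_db=60, rate_sps=4.0)
print(describe_change(Features(140, 66, 4.8), baseline))
print(describe_change(Features(105, 55, 3.4), baseline))
print(describe_change(Features(122, 61, 4.1), baseline))
```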

To build a database of defined sentiments against which ‘live’ audio can be evaluated, each single-emotion example is selected from a ‘pristine’ set of recordings, then manually reviewed and annotated with the sentiment it represents. Even in this pristine environment, fewer than 60 percent of single-emotion, noise-free utterances can be correctly classified.

In the real world, contact centre audio suffers from background noise, network interference and background conversations, all of which erode this percentage substantially. Audio quality also significantly affects the ability to identify these features: compression makes some of the most commonly sought features, such as jitter, shimmer and glottal pulse, very difficult to detect, degrading the results of this form of sentiment measurement still further.

Blended emotions are difficult to classify

This is compounded by the fact that speakers often express blended emotions, such as both empathy and annoyance which are tremendously difficult to classify. Additionally, sentiment analysis is often incapable of adjusting for the varied ways different callers express the same emotions, for example, people from the North East or Scotland might be brusquer while callers from the South West tend to be more polite even when displeased. These limitations highlight its non-viability as a business analysis tool.

Figure 1: Example of the linguistic, structured-query approach to sentiment analysis

Practical applications in the contact centre

Sentiment analysis can be an invaluable tool for improving the quality of service [4]. Traditionally, weekly agent quality reviews cover just a handful of calls, providing incomplete information. Phonetic search can analyse 100 percent of calls and give a complete view of customer interaction with an organisation.

Jonathan Wax, VP EMEA, Nexidia (www.nexidia.com)

Measuring ‘emotion’ is challenging

Measuring ‘emotion’ is challenging because people have their own distinctive ways of communicating and there are many variables that affect their emotional state beyond the current conversation. In our experience, an effective measure of agitation can be created by detecting changes in the stress levels and speech tempo of the conversation. Higher levels of change in stress and tempo are normally associated with a higher level of agitation.

This measure of agitation is increasingly meaningful when it is observed across a significant body of calls. By looking at a larger number of calls the random variables smooth out and the agitation measure becomes more useful. For instance, in a full week’s worth of calls, the average agitation on the calls handled by certain agents, or about certain topics, will be consistently higher than other agents or topics. This analysis can provide valuable evidence about what factors are affecting the customer experience.
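A minimal sketch of the agitation measure and its aggregation, assuming each call has already been split into segments with normalised stress and tempo scores. The scoring formula (mean absolute change in each, summed) and the field names are assumptions for illustration, not the method of any particular product.

```python
from collections import defaultdict
from statistics import mean


def call_agitation(stress: list[float], tempo: list[float]) -> float:
    """Mean absolute segment-to-segment change in stress, plus the same for tempo.

    Assumes both series are normalised to comparable 0..1 scales and each has at
    least two segments.
    """
    d_stress = mean(abs(b - a) for a, b in zip(stress, stress[1:]))
    d_tempo = mean(abs(b - a) for a, b in zip(tempo, tempo[1:]))
    return d_stress + d_tempo


def agitation_by(calls: list[dict], key: str) -> dict:
    """Average call agitation grouped by a field such as 'agent' or 'topic'."""
    groups = defaultdict(list)
    for call in calls:
        groups[call[key]].append(call_agitation(call["stress"], call["tempo"]))
    return {k: mean(v) for k, v in groups.items()}


calls = [
    {"agent": "A", "topic": "price change", "stress": [0.2, 0.6, 0.3], "tempo": [0.5, 0.9, 0.4]},
    {"agent": "A", "topic": "billing", "stress": [0.2, 0.2, 0.3], "tempo": [0.5, 0.5, 0.6]},
    {"agent": "B", "topic": "billing", "stress": [0.3, 0.3, 0.3], "tempo": [0.5, 0.5, 0.5]},
]
print(agitation_by(calls, "agent"))
print(agitation_by(calls, "topic"))
```

In practice the grouping would run over a full week of calls or more; with only a handful of calls, as here, random variation dominates the averages.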

Using agitation to drive marketing feedback


One of our customers used agitation as a part of the analysis of a major price change announcement. Since the price was being reduced, they were surprised to observe a significant group of calls showing high agitation in conversations where the topic of the price change was discussed. Further analysis showed that a significant number of their customers who had recently signed up to fixed price contracts were calling to cancel these contracts and were upset that they couldn’t now enjoy the benefits of the price change. The use of speech analytics enabled them to identify and quantify this issue, and be prepared for this with future announcements.

Measuring the nonverbal content of a call is not a ‘silver bullet’ but it does provide a valuable extra dimension for understanding customer experience when combined with the full range of information available from speech analytics.

Adam Walton, Vice President International, CallMiner (www.callminer.com)

Why does everyone claim to do emotion detection?

Emotion detection seems to be seen as a key element or differentiator of speech analytic solutions and for that reason was described to me once as “well everyone else says they can do it, so we have to say we can do it even though none of us really can”.

I’m sure that makes sense to a VP of Marketing somewhere. To my knowledge, and I am always willing to be corrected, anything that purports to be able to detect emotion in any currently available solution has to be defining emotion in a very simplistic way.

One reason lies in the definition of emotion itself. Are we looking for “emotion” (which is what you feel) or “the expression of emotion” (which is what is displayed to other people)? We know that there are quite complex relationships between the core emotion and the expression of emotion, with variations being due to factors such as culture (different cultures have different “display rules”), situations (it’s OK to show some emotions in some situations, not in others), status differences (who the other person is), and the individual.

All in all, we know that there can be tremendous variability in the relationship between emotion and displayed emotion, whereas obviously automatic detection depends on being able to write a set of invariant rules.

People display emotion in different ways

Equally, another problem with detecting emotions is that no two people display them in quite the same way; when I get angry, I may shout, when you get angry, you might go quiet. But then again, when I’m happy I might shout as well so what does that tell us? Not a lot really.


One section of the manual analytics that we do for customers asks our raters, real human beings, to judge the customer’s mood, among other things, and making that judgement with any certainty is a highly skilled task. Scoring someone’s mood on a scale of -4 to +4 is a skill that takes time to learn, and there will still be differences of opinion as to the “right” score, if indeed there is one.

In my view, emotion detection was a solution to a problem that no one had and was dreamed up somewhere by someone as a ‘killer feature’. Surprisingly enough, problems can be identified and cross or happy customers can be detected through understanding the words and phrases that customers use when they are cross or happy.

Once again, you don’t have to over-promise something that doesn’t exist, so why do vendors persist?

Duncan White, Managing Director, horizon2

Author: Jo Robinson

Published On: 24th Feb 2010 - Last modified: 8th Sep 2025

5 Comments

As someone who has evangelised speech analytics for a good few years now I remember emotion detection as being one of the first features out of the box.

The majority of group discussions ended with the same conclusion. Emotional response is a personal thing and can only be understood in individual context. Therefore searching for it as an indicator is not as strong as say the language people use which is a much firmer ground to start on.

There are just too many variables between cultures, generations, gender and simply how you feel on the day to produce anything close to a consistency that can be identified through analysis.

Given that analytics is still new to everyone and might just ignite beyond interest levels in 2010, emotion detection is an exotic aspect of analytics that seems to draw initial interest until people have a think about it.

Much more important is the prize of being able to trawl 100% of calls against a combined set of keywords or phrases. That is where the gold lies. And it is this that should be focussed on and promoted as the value of analytics in today’s market.

Martin Hill-Wilson 25 Feb at 11:36
The question posed is ‘does emotion detection really work’? This should not be confused with the question ‘is it possible to measure emotion, if so how can we understand what causes the feeling, does improving it make a difference, and if so, how do we improve’? Much of the discussion focuses on emotion detection through analytics and software technologies. These are services I have sold over the past years in the CIM arena, and my overwhelming view is ‘no – they don’t work’, at least not for measuring emotion in a way that yields actionable data to improve emotional outcomes.

Analytics do have a function which is valuable (and I’m a fan of some vendor products), but have limitations with empathetic assessment to a micro level. For example, analysis of words and phrases can help to ascertain up/cross-sell or retention opportunities, or competitor analysis, but are not an accurate barometer of customers’ core warmth or coolness toward a brand. Interestingly, whilst investment in analytics, next big thing software and automated customer surveys has been proliferating, customer experience (as opposed to C-Sat which is a different metric) has remained static at a neutral, undifferentiated level, by and large.

The Empathy Rating model we use to assess interactions starts with a ‘gut feeling’, based on an initial conscious impression, then drills down to between 50 – 200 analysis points related to feelings, i.e. emotional needs, and ends with a lasting subconscious impression, which is the sentiment that ultimately drives loyalty (or not) and advocacy (or not). The actionable data produced can map directly into improvement programmes and has proven time and again to make a massive difference in positive emotional outcomes, with a direct correlation to profit.

Rob Sowden 25 Feb at 18:47
Hi, someone directed me here to see some interesting comments added earlier, but it seems they have now been removed. Is this Empathy debate being censored / restricted to certain ideas or services and if so why? Thanks

Bryan Foss 25 Feb at 19:46
Bryan

Our moderators remove all posts that breach our posting guidelines. These are usually posts that advertise a product or service.

This may be the case here – however it may be that you are looking at the wrong page as we deal here with emotion detection and not empathy.

jonty pearce 28 Feb at 10:05
According to UK statistics:
50% loss of performance because of stress
70% of visits to doctors triggered by stress
85% of serious illnesses triggered by stress
30% of turnover cases because of stress
£700m stress-related costs to UK employers every year
13m working days lost per annum according to UK
£7bn cost of stress to society per annum

Usage of real-time stress detection* within global contact centers demonstrates:
10% – increases in productivity
20% – life and job satisfaction
25% – reduced burnout
38% – reduced work stress
consequently
staff retention rate increase of up to 20%
a revenue growth of over 11%

*Agents received “mini-trainings” in stress management technique.

YOSTRESS 31 Jul at 10:47