Call Centre Avatars

If the word ‘avatar’ means nothing to you – or little more than those two-dimensional icons you sometimes see in Instant Messaging tools – then read on. Dr Iain McKay is about to explain how the contact centre might benefit from a new breed of customer service representative: the virtual agent.
Self-service technologies are becoming increasingly important to the contact centre industry, providing automated handling of mundane or repetitive tasks through various media, and lifting human agents up the value chain. But some people are not comfortable with self-service, and some actions are not suited to leaving customers to fend for themselves.

Step forward avatars: computer-generated representations of a human, giving a sense of physical presence. Incorporating avatars into the technology mix allows hybrid self-service or assisted interaction, with the ‘virtual agents’ providing a more enjoyable, familiar and reassuring experience.

Tips on avatar selection

Ensure it looks life-like – but not necessarily anthropomorphic. This is an important point. A Hollywood computer-generated imagery (CGI) animation house is not required to render an interactive avatar to the user’s desktop. While anthropomorphic detail conveys a sense of realism, it should not come at the expense of life-like animation. Valuable central processing unit (CPU) horsepower should not be wasted rendering every hair follicle while lip-synching and mouth animation suffer.
Figure-hugging, shiny clothes are easier to animate than flowing cloth.
Put most effort into the face. Right from birth, we humans focus on and remember each other’s faces. Greatest emphasis should therefore be placed on the eyes, nose and mouth.
Features can be exaggerated. Many subtleties are lost when ultimately displayed in a two-inch square on screen. Eyes and mouth merit enlargement.
Ensure it sounds life-like. There should be no discontinuity between the face and the voice emanating from it. Give the avatars a realistic accent from a local voice talent and, if using text-to-speech (TTS), select a matching voice. Consider paying for a custom TTS voice. Consider, also, using actors rather than telephony voice talents.
Ensure seamless animation, with plenty of ‘reactivity’. It is impossible for a human to carry out a thought process – other than a reflex action – without moving their eyes, so avatars should respond likewise. They should feature smooth continuous movement, no jerkiness and no latency. It is important, too, that they have ‘signs of life’ – that is, a heaving chest, intermittent coughs, bobs of the head – and don’t look like a lifeless marionette between responses. There should be seamless transitions between gestures, as well. In other words, one should be completed before another is started.
Use gestures and non-verbal cues to aid dialogue. Use an avatar’s body language to provide non-verbal cues, such as leaning towards the user (a turn-taking cue), turning away, cocking the head, cupping an ear (struggling to hear), playing with a pen (bored), screwing eyebrows (confused), and touching a chin (pensive).
Provide an empathetic bond. The avatar’s head should nod understandingly while the user is providing input. The avatar should look pleased to have accepted a valid input.
Use multimodality. Ideally, important quantitative data should be rendered and conveyed within the 3D model space – that is, on a visual display unit (VDU) screen next to the avatar – in order that the user does not break their mental model and immersive bond.
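
The tip on seamless animation (complete one gesture before starting the next, and show ‘signs of life’ between responses) can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular avatar engine works: the class name GestureScheduler, the gesture names and every duration are assumptions made up for the example.

```python
from collections import deque
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Gesture:
    name: str
    duration: float  # seconds the animation needs to play in full

    def __post_init__(self):
        if self.duration <= 0:
            raise ValueError("a gesture needs a positive duration")


# Hypothetical idle animations that act as 'signs of life' between responses.
IDLE_GESTURES = (
    Gesture("blink", 0.25),
    Gesture("breathe", 1.5),
    Gesture("head_bob", 0.75),
)


class GestureScheduler:
    """Plays requested gestures one at a time and idles when nothing is queued."""

    def __init__(self, idle_gestures=IDLE_GESTURES):
        self._pending = deque()
        self._idle = cycle(idle_gestures)
        self._current = None
        self._remaining = 0.0

    def request(self, gesture):
        """Queue a gesture; it never interrupts the one already playing."""
        self._pending.append(gesture)

    def _start_next(self):
        # Requested gestures take priority; otherwise fall back to idle motion.
        self._current = self._pending.popleft() if self._pending else next(self._idle)
        self._remaining = self._current.duration
        return self._current.name

    def tick(self, dt):
        """Advance time by dt seconds; return the names of gestures started."""
        started = []
        if self._current is None:
            started.append(self._start_next())
        # Only move on once the current gesture has played to completion.
        while dt >= self._remaining:
            dt -= self._remaining
            started.append(self._start_next())
        self._remaining -= dt
        return started
```

A rendering loop would call tick() once per frame with the elapsed time and start whichever animations it returns. Because requests are queued rather than applied immediately, a nod requested mid-blink begins only when the blink has finished, and the avatar is never left motionless.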

The term ‘avatar’ is increasingly used to refer to the static, two-dimensional (2D) icons used in internet-based Instant Messaging (IM) tools to provide a visual representation of background, mood and context. However, in our industry, avatars are animated characters with life-like qualities, appearing as life-size talking heads – for instance, on 6ft tall kiosks – or, more commonly, as small, helpful agents on a computer screen.

A bit of avatar history

One of the first avatars was the 80s TV DJ Max Headroom. Although Max was not actually computer-generated – he was played by an actor with the help of much latex, fibreglass and editing trickery – his characterisation was that of an avatar. Max caught the public’s attention, and that of the Hollywood executives, giving rise to movies such as “The Lawnmower Man” and later “The Matrix”.

After that, the seminal cyberpunk novel “Snow Crash” defined avatars as we know them today. Avatars, it said, “are the audiovisual bodies that people use to communicate with each other in the” virtual reality Internet.

Around the same time, desktop computing power burgeoned, and many commentators considered it only a matter of years until every PC had virtual reality operating systems featuring avatars and natural language interfaces such as speech recognition and text entry. Companies like Superscape and Renderware meanwhile emerged to cater to the growing demand for 3D graphical computer environments.

As the Internet continued to grow in popularity, creating avatars and the virtual spaces they inhabit became a popular pastime. Unfortunately, it became clear that desktop PCs did not possess the processing power to realise the creative aspirations of the avatar pioneers, and the fervour inevitably cooled. On the upside, though, PC power has increased four-fold since those days, and commercial investment and confidence in avatar technology are beginning to re-emerge.

In April of 2000, the first – and, arguably, the most famous – avatar was launched on the Internet: a virtual newsreader called Ananova, whose whole character was defined. She was a single, 28-year-old, 5ft 8in tall ‘girl about town’ with a ‘pleasant, quietly intelligent manner’ and a love of the band Oasis and the cartoon The Simpsons. In her first year, Ananova Ltd was sold to Orange UK for £95 million, reflecting her enormous overnight popularity.

How the avatar has changed to cater for today’s market

Unsurprisingly, given the unprecedented success of Ananova, there are growing numbers of animated avatars throughout the Internet, mostly for the one-way branded information delivery that can usually be found on media-savvy news, entertainment and, somewhat creepily, adult entertainment websites. Many avatars provide animated audio-visual feedback and ‘customisation’, but few are rendered in real-time.

Recently, however, the mobile content market has become an interesting growth area. This is not only due to the increase in central processing unit (CPU) power, increased bandwidth and reduced power consumption, but also the increase in memory in handheld devices.

Interestingly, the aforementioned Superscape rendering engine has been repurposed for mobile phones. What was possible only on a desktop PC in the early 90s is today possible in the palm of your hand.

Companies currently offer services such as ‘weemees’ and ‘YoMeGo’ – from the people behind the Ananova character – as customised characters for the mobile and desktop PC. Indeed, Digital Animation Group’s latest high-quality avatar for the mass market harks back to Ananova’s roots: a virtual Geoffrey Boycott character for the new English cricket season, appearing on the UK’s Channel 5 website and, in an echo of Max Headroom, slated to appear on TV.

An important recent improvement in perceived avatar quality comes from the text-to-speech (TTS) synthesis engines used to make avatars speak with lip synchronisation. Many localised accents are also available to suit worldwide markets.

In addition, the cost of creating a custom TTS voice has fallen, opening up the possibility of using an actor or voice talent to portray a personality more suited to the target demographic. Avatars have even been shown to take their audio feed from live input – like puppetry.

Today’s avatars come with a range of realistic gestures, such as waving, pointing, scratching the chin, coughing, drinking from a glass, waving pens, typing at keyboards, turning, nodding and shaking heads. They also boast signs of life such as blinking, subtle twitching and signs of breathing, as feedback that all is well and to demonstrate a readiness to accept input.

More subtle animations – and especially facial gestures – allow for the portrayal of mood and emotion, too. For example, an avatar’s smile can be programmed to change as it answers questions in the positive or the negative. What’s more, the latest developments in dialogue enhancements allow natural language input, and characterised output across a whole range of media, including short message service (SMS), text chat, IM, natural language interactive voice response (NL-IVR) and kiosks with voice over Internet protocol (VoIP).

But what about the contact centre?

For contact centres, avatar tasks are increasingly becoming those of fronting calls or contacts and handling the more common, expected, easily quantified and modelled tasks. The benefits are myriad. Avatars work 24 hours a day, provide a consistent brand and approach across all customers, and receive contacts across interactive voice response (IVR), VoIP, text chat, IM, SMS and multimedia messaging service (MMS). However, being automated, they are only as good as the modelling of the business process and the natural language used to describe it. Therefore, there is still room for their human counterparts to handle exceptions and high-value, high-risk processes.

The benefit of using avatars as the front face of virtual agent technologies – such as speech recognition, natural language processing, process modelling, knowledge management, artificial intelligence and speech synthesis – is that they convey a human-like sense of presence, feeling and understanding which human users relate to. A further advantage over human agents is that individual customers can personalise the character of their preferred avatar-agent, and thus deal with the same, familiar ‘agent’ on each contact.

Meanwhile, technology is progressing to allow for the placement of animated, talking avatars on shop windows, by means of ‘rear projection’. Here, a thin film is placed behind the shop window and a projector presents a life-sized avatar through the window. It is early days for the technology, but it has been piloted by mobile phone operators, car showrooms and even clothing shops.

The next generation of systems also allows for user input through the window itself, creating a virtual touch-screen around the avatar. Proximity sensors have also been used to make avatars ‘come to life’. Avatars may soon find gainful employment in store fronts on every high street, while media companies use avatars – à la Max and Geoffrey – to render personalised micro-marketing content delivered to consumers’ homes through their PCs and TVs.

A glance into the future of the virtual agent

What about avatars in the more ‘distant’ future? In truth, contact centre avatars may well routinely be encountered in ‘fully immersive’ virtual reality environments. Using somewhat cumbersome stereo goggles, headsets or shutter glasses, they already can be. However, scientists – at the Max Planck Institute for one – have already successfully managed the control of nerve tissue via ‘neural transistors’, which provide electronic control of nerve impulses by detecting and firing impulses in close proximity to neurons. This makes previously sci-fi-based musings on ‘nano-bots’ – beings inhabiting the human body and managing the flow of impulses along neural pathways to fool the brain into perceiving a new virtual reality – more a likelihood than a pipe-dream.

But to get back to the reality of the present day. What will be the impact on the next generation of contact centres?

Well, it’s not looking likely that human agents will be sporting coiled wires to the ceiling any time soon, plugging their brains into the all-seeing super-computer. However, avatars are increasingly providing a ‘face’ for self-service technology which humans can relate to and empathise with. The next generation of case management and blended multi-channel contact centre software already allows an automated process to manage a case around a blended workpool of virtual and human agents.

There is also the capability today for what we at Graham Technology have been calling – rather tongue in cheek – ‘cyborg agents’: human agents whose prompts and candidate processes are suggested to them by the virtual agent, but where the human acts as the safety valve and final authority. This allows for much faster throughput, for the multi-tasking of many customers, and enables contact centres to concentrate on ‘go/no go’ decisions, conflict resolution and making qualitative judgements on the fly, for which humans are still the best qualified. After all, it’s unlikely that call centres will want to entrust a decision to ‘authorise compensation’ or ‘apply extra goodwill discount’ to a virtual agent just yet.
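
The division of labour described above, where the virtual agent proposes and the human remains the final authority, can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions rather than a description of any real product: the names Suggestion, HUMAN_SIGN_OFF and handle are invented for the example, and letting the virtual agent complete routine actions unaided is an assumption borrowed from the blended workpool idea.

```python
from dataclasses import dataclass

# Assumption: actions reserved for human sign-off. The two entries are the
# examples given in the text; a real list would come from the business
# process model.
HUMAN_SIGN_OFF = {"authorise compensation", "apply extra goodwill discount"}


@dataclass(frozen=True)
class Suggestion:
    customer_id: str
    action: str  # the candidate process proposed by the virtual agent


def handle(suggestions, human_approves):
    """Split the virtual agent's suggestions into applied and declined.

    Routine actions are applied directly. Anything listed in HUMAN_SIGN_OFF
    is applied only if the human agent, represented here by the
    human_approves callable, gives the go-ahead.
    """
    applied, declined = [], []
    for suggestion in suggestions:
        if suggestion.action not in HUMAN_SIGN_OFF or human_approves(suggestion):
            applied.append(suggestion)
        else:
            declined.append(suggestion)
    return applied, declined
```

In use, human_approves would be wired to the agent’s desktop as a simple ‘go/no go’ prompt, so one person can work through sign-off decisions for many customers while the virtual agent carries on with the routine contacts.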

Interestingly, with the recent improvements in TTS synthesis engines and the lower cost of creating custom voices from archives of (annotated) speech samples, it is my opinion that there will be an emerging market for the creation of avatars with the face and voice of dearly-departed actors and celebrities whose image rights have been carefully protected and made licensable.

Just imagine the impact of Marilyn Monroe, Martin Luther King, John F Kennedy or Winston Churchill giving a public health warning or even a sales message.

Dr Iain McKay cut his teeth on avatars, 3D graphics, virtual reality, multi-user shared spaces and speech and language processing at the Centre for Communication Interface Research at the University of Edinburgh, where he worked as a researcher for ten years. While there, he built virtual reality inhabited worlds and avatars for delivery over the web and gained his PhD, “The Design of Multimedia Services for the Home”, along the way. For the past three years, he has been a research officer at Graham Technology, building virtual agents to front the next generation of self-service customer relationship management (CRM) processes.

Author: Jonty Pearce

Published On: 25th Jul 2006 - Last modified: 11th Dec 2017

1 Comment

I am currently working in a call centre, as an agent. So my job will become obsolete, though I speak, read and write 4 languages. Not that I am worried sick; I would start to sound like those people in the car industry and would think that all manual labour would have been robotised by now… but I don’t. I am just wondering, if I cannot do any translating, or call centre work, or even typing (voice input)… what I am going to do after the next 5-10 years have gone by. Someone suggested that we should turn around our system, which requires everyone to have a job to survive, but I don’t see that coming without a struggle, especially in a world where so many later developed economies will still proceed by our ‘ancient’ philosophy. Currently private small investors are holding back because of insecurity on consistency. Later we might have a future in which we have to make substantial and constant moves towards our jobs. And then again, computers make it easier and give the opportunity of NOT having to move. But a world that is running on all the speeds that were previously locally triggered (relative to our standards, Muslims seem to be coming from the Middle Ages, Africans still stuck in the Stone Age (and are exploited by us and the Chinese, who are now in the first half of the 20th century), … and so on). Our global village is turning at different speeds, having set off at the point they were at when they were still physically and psychologically separated. It seemed to have the tendency of redispersing what was claimed to be unified in a village.

Johan 2 May at 12:32
