Generative AI is an umbrella term for AI models that generate novel output from an input, often called a prompt. This broader term encompasses models that produce language, visual imagery, and audio.
You may have heard of DALL-E, another OpenAI product, which can produce beautiful images from a prompt, or Jukebox, which generates music as raw audio.
These generative AI models don’t necessarily use large language models (LLMs), but some do incorporate LLMs in an effort to understand the meaning of a prompt. For the contact centre, audio and visual models are less interesting at the moment.
However, models that produce audio outputs will surely hit their stride in the next few years, and when they do they will have a transformative impact on voice conversations.
Voice generation models take a small sample of recorded speech and create a simulated voice that software systems can use programmatically.
Thanks to Talkdesk