Why Most Voice Automation Systems Are Falling Short Right Now

A consistent pattern has emerged in the evolution of voice automation: while speech recognition and conversational interfaces have improved significantly, the underlying operating model for automation has remained largely unchanged.

To find out more about the challenges this is posing, we asked independent researcher and practitioner Ankit Talwar to put the spotlight on why most voice automation systems appear to be falling short right now and what should be happening instead.

The Issue Is Not the Maturity of Voice Technology; It Is the Architecture Around It

Ankit Talwar, Director of Product Management for AI at Dell Technologies

Most systems still optimize for routing efficiency rather than resolution quality. This structural limitation becomes most visible in one of the most persistent challenges spanning contact centres and supply chains alike: Where Is My Order (WISMO).

Across retail and e-commerce, delivery-status enquiries consistently represent a significant share of inbound customer contacts and rise sharply during peak demand periods.

Industry research indicates that WISMO enquiries can account for 25–35% of total contact volume, with spikes exceeding 50% during seasonal peaks.^[1][2]

Despite decades of automation investment, these contacts remain resistant to meaningful containment. The issue is not the maturity of voice technology; it is the architecture around it. Routing-first automation models are the primary structural failure mode preventing progress.

When Automation Can Only Restate Status Without Supporting a Decision, Escalation Becomes Inevitable

WISMO is often treated as a simple informational request. Customers want to know when their order will arrive. In practice, this framing significantly underestimates the problem.

Definition: This article defines WISMO not as a call type, but as an operational signal of upstream data failure.

Customers are rarely calling to hear a date repeated. They are calling to decide whether the delivery still works for them and, if not, whether alternatives exist. When automation can only restate status without supporting a decision, escalation becomes inevitable.

Three structural forces drive cost and friction in WISMO interactions:

Recontact Rates – Handling costs compound quickly when customers recontact after unresolved interactions.^[3]
Agent Attrition – Repetition contributes to agent fatigue, reducing engagement and quality.
Containment Gaps – Automation resolves information, not decisions, forcing hand-offs at the most expensive point in the interaction.

Most Voice Automation Systems Fall Short on Explaining Why an Outcome Changed

Most voice automation systems rely on order-centric platforms to answer delivery questions. These systems perform well at answering when an order is expected to arrive.

Where they consistently fall short is explaining why an outcome changed and what can be done about it.

When deliveries stall, dates shift, or tracking information stops updating, the explanation typically resides outside the order record itself.

It lives within broader operational systems that reflect execution reality rather than planned intent. Order-centric automation can predict outcomes; it cannot support decisions.

Without Access to Operational Context, Automation Cannot Determine Whether a Delivery Can Be Rescheduled, So This Escalates to Human Agents

The information customers need to make decisions resides in operational signals such as:

Fulfilment exceptions
Logistics constraints
Carrier events
Policy boundaries

Without access to this operational context, automation can only repeat a status. It cannot determine whether a delivery can be rescheduled or if a “hold-at-location” option exists.

As a result, customers escalate to human agents, who must reconstruct the same context manually. This reconstruction effort is where time, cost, and customer trust erode.
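
As a rough illustration, the four signal types listed above could be gathered into a single structure that automation consults before answering. This is a minimal sketch under stated assumptions: the names `OperationalContext` and `available_options`, the field names, and the two eligibility rules are hypothetical and do not come from any specific platform.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class OperationalContext:
    """Hypothetical snapshot of the signal types named in the article."""
    fulfilment_exceptions: List[str] = field(default_factory=list)  # e.g. "missed_pickup"
    logistics_constraints: List[str] = field(default_factory=list)  # e.g. "carrier_capacity_full"
    carrier_events: List[str] = field(default_factory=list)         # e.g. "arrived_at_depot"
    policy_allows_reschedule: bool = False                          # policy boundary (assumed flag)
    policy_allows_hold: bool = False                                # policy boundary (assumed flag)
    revised_delivery_date: Optional[date] = None


def available_options(ctx: OperationalContext) -> List[str]:
    """Return the choices automation could offer, rather than only a status."""
    options = []
    # Assumed rule: rescheduling needs policy approval and spare carrier capacity.
    if ctx.policy_allows_reschedule and "carrier_capacity_full" not in ctx.logistics_constraints:
        options.append("reschedule")
    # Assumed rule: hold-at-location needs the parcel to have reached a depot.
    if ctx.policy_allows_hold and "arrived_at_depot" in ctx.carrier_events:
        options.append("hold_at_location")
    return options
```

With an empty context the function returns no options, which corresponds to the situation described above where automation can only repeat a status.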

The Context-First Resolution Framework Shifts Automation Toward Decision Readiness

The Context-First Resolution Framework is a decision architecture that shifts automation away from routing efficiency and toward decision readiness.

Within this framework, Voice AI handles high-volume, time-sensitive enquiries, identifying intent and retrieving relevant operational context. Critically, Voice AI stops when a decision or trade-off is required.

The Result Is Not Faster Routing; It Is Better Decisions

At the centre of the framework is a Multimodal Context Engine. This functions as a persistent layer that preserves the intersection of customer intent and operational reality. Unlike traditional omnichannel set-ups that reset context at hand-off, this approach ensures continuity.

Industry research shows that organizations are increasingly redesigning service journeys so that context flows seamlessly between AI systems and human agents.^[5] The result is not faster routing; it is better decisions.
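
One way to picture a context layer that does not reset at hand-off is a store keyed by interaction, where every handler reads and writes the same record. The sketch below is an assumption-laden illustration, not the framework's actual design: `ContextStore`, `InteractionContext`, and the identifiers used are hypothetical, and an in-memory dictionary stands in for whatever persistent storage a real deployment would use.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InteractionContext:
    customer_intent: str
    operational_facts: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)  # hand-offs, in order


class ContextStore:
    """Hypothetical in-memory stand-in for a persistent context layer."""

    def __init__(self) -> None:
        self._by_interaction: Dict[str, InteractionContext] = {}

    def start(self, interaction_id: str, intent: str) -> InteractionContext:
        ctx = InteractionContext(customer_intent=intent)
        self._by_interaction[interaction_id] = ctx
        return ctx

    def hand_off(self, interaction_id: str, from_handler: str, to_handler: str) -> InteractionContext:
        # The next handler receives the same record; nothing is cleared.
        ctx = self._by_interaction[interaction_id]
        ctx.history.append(f"{from_handler} -> {to_handler}")
        return ctx


store = ContextStore()
ctx = store.start("call-001", intent="delivery_delay")
ctx.operational_facts["cause"] = "missed_pickup"
agent_view = store.hand_off("call-001", "voice_ai", "human_agent")
```

Here `agent_view` is the record the voice system was already working with, so the agent sees the intent and the cause without asking the customer again.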

When Escalation Occurs, the Agent Receives Decision-Ready Context – Rather Than Raw Data

Here is an illustrative decision pattern for delayed fulfilment, using a delivery delayed by a missed pick-up:

Informational Phase – Voice AI identifies the order and explains the missed pick-up. If the customer accepts the new date, the interaction ends.
Decisional Phase – If the customer rejects the date, the interaction shifts.
Preserved Context – Because context is preserved, the system evaluates valid alternatives before the interaction reaches a human.

When escalation occurs, the agent receives decision-ready context rather than raw data. This reframes agent assistance from “retrieval work” to “decision execution.”
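
The three phases above can be sketched as a single function. This is an illustrative sketch only: `handle_delay`, `AgentBrief`, and their parameters are hypothetical names, and the list of alternatives is assumed to have been computed from operational context before the call.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AgentBrief:
    """Decision-ready context handed to a human agent (hypothetical shape)."""
    order_id: str
    cause: str
    rejected_date: str
    valid_alternatives: List[str]


def handle_delay(order_id: str, cause: str, new_date: str,
                 customer_accepts: bool, alternatives: List[str]) -> Optional[AgentBrief]:
    # Informational phase: the cause and new date have been explained.
    if customer_accepts:
        return None  # interaction ends, no escalation
    # Decisional phase: the date was rejected, so automation stops here.
    # Preserved context: alternatives travel with the hand-off.
    return AgentBrief(order_id, cause, new_date, alternatives)
```

A return value of `None` means the informational phase resolved the contact; otherwise the agent starts from the brief rather than from raw order data.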

Organizations That Fail to Close the Context Gap Will See Automation Plateau at Routing – Rather Than Resolution

Gartner projects that conversational AI deployments could reduce agent labour costs by $80 billion by 2026.^[4] However, these gains will not come from deploying voice interfaces alone.

Organizations that fail to close the context gap – the loss of operational truth between systems and people – will see automation plateau at routing rather than resolution.

WISMO Is Not a Volume Problem to Be Deflected but a Coordination Challenge to Be Solved

As enterprise AI matures, the boundary between contact centre operations and supply chain execution will dissolve. WISMO sits precisely at that intersection.

The Context-First Resolution Framework treats WISMO not as a volume problem to be deflected but as a coordination challenge to be solved through unified intelligence. Voice AI, Multimodal Context, and Human Insight each play a part: automation handles the informational work, context carries across the hand-off, and people make the decisions that require judgement.

Written by: Ankit Talwar, Director of Product Management for AI at Dell Technologies and a Distinguished Fellow of the Soft Computing Research Society (SCRS)

References
[1] Radial. WISMO: 10 Tips to Reduce These Customer Care Interactions.
[2] Descartes. WISMO Calls: How to Reduce Them and Boost Last-Mile Efficiency.
[3] Sorted. The Real Cost of the Retail WISMO Problem and How to Reduce It.
[4] Gartner. Gartner Predicts Conversational AI Will Reduce Contact Center Agent Labor Costs by $80 Billion by 2026.
[5] McKinsey & Company. The Contact Center Crossroads: Finding the Right Mix of Humans and AI.
[6] Salesforce. State of Service Report, 7th Edition.

Author: Ankit Talwar
Reviewed by: Xander Freeman

Published On: 7th Apr 2026