2026 Conversational AI Predictions

Posted Dec 18, 2025

Lauren Goerz

This year saw new models, leaps in orchestration, and deeper use cases for the agentic era. From OpenAI MCP support to the launch of frontier models like Gemini 3, it's been a pivotal year for AI development.

We saw some of our predictions come true (see below): a preference for smaller, open source models in production over frontier models, driven by cost, latency, and auditability. As the no-code movement hit its limits, such as edge cases and integration reality, we embraced the next new thing: vibe (AI-assisted) coding. It is at the heart of Hello Rasa, a playground where you can spin up a working Rasa agent in minutes.

As LLMs proliferate, so do new attack surfaces. Attacks like prompt injection, data exfiltration, and tool hijacking lay bare the security issues inherent in pure LLM agents. As security firm Lakera determined in its recent report, a Rasa agent stayed within its scope, demonstrated high reliability, and behaved predictably without sacrificing flexibility.

Meanwhile, multi-agent architectures will keep getting messier: A2A helps agents communicate, but it won’t prevent chaos without an orchestrator that routes work, manages state, and enforces guardrails.

We’re anticipating a hub-and-spoke shift towards generalist routers coordinating specialist agents. Agent patterns are far from settled, and both “LLM does everything” and rigid intent/entity bots hit limits.

The durable pattern is language for understanding, deterministic flows for decisions: CALM-shaped, even if named differently. In 2026, that split becomes best practice; Rasa standardizes it.

Our predictions for 2026

Vibe-coding speeds up project velocity

“Vibe-coding” is shorthand for something more practical: development flows where you talk to a system that generates real, reviewable code, explains architectural choices, and helps you navigate the framework without tab-hopping through docs. Think fast scaffolding, instant diffs, and a copilot that can justify constraints. It does not replace engineering discipline; it just speeds up the boring parts so teams can focus on design and evaluation. Hello, Hello Rasa!

Agent architectures stay messy

There is no universal agent architecture emerging. Retrieval-heavy stacks, support agents, and enterprise RPA systems each favor different topologies for the use case at hand. The real failures have come from unclear boundaries and a lack of conflict resolution between competing agents. The systems that hold up will look more like well-governed distributed systems, with explicit roles, deterministic overrides, and traceable reasoning paths.

Voice input and emotional markup start to converge

SSML (Speech Synthesis Markup Language) set the early standard for expressive speech, but vendors eventually splintered into their own incompatible tags. By 2026, better Voice Activity Detection (VAD) and growing open-source TTS datasets will push the ecosystem back toward convergence. We predict that emotions represented in text input to TTS systems will become more consistent across platforms, making expressive voice easier to adopt without relying on any single mandated standard.
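One way convergence could look in practice is a thin normalization layer that maps portable emotion labels onto baseline SSML prosody attributes. The sketch below is illustrative only: the label set and the prosody mappings are hypothetical, not any vendor's API, and real platforms each still expose their own expressive tags.

```python
# Hypothetical normalization layer: map a portable emotion label to
# baseline SSML <prosody> attributes (rate and pitch are standard SSML).
# The label set and the chosen values are illustrative assumptions.

EMOTION_PROSODY = {
    "excited": {"rate": "fast", "pitch": "+10%"},
    "calm":    {"rate": "slow", "pitch": "-5%"},
    "neutral": {"rate": "medium", "pitch": "+0%"},
}

def to_ssml(text: str, emotion: str = "neutral") -> str:
    """Wrap text in a minimal SSML document with prosody hints."""
    p = EMOTION_PROSODY.get(emotion, EMOTION_PROSODY["neutral"])
    return (
        f'<speak><prosody rate="{p["rate"]}" pitch="{p["pitch"]}">'
        f"{text}</prosody></speak>"
    )
```

Because the output is plain SSML, the same emotion label degrades gracefully on engines that ignore prosody hints rather than failing outright.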

Hybrid model stacks (SLMs + Big Models + Routing) are the future

Frontier models are powerful, but they are too expensive to run constantly and too slow for high-volume pipelines. The setup that works best in practice is a stack: small models handle routing and safety checks, mid-size models take on most domain-specific tasks, and the biggest models are called only when their extra capability really matters. This shift is not philosophical; it is driven by latency limits, finite GPU slots, and real per-request cost curves.
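To make the stack idea concrete, here is a minimal routing sketch: estimate task complexity, then dispatch to the cheapest tier whose capability ceiling covers it. The tier names, thresholds, and the scoring heuristic are all illustrative assumptions, not a real Rasa or vendor API.

```python
# Illustrative model-tier router: pick the cheapest tier whose capability
# ceiling covers the estimated complexity. Tier names, thresholds, and
# the heuristic below are hypothetical placeholders.

TIERS = [
    ("small-slm", 0.3),   # routing, safety checks
    ("mid-domain", 0.7),  # most domain-specific tasks
    ("frontier", 1.0),    # only when extra capability matters
]

def estimate_complexity(request: str) -> float:
    """Toy heuristic: longer, multi-clause requests score higher."""
    words = len(request.split())
    clauses = request.count(",") + request.count(" and ")
    return min(1.0, words / 100 + clauses * 0.15)

def route(request: str) -> str:
    """Return the name of the cheapest tier able to handle the request."""
    score = estimate_complexity(request)
    for tier, ceiling in TIERS:
        if score <= ceiling:
            return tier
    return TIERS[-1][0]
```

In production, the heuristic would typically be replaced by a small classifier model, which is itself the cheapest tier doing the routing work.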

Orchestration over A2A hype

The multi-agent hype runs into reality as soon as teams try to operate these systems. When agents do not have clear coordination rules, defined task ownership, or an orchestrator that can step in, they will drift, pursuing misaligned goals, looping on tasks, or producing conflicting actions. What emerges instead mirrors mature distributed-systems design: one conductor, explicit interfaces between components, deterministic fallback paths, and end-to-end observability so you can see what each agent is doing and why.
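The conductor pattern above can be sketched in a few lines: one orchestrator owns the routing table, records a trace for observability, and falls back deterministically when no agent owns a task or an agent fails. Everything here (the agent signature, the escalation sentinel, the trace schema) is a hypothetical sketch, not a real framework API.

```python
# Minimal orchestrator sketch: explicit task ownership, a deterministic
# fallback path, and a trace log for end-to-end observability.
# The task schema and escalation string are illustrative assumptions.
from typing import Callable

class Orchestrator:
    def __init__(self) -> None:
        self.routes: dict[str, Callable[[str], str]] = {}
        self.trace: list[tuple[str, str, str]] = []  # (task, agent, status)

    def register(self, task_type: str, agent: Callable[[str], str]) -> None:
        """Give exactly one agent ownership of a task type."""
        self.routes[task_type] = agent

    def run(self, task_type: str, payload: str) -> str:
        agent = self.routes.get(task_type)
        if agent is None:
            self.trace.append((task_type, "none", "escalated"))
            return "escalate_to_human"  # deterministic fallback path
        try:
            result = agent(payload)
            self.trace.append((task_type, agent.__name__, "ok"))
            return result
        except Exception:
            self.trace.append((task_type, agent.__name__, "failed"))
            return "escalate_to_human"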

The Economic Reckoning: ROI becomes non-negotiable

2026 becomes the "hard hat year" where CFOs kill anything that cannot be connected to unit economics (cost per resolution, time saved, revenue created). Many organizations have had years to test and learn, and while that will still provide value, executives will expect widespread deployments, actionable agents, and real, meaningful results. Say goodbye to the cool demo untethered to a P&L line item.

CALM-Style Architectures Become the Default for Serious Teams

In 2026 and beyond, Rasa is building the standard framework for agentic AI. There is ample evidence already that both "LLM does everything" and old-school intent/entity bots hit their limits and cannot be trusted in production. The separation of language (LLM reasoning) from logic (deterministic flows/policies) will continue to flourish as more providers follow the path we've blazed with CALM. Whatever it's called beyond our walls, the CALM style will become the sustainable pattern for regulated, complex use cases in 2026.

Using LLMs for understanding and flows for decisions will become the industry's de facto best practice, along with a growing acceptance that multi-agent architectures need an orchestrator, not just a communication protocol like A2A, to extract real value. And the entire world will understand that real value starts when conversation can be controlled: understanding grounded in language, decisions based on logic.
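The language-vs-logic split can be sketched as two distinct functions: a language layer that turns free text into a structured command, and a deterministic layer that decides what happens next. The keyword stub standing in for the LLM and the flow names below are illustrative assumptions, not Rasa's actual CALM implementation.

```python
# Sketch of the language-vs-logic split. understand() stands in for the
# LLM layer (stubbed with keywords here); decide() is deterministic
# business logic. Command and flow names are hypothetical.

def understand(utterance: str) -> str:
    """Language layer stand-in: free text -> structured command."""
    text = utterance.lower()
    if "cancel" in text:
        return "start_cancellation"
    if "balance" in text:
        return "check_balance"
    return "clarify"

def decide(command: str, verified: bool) -> str:
    """Logic layer: same inputs always yield the same decision."""
    if command == "start_cancellation":
        return "run_cancellation_flow" if verified else "ask_for_verification"
    if command == "check_balance":
        return "show_balance" if verified else "ask_for_verification"
    return "ask_clarifying_question"
```

The payoff of the split is auditability: every decision path in decide() can be enumerated and tested, while the flexible language layer is free to vary in how it interprets the user.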

2025 Predictions review

We also made several predictions for what would happen in 2025, so it's only fair that we assess how we did.

The essential comeback of the UX designer
Teams sought UX guidance, but the work shifted to system-level orchestration: when to generate text, when to call retrieval, and when to escalate. Many new entrants lacked real design experience.

Multi-Interface is here to stay
Users switch devices constantly. Context management remains hard: prompts absorb everything unless curated, irrelevant context can break behavior, and pruning/routing context is now core engineering work.

Emerging standards for multi-modal experiences
Speech remains the bottleneck: ASR struggles with accents, rapid speech, code-switching, and noisy environments. TTS prosody control is inconsistent, latency is non-trivial, and no unified workflow has emerged.

So-called "game-changing techniques" are just prompt and pray
Raw GPT wrappers and prompt hacks did not scale. Prompts are now treated as versioned, reviewable design assets, but adoption is uneven and enterprise teams sometimes over-trust untested templates.

The word "Agent" loses its meaning
"Agent" became overloaded. Teams now compensate with explicit labeling: tool-calling loops, retrieval-backed assistants, and timed executors. Semantic bleaching is complete.

Smaller, open source models in production
Small/mid-size models are widely deployed for cost, latency, and auditability. Frontier models are rare except for high-value edge cases. Adoption is broad but bounded by practical limits.

RAG projects shift focus to data quality
Data quality now dominates outcomes. Mature teams invest in preprocessing, chunking, retrieval tuning, and evaluation frameworks. Upstream mistakes destroy performance regardless of model choice.

POCs happen twice as fast
Tooling accelerates demos, compressing idea-to-interaction cycles. Quick POCs rarely reflect production robustness; scaling still separates quality.

Job roles expand and blur
Hybrid roles now exist (AI Product Designer, Interaction Architect). Designers handle retrieval and prompts, and engineers handle instructions and failure modes. Skill depth varies widely.

User simulators become a standard evaluation tool
Simulators appear in QA pipelines, but adoption is uneven. Many teams treat them as optional; coverage gaps limit reliability as a universal evaluation strategy.
