This year saw new models, leaps in orchestration, and deeper use cases for the agentic era. From OpenAI adding MCP support to the launch of frontier models like Gemini 3, it's been a pivotal year for AI development.
We saw some of our predictions come true (see below): a preference for smaller, open-source models in production over frontier models, driven by cost, latency, and auditability. As the no-code movement reached its limits around edge cases and integration reality, we embraced the next new thing: vibe (AI-assisted) coding. It is at the heart of Hello Rasa, a playground where you can spin up a working Rasa agent in minutes.
As LLMs proliferate, so do new attack surfaces. Attacks like prompt injection, data exfiltration, and tool hijacking lay bare the security issues inherent in pure LLM agents. As security firm Lakera determined in its recent report, a Rasa agent stayed within its scope, demonstrated high reliability, and exhibited predictable behavior without sacrificing flexibility.
Meanwhile, multi-agent architectures will keep getting messier: A2A helps agents communicate, but it won’t prevent chaos without an orchestrator that routes work, manages state, and enforces guardrails.
We’re anticipating a hub-and-spoke shift towards generalist routers coordinating specialist agents. Agent patterns are far from settled, and both “LLM does everything” and rigid intent/entity bots hit limits.
The durable pattern is language for understanding, deterministic flows for decisions: CALM-shaped, even if named differently. In 2026, that split will become best practice; Rasa standardizes it.
Our predictions for 2026
Vibe-coding speeds up project velocity
“Vibe-coding” is shorthand for something more practical: development flows where you talk to a system that generates real, reviewable code, explains architectural choices, and helps you navigate the framework without tab-hopping through docs. Think fast scaffolding, instant diffs, and a copilot that can justify constraints. It does not replace engineering discipline; it just speeds up the boring parts so teams can focus on design and evaluation. Hello, Hello Rasa!
Agent architectures stay messy
There is no universal agent architecture emerging. Retrieval-heavy stacks, support agents, and enterprise RPA systems each favor different topologies for the use case at hand. The real failures come from unclear boundaries and a lack of conflict resolution between competing agents. The systems that hold up will look more like well-governed distributed systems, with explicit roles, deterministic overrides, and traceable reasoning paths.
Voice input and emotional markup start to converge
SSML (Speech Synthesis Markup Language) set the early standard for expressive speech, but vendors eventually splintered into their own incompatible tags. By 2026, better Voice Activity Detection (VAD) and growing open-source TTS datasets will push the ecosystem back toward convergence. We predict that emotions represented in text input to TTS systems will become more consistent across platforms, making expressive voice easier to adopt, without relying on any single mandated standard.
Hybrid model stacks (SLMs + Big Models + Routing) are the future
Frontier models are powerful, but they are too expensive to run constantly and too slow for high-volume pipelines. The setup that will work best in practice will be a stack: small models handle routing and safety checks, mid-size models take on most domain-specific tasks, and the biggest models will only be called when their extra capability really matters. This shift is not philosophical; it is driven by latency limits, finite GPU slots, and real per-request cost curves.
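The tiered setup above can be sketched in a few lines. This is a hypothetical illustration, not a real vendor API: the tier names, prices, and the 0.8 confidence threshold are all assumptions to make the routing logic concrete.

```python
# Hypothetical sketch of a hybrid model stack. Tier names, costs, and
# the confidence threshold are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # illustrative per-request economics

SMALL = ModelTier("small-router", 0.0001)   # routing and safety checks
MID = ModelTier("mid-domain", 0.001)        # most domain-specific tasks
LARGE = ModelTier("frontier", 0.01)         # called only when needed

def route(task_type: str, confidence: float) -> ModelTier:
    """Pick the cheapest tier that can handle the request."""
    if task_type in ("routing", "safety_check"):
        return SMALL  # small models own the high-volume, low-stakes work
    if confidence >= 0.8:  # threshold is an assumption, tune per workload
        return MID
    return LARGE  # escalate only when cheaper tiers are unsure
```

The escalation rule is the whole point: the per-request cost curve stays flat because the frontier tier is the exception path, not the default.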
Orchestration over A2A hype
The multi-agent hype runs into reality as soon as teams try to operate these systems. When agents do not have clear coordination rules, defined task ownership, or an orchestrator that can step in, they will drift, pursuing misaligned goals, looping on tasks, or producing conflicting actions. What emerges instead mirrors mature distributed-systems design: one conductor, explicit interfaces between components, deterministic fallback paths, and end-to-end observability so you can see what each agent is doing and why.
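A minimal sketch of that conductor pattern follows. The agent names and interfaces here are hypothetical, not any particular framework's API; the point is the shape: explicit task ownership, a deterministic fallback, and a trace of who did what.

```python
# Minimal orchestrator sketch (agent names and interfaces are
# hypothetical): one conductor, explicit interfaces, deterministic
# fallback paths, and an end-to-end trace for observability.
from typing import Callable

class Orchestrator:
    def __init__(self) -> None:
        self.agents: dict[str, Callable[[str], str]] = {}
        self.trace: list[tuple[str, str]] = []  # (role, output) log

    def register(self, role: str, agent: Callable[[str], str]) -> None:
        self.agents[role] = agent  # explicit task ownership per role

    def dispatch(self, role: str, task: str) -> str:
        # Deterministic fallback: tasks with no owner are escalated
        # instead of letting agents improvise or loop.
        agent = self.agents.get(role, lambda t: f"escalate: {t}")
        result = agent(task)
        self.trace.append((role, result))  # observability: who did what
        return result

orch = Orchestrator()
orch.register("billing", lambda t: f"billing handled: {t}")
print(orch.dispatch("billing", "refund request"))  # routed to owner
print(orch.dispatch("unknown", "odd request"))     # deterministic fallback
```

Note that A2A-style messaging could sit underneath `dispatch` without changing the design: the protocol moves messages, but the orchestrator still decides who gets the work and what happens when no one can take it.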
The Economic Reckoning: ROI becomes non-negotiable
2026 becomes the "hard hat year" where CFOs kill anything that cannot be connected to unit economics (cost per resolution, time saved, revenue created). Many organizations have had years to test and learn. While that will still provide value, executives will expect widespread deployments, actionable agents, and real, meaningful results. Say goodbye to the "cool demo" untethered to a P&L line item.
CALM-Style Architectures Become the Default for Serious Teams
In 2026 and beyond, Rasa is building the standard framework for agentic AI. There is ample evidence already that both "LLM does everything" agents and old-school intent/entity bots hit their limits and cannot be trusted in production. The separation of language (LLM reasoning) from logic (deterministic flows/policies) will continue to flourish as more providers follow the path we've blazed with CALM. Whatever it's called beyond our walls, CALM-style architecture will become the sustainable pattern for regulated, complex use cases in 2026.
Using LLMs for understanding and Flows for decisions will become the industry's de facto best practice. In addition, there will be growing acceptance that multi-agent architectures need an orchestrator, not just a communication protocol like A2A, to extract real value. And the entire world will understand that real value starts when conversation can be controlled, with understanding grounded in language and decisions grounded in logic.
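The language/logic split can be shown in miniature. This is a conceptual sketch, not the Rasa or CALM API: `understand` stands in for an LLM mapping free text to a structured command, and the flow is plain, reviewable code.

```python
# Conceptual sketch of the language/logic split. NOT the Rasa/CALM API;
# `understand` is a stub standing in for an LLM, kept self-contained.
def understand(utterance: str) -> dict:
    # In a real system an LLM produces this structured command.
    if "transfer" in utterance:
        return {"command": "transfer_money", "amount": 50}
    return {"command": "unknown"}

def run_flow(command: dict) -> str:
    # Deterministic business logic: every branch is testable and
    # auditable, independent of the model that produced the command.
    if command["command"] == "transfer_money":
        if command["amount"] > 1000:
            return "require_extra_verification"  # policy, not model vibes
        return "execute_transfer"
    return "clarify_with_user"

print(run_flow(understand("please transfer 50 dollars")))  # execute_transfer
```

The model can be swapped or retrained without touching the decision logic, which is exactly why this split suits regulated use cases: the part an auditor needs to inspect never lives inside the LLM.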
2025 Predictions review
We also made several predictions for what would happen in 2025, so it's only fair that we assess how we did.