Last week, the Rasa team co-hosted a webinar with the Nebius Academy on a topic that is currently redefining the developer landscape: AI Agents. We saw a peak of nearly 100 engineers join us to discuss everything from Claude Code’s CLI architecture to the nuances of "headless" automation. Through the interactions and questions, one thing became clear: developers are moving past simple "chat" interfaces. They want to build systems that do things.
However, moving from a prompt to a production-ready agent is fraught with "side quests": hallucinations, token bloat, and security risks. Here are the core architectural learnings from our session and how we are thinking about them at Rasa.
1. The Anatomy of an Agent: Brains vs. Bodies
An agent is more than just a Large Language Model (LLM). During the webinar, we broke down the agentic stack into four critical components:
- The Brain: The LLM (like Claude 3.5 Sonnet or GPT-4o). It handles intent and reasoning.
- Knowledge Access: This is where RAG (Retrieval-Augmented Generation) and vector search come in, providing the agent with facts it wasn't trained on.
- Tools: The "hands" of the agent. This includes the ability to read/write files, execute Bash commands, or fetch web pages.
- Intent & Workflow Control: The "nervous system." This is where the agent decides how to sequence tasks.
The Pro Tip: Don't trust smaller models to orchestrate complex workflows. As we discussed in the Q&A, smaller models often "fake" completion, claiming they finished a 10-step task while skipping steps 4 through 9. For reliable orchestration, use a "Senior" model (like Sonnet or Opus) to plan, even if you use "Junior" models (like Haiku) for simple sub-tasks.
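The planner/executor split above can be sketched in a few lines. This is a minimal illustration, not a real harness: `senior_plan` and `junior_execute` are hypothetical stubs standing in for actual model calls (a stronger model such as Sonnet or Opus for planning, a cheaper one such as Haiku for sub-tasks).

```python
def senior_plan(task: str) -> list[str]:
    # Stub: in practice, ask a stronger model to decompose the task.
    return [f"Step {i}: work on '{task}'" for i in range(1, 4)]

def junior_execute(step: str) -> dict:
    # Stub: in practice, hand each step to a cheaper model.
    return {"step": step, "done": True}

def run_task(task: str) -> list[dict]:
    """Run every planned step and verify completion explicitly."""
    results = []
    for step in senior_plan(task):
        result = junior_execute(step)
        # Never trust a blanket "all done" claim from the worker model;
        # check each step individually before moving on.
        if not result.get("done"):
            raise RuntimeError(f"Incomplete step: {step}")
        results.append(result)
    return results
```

The key design choice is the per-step check: the orchestrator verifies each sub-task result instead of accepting a final "finished the 10 steps" claim.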
2. Mastering "Plan Mode"
One of the most powerful features of modern agentic workflows (like Claude Code) is Plan Mode.
Instead of asking an agent to "Fix the login bug," you enter a collaborative planning state. The agent proposes a technical path, asks clarifying questions about your architecture, and waits for your "green light" before touching a single line of code.
Why this matters for Rasa developers: Just as we advocate for "Conversation-Driven Development" (CDD) at Rasa (listening to your users and using those insights to improve your AI assistant), agentic development requires Plan-Driven Implementation. If you don't iterate on the plan, the agent will inevitably create "code debt" by duplicating helper functions or ignoring your established project architecture.
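The plan-then-approve loop can be reduced to a simple gate. This is a hedged sketch, not Claude Code's implementation: `propose_plan` is a hypothetical stand-in for the model's planning call, and `approve` is the human-in-the-loop "green light".

```python
def propose_plan(request: str) -> list[str]:
    # Stub for the model proposing a technical path and clarifying scope.
    return [
        f"Clarify scope of: {request}",
        "Locate the affected module",
        "Write a failing test first",
        "Patch, then re-run the test suite",
    ]

def run_with_gate(request: str, approve) -> list[str]:
    """Only execute once a human explicitly approves the plan."""
    plan = propose_plan(request)
    if not approve(plan):
        return []  # green light withheld: no code is touched
    return [f"executed: {step}" for step in plan]
```

The point of the gate is that rejection is cheap: iterating on a four-line plan costs far less than reverting a four-file diff.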
3. Security: The "Secret" Problem
A major theme of our session was security. When you give an agent access to a Bash tool, you are essentially giving a robot the keys to your house.
- Secrets Management: Never expose your `.env` files. Just as you wouldn't commit secrets to Git, you shouldn't allow an agent to read them.
- The MCP Risk: The Model Context Protocol (MCP) is a game-changer for tool use, but be wary of third-party "Skills." Malicious actors are already creating fake skills designed to leak data.
- Rule of Thumb: If you wouldn't run a random script from the internet on your machine, don't let your agent load a skill from an unverified source.
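One cheap defense is a read guard in front of the agent's file tool. The sketch below uses an illustrative hard-coded denylist (the names and suffixes are assumptions); real agent harnesses typically expose configurable permission rules instead.

```python
from pathlib import Path

# Illustrative secret patterns -- adjust for your project.
SECRET_NAMES = {".env", "id_rsa", "credentials.json"}
SECRET_SUFFIXES = {".pem", ".key"}

def safe_read(path: str) -> str:
    """Refuse to hand secret-looking files to the agent."""
    p = Path(path)
    # Check the name before touching the filesystem, so secrets are
    # never even opened on the agent's behalf.
    if p.name in SECRET_NAMES or p.suffix in SECRET_SUFFIXES:
        raise PermissionError(f"agent blocked from reading: {path}")
    return p.read_text()
```

A denylist is a floor, not a ceiling: an allowlist of readable directories is stricter and usually the better default for production agents.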
4. Context is Your Budget
In the webinar, we demonstrated how the `/context` command reveals exactly how many tokens are being "eaten" by system prompts and tool definitions.
- Avoid Token Bloat: Don’t wait for "auto-compaction." Use the `/compact` command to summarize long conversations, or `/clear` to start fresh once a sub-task is done.
- Stay Local: Use specific file tagging (like the `@` symbol) to guide the agent. Telling an agent to "look at the whole repo" is a recipe for hallucinations and a massive API bill.
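The compaction idea is easy to sketch. The heuristic below is an assumption (roughly 4 characters per token; a real implementation would use the provider's tokenizer), and the summary line is a placeholder for what the model itself would write.

```python
def rough_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token (assumption, not a tokenizer).
    return max(1, len(text) // 4)

def maybe_compact(history: list[str], budget: int = 500) -> list[str]:
    """Summarize early turns once the rough token count exceeds the budget."""
    if sum(rough_tokens(turn) for turn in history) <= budget:
        return history
    # Replace old turns with a summary and keep only the recent context;
    # in practice the model writes the summary itself.
    return ["[summary of earlier conversation]"] + history[-2:]
```

Compacting proactively, at sub-task boundaries, keeps the summary coherent; waiting for auto-compaction means it fires mid-thought at an arbitrary point.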
5. What’s Next: Headless vs. Conversational
The future of DevRel at Rasa is exploring the intersection of Digital Co-workers (agents you talk to) and Headless Automation (agents that work autonomously in the background).
We are taking these technical deep dives on the road. Our goal isn't just to talk about AI, it is to build a local community of engineers who are operationalizing these tools in the real world.
Join the Inner Circle
We are building hyper-local communities of AI engineers in cities like London to share these playbooks before they hit the mainstream.
Want the full technical slides from the Nebius webinar and a heads-up on our next hands-on session?