Multi-Agents are Secretly Distributed Monoliths

An agent that feels like a single entity.

In part I of this blog series, we established a concrete definition of Orchestration and used as an example the "Ace" banking agent, which helped users find, buy, and finance a new car.

For end users, we want every conversation with Ace to just feel like 'talking to your bank', unburdened by the complexity of the organization behind the agent. Rasa's vision is that AI agents like Ace integrate every customer interaction into one ongoing conversation.

But enterprises are complex organizations. If we want Ace to cover the full breadth of what the bank has to offer, our agent will have to navigate many different departments. How do we scale Ace across the entire organization? And with many teams contributing, how do we achieve independence - the ability to ship iterations quickly - while still offering a single, coherent customer-facing experience?

When developers see this complexity - multiple domains, specialized functionality - the instinct is to reach for microservices. But microservices depend on one thing: enforceable API contracts. When your interface is natural language, those contracts don't exist.

microservices depend on enforceable API contracts. With natural language, those contracts don't exist.

A better way to address this complexity is to think of your sub-agents as dependencies. Treating them as independently running services gives you all the pain of distributed systems with none of the benefits.

Note: I can recommend these excellent posts: ‘how we built our multi-agent research system’ and ‘don’t build multi-agents’, which are related to the topic at hand, but don't specifically discuss multi-turn conversations, which is our main interest in this article.

Creating a Multi-Agent Using Microservices

The core idea: an 'orchestrator' agent acts as an entry point and delegates to specialized 'sub-agents'.

Users initially speak with the orchestrator, and when financing comes up, for example, the financing sub-agent takes over. Messages sent by the user are passed to the financing sub-agent for as long as it’s in control.

This creates three problems that make microservices unworkable. Let's look at each.

Problem 1: Routing is Hard

In the conversation below, Ace has searched the web and recommended a Hyundai Kona. Then the user asks: "how much are they"?

Which sub-agent should answer? The web search agent could find generic pricing information. But our design intends for the cars.com agent to handle this to show specific Hyundai Konas for sale near the user's home.

This routing decision needs to be explicitly defined somewhere. You need to specify:

Web search handles: general car information, reviews, comparisons
Cars.com handles: inventory searches, specific vehicle pricing, availability
Financing handles: loan terms, monthly payments, affordability

And here's where it gets tricky: LLM-powered agents are eager to please and will try to be helpful beyond their defined scope. The cars.com agent, given control, will happily say things like:

"What's your budget range? That will help me filter the results."
"Would you like to explore financing options?"

If you've decided by design that only the financing agent should discuss budgets and loans, you now need to explicitly define:

When to route to cars.com versus asking about financing
Constraints on the cars.com agent to NOT offer financing help
Constraints on the web search agent to NOT answer pricing questions

This coordination work has to happen somewhere - whether in the orchestrator, in the agent prompts themselves, or in some shared configuration. And every time you change a sub-agent's scope, you need to update these routing rules and constraints across multiple agents.

In traditional microservices, you enforce contracts at runtime with schemas. Here, you're trying to enforce them with a lot of prompts and a little prayer.

Problem 2: Changes Cascade

One of our design criteria is that teams should be able to ship independently.
The financing team wants to improve their flow. Currently, they ask: "Do you want a 36, 48, or 60 month loan?" Then they ask about down payment. They want to flip this: start by asking "What monthly payment can you afford?" and calculate the loan terms from there.

This should be a simple, isolated change. One team, one agent, one prompt update. Except now you have a problem.

The cars.com agent already asks users for their "budget" before searching - meaning the total price of the car. The financing agent now also asks about a budget - meaning the monthly payment they can afford. Same word, different meanings.

User conversations can evolve in multiple ways:

A user who starts by searching for a specific car, but after seeing the loan terms realizes it's outside their budget and starts looking for something more affordable
A user who knows exactly what they can afford monthly and wants to find the best car within that constraint
A user with a preference for a new car, but open to a used one if the price difference is too high

Each of these conversations will interleave web search, cars.com, and financing. The financing team's "simple change" now requires updates to the orchestrator to ensure it handles all these conversation paths correctly. You need integration tests to prevent regressions. And you need to deploy everything together.

That's not loose coupling. That's a distributed monolith.

Problem 3: Shared Mutable State

For sub-agents to collaborate as part of a single, fluent conversation, they need to share data:

The user's budget (monthly and/or total)
The selected car model
The price of the car

We need to define which agents can read each of these-the 3rd-party cars.com agent shouldn't see what goes into your financing estimates. But we also need to decide who can update these variables and what should happen when they change.

A user says "actually I think I could swing $50 more a month." That could mean buying a better car, or just paying off their loan faster. "Should the financing agent recalculate loan terms? Work backwards to a new total car budget? Should the cars.com agent re-run its search to show more expensive cars?

This coordination logic has to be defined somewhere-in your orchestrator, or in each sub-agent. Whatever you choose, you're managing shared mutable state across supposedly independent services - the opposite of what makes microservices work.

It’s a monolith, might as well structure the code that way

If you expect users to engage with multiple agents within one conversation, you have an orchestration problem to solve. Running these agents as separate services will not make that simpler.

Microservices work when you can enforce API contracts at runtime. With natural language interfaces, you cannot. Without enforceable contracts, you get all the complexity of distributed systems with none of the benefits.

Structuring your code as a monolith doesn't make the orchestration complexity disappear. You still need to explicitly define routing logic, sub-agent boundaries, and data-update rules. But a monolith gives you the tools to manage it: atomic refactoring, a single test pipeline, consistent deployment, and a single source of truth.

My recommendation is: think of your sub-agents as dependencies, not services. Write comprehensive end-to-end integration tests that cover important user journeys. Set up a CI/CD job that runs these before every deployment. Explicitly define and version your domain definitions, context-sharing logic, and conversation flows.

Think of your sub-agents as dependencies, not services.

This isn't an option when integrating third-party agents (like cars.com in our example). And that's where A2A comes in.

When To Use A2A

If one of your sub-agents is built by a third party, or it's a legacy project on a different tech stack, then your orchestrator will have to treat it as a black box. In that case, it makes sense to use A2A to standardize communication between your orchestrator and sub-agent.

Just remember that A2A is a communication protocol. You can't invite 100 people to a Slack workspace and expect them to operate as a company, and you can't wire up agents with A2A and expect them to work together. Providing a means of communication is only a tiny slice of the problem. You need to define who does what, and when.

You can't invite 100 people to a Slack workspace and expect them to operate as a company, and you can't wire up agents with A2A and expect them to work together

A2A Evolution

As A2A evolves, I hope it can start to address the sub-agent scoping problem more holistically. Currently, A2A has the concept of an agent card.

Right now it's a static file, broadcast from the sub-agent to the world. I'd love it if an orchestrator could provide additional guidelines on the scope the sub-agent should adhere to - especially on what NOT to do.

Expand the agent card spec to explicitly define which context variables a sub-agent can read, write, and create.

This would evolve A2A toward providing an orchestration layer, which may not be in line with the project’s vision. But these things are necessary if we want to create a standard for building orchestrated agents that can scale without turning into spaghetti code.

Next up

In part III, we’ll check out an orchestrated multi-agent built using Rasa 3.14 and see these ideas in action.

Attend our upcoming online event

As enterprises scale their assistants, they often face a tough reality: many critical capabilities live in third-party agents or legacy systems that can’t be refactored into one monolithic assistant. This is where Agent-to-Agent (A2A) comes in. Join us for Part 2 of the Rasa Orchestration Series: Agents Talking to Agents on Thursday, October 23, 2025 at 11:00am ET. Click here to register.

And watch the replay of our first online event in the series, Context Engineering for MCP