Rasa as A2A Agent Architecture
When a2a_server is configured in endpoints.yml, Rasa registers A2A JSON-RPC routes on the same Sanic port as REST and channel webhooks.
An external orchestrator discovers capabilities via AgentCard, sends user turns, and receives task lifecycle updates mapped from Rasa's dialogue state.
This page describes the sub-agent (server) architecture. For Rasa as an orchestrator that calls external A2A agents, see Integrating External Agents via A2A.
For configuration and operations, see A2A Server and Exposing Rasa as an A2A Sub-Agent.
Deployment topology
Shaded regions show ownership: the orchestrator (A2A client) discovers skills, sends and receives turns over JSON-RPC, and optionally hosts a push callback URL. Rasa (A2A server) exposes public endpoints, runs flows, and POSTs task updates when push notifications are enabled (dashed arrow).
The orchestrator owns end-user routing and multi-agent coordination. Rasa owns flow execution, slot collection, conversation repair, and NLG within its advertised skills.
The governed agent contract
Rasa exposes a bounded contract to orchestrators. Internal dialogue machinery stays private.
What Rasa exposes
| Surface | Description |
|---|---|
| AgentCard | Public capability advertisement. User flows become AgentSkill entries; pattern flows are included when include_conversation_repair: true. |
| A2A task states | working, input_required, completed, failed, canceled, rejected, auth_required — mapped from dialogue stack and tracker state after each turn. |
Structured DataPart | Terminal and interactive-terminal payloads with state, active_flow, current slots, persisted slot values, and error/cancel metadata when applicable. |
User-visible TextPart | Bot utterance in status.message so clients that only read text still receive NLG. |
message/stream artifacts | working status updates and token/chunk deltas during streaming custom actions. |
| Session continuity | One Rasa conversation per orchestrator contextId; contexts stay resumable across turns and after inactivity (with start_session_after_expiry: false). |
| Orchestrator auth | Optional bearer JWT on JSON-RPC and AgentCard routes (validates the orchestrator, not the end user). |
What Rasa keeps internal
| Internal | Why it stays private |
|---|---|
| LLM prompts and command-generator internals | Implementation detail; orchestrator sends user text, receives commands' effects as task state. |
Dialogue stack frames (pattern_*, flow stack) | Mapped to A2A states and summarized slots — raw stack is not exported. |
| Full tracker event history | Available only via separate REST/tracker APIs if configured; not part of the A2A task contract. |
| Channel connectors and REST webhook surface | Parallel ingress paths; orchestrator uses A2A JSON-RPC only. |
Sub-agent orchestration (sub_agents/) | Separate concern; not part of the sub-agent server contract. |
End-user identity and backend context belong in A2A message.metadata, DataPart slot pre-seeding, or orchestrator-side session management — not in the sub-agent JWT.
Context and task mapping
Blue: identifiers the orchestrator supplies on each A2A message. Purple: how Rasa maps and tracks them server-side.
| A2A concept | Rasa mapping | Behaviour |
|---|---|---|
contextId | sender_id | One persistent Rasa conversation per orchestrator context. Slots and flow progress carry over across turns. |
messageId | Dedup key (contextId, messageId) | Retries replay the cached terminal task. Concurrent duplicates while in flight return HTTP 409. |
task_id | New per orchestrator message | Each turn is a distinct A2A task. A completed task on a context does not block the next message — a new task starts. |
input_required | Context reservation | The context stays reserved until the orchestrator sends a follow-up or calls tasks/cancel. |
ConversationInactive and tracker inactivity do not release an input_required context.
With start_session_after_expiry: false (required for A2A), the next message on the same contextId resumes the flow without running action_session_start.
Task state lifecycle
After each message is processed, TaskStateMapper derives the A2A task state from the dialogue stack and latest action.
Evaluation follows this priority order:
auth_required— orchestrator bearer token missing or invalidrejected—CannotHandlePatternFlowStackFrameactive (outside jurisdiction)failed—InternalErrorPatternFlowStackFrameactivecanceled—CancelPatternFlowStackFrameactive or orchestratortasks/cancelcompleted— user-facing flow emittedFlowCompletedthis turn; no user flows remain on the stack; bot waits for user input (even whenpattern_completedis active ataction_listen)input_required—CompletedPatternFlowStackFrameactive; no user flows on stack;include_conversation_repairistrue; bot waits for user input; no user flow completed this turncompleted—CompletedPatternFlowStackFrameactive and no user flows on stack (including when conversation repair is disabled)input_required— last action waits for user input (collect step, greet, etc.)working— otherwise (processing or mid-turn streaming)
State reference
| State | Meaning for orchestrator | Typical next action |
|---|---|---|
working | Turn in progress; message/stream may emit intermediate updates | Wait for terminal state |
input_required | Flow needs more user input; DataPart includes active_flow and current slots | Show TextPart to user; send follow-up on same contextId |
completed | Business flow finished; DataPart includes persisted_slots | Read structured output; may start a new task on same contextId |
rejected | Request outside advertised skills (reason in DataPart) | Route to a different agent or handle in orchestrator |
failed | Internal error (error_type, error_info in DataPart) | Retry, escalate, or fall back |
canceled | Orchestrator or timeout cancelled the task | Clean up orchestrator-side state |
auth_required | Missing or invalid orchestrator JWT | Refresh token and retry |
When include_conversation_repair: false, conversation-repair stack frames map to completed instead of input_required.
Message processing path
Blue groups: orchestrator-side request, response, and optional push callback. Purple group: Rasa server processing from JSON-RPC ingress through CALM to task status and push transport.
On each turn:
- Optional bearer JWT is validated (orchestrator auth).
- Orchestrator slot pre-seeding is applied as
SetSlotCommands before dialogue processing. - Rasa executes flows via the standard CALM pipeline.
A2AOutputChannelmaps tracker state to A2A task updates, including structuredDataPartandTextPart.- Optional push notification POSTs fire on state transitions when enabled.
Scaling constraints (in v3.17)
Until persistent A2A task and message stores ship:
SANIC_WORKERS=1per replica — idempotency, cancel, andmax_contextsare per-worker.- Horizontal scaling — add replicas with load balancer sticky routing keyed on
contextIdso the same orchestrator context hits the same pod. A2A task caches, deduplication, in-flight queues, and push config are in-memory per replica until persistent stores ship. Derive the routing key from the JSON bodycontextId, theX-A2A-Context-Idheader, or ana2a-context-idcookie set by the ingress. See Multi-replica load balancing. - Hot model reload —
PUT /modeldoes not refresh the AgentCard or re-wire the A2A executor; restart after model deploys that change advertised skills.
Running Rasa as an A2A sub-agent while also invoking external sub-agents (sub_agents/ with protocol: a2a) is not tested for v3.17.