Rasa as A2A Agent Architecture

When a2a_server is configured in endpoints.yml, Rasa registers A2A JSON-RPC routes on the same Sanic port as REST and channel webhooks. An external orchestrator discovers capabilities via AgentCard, sends user turns, and receives task lifecycle updates mapped from Rasa's dialogue state.

This page describes the sub-agent (server) architecture. For Rasa as an orchestrator that calls external A2A agents, see Integrating External Agents via A2A.

For configuration and operations, see A2A Server and Exposing Rasa as an A2A Sub-Agent.

Deployment topology

A2A sub-agent deployment topology

Shaded regions show ownership: the orchestrator (A2A client) discovers skills, sends and receives turns over JSON-RPC, and optionally hosts a push callback URL. Rasa (A2A server) exposes public endpoints, runs flows, and POSTs task updates when push notifications are enabled (dashed arrow).

The orchestrator owns end-user routing and multi-agent coordination. Rasa owns flow execution, slot collection, conversation repair, and NLG within its advertised skills.

The governed agent contract

Rasa exposes a bounded contract to orchestrators. Internal dialogue machinery stays private.

What Rasa exposes

Surface	Description
AgentCard	Public capability advertisement. User flows become `AgentSkill` entries; pattern flows are included when `include_conversation_repair: true`.
A2A task states	`working`, `input_required`, `completed`, `failed`, `canceled`, `rejected`, `auth_required` — mapped from dialogue stack and tracker state after each turn.
Structured `DataPart`	Terminal and interactive-terminal payloads with `state`, `active_flow`, current `slots`, persisted slot values, and error/cancel metadata when applicable.
User-visible `TextPart`	Bot utterance in `status.message` so clients that only read text still receive NLG.
`message/stream` artifacts	`working` status updates and token/chunk deltas during streaming custom actions.
Session continuity	One Rasa conversation per orchestrator `contextId`; contexts stay resumable across turns and after inactivity (with `start_session_after_expiry: false`).
Orchestrator auth	Optional bearer JWT on JSON-RPC and AgentCard routes (validates the orchestrator, not the end user).

What Rasa keeps internal

Internal	Why it stays private
LLM prompts and command-generator internals	Implementation detail; orchestrator sends user text, receives commands' effects as task state.
Dialogue stack frames (`pattern_*`, flow stack)	Mapped to A2A states and summarized slots — raw stack is not exported.
Full tracker event history	Available only via separate REST/tracker APIs if configured; not part of the A2A task contract.
Channel connectors and REST webhook surface	Parallel ingress paths; orchestrator uses A2A JSON-RPC only.
Sub-agent orchestration (`sub_agents/`)	Separate concern; not part of the sub-agent server contract.

End-user identity and backend context belong in A2A message.metadata, DataPart slot pre-seeding, or orchestrator-side session management — not in the sub-agent JWT.

Context and task mapping

A2A context and task mapping

Blue: identifiers the orchestrator supplies on each A2A message. Purple: how Rasa maps and tracks them server-side.

A2A concept	Rasa mapping	Behaviour
`contextId`	`sender_id`	One persistent Rasa conversation per orchestrator context. Slots and flow progress carry over across turns.
`messageId`	Dedup key `(contextId, messageId)`	Retries replay the cached terminal task. Concurrent duplicates while in flight return HTTP 409.
`task_id`	New per orchestrator message	Each turn is a distinct A2A task. A `completed` task on a context does not block the next message — a new task starts.
`input_required`	Context reservation	The context stays reserved until the orchestrator sends a follow-up or calls `tasks/cancel`.

ConversationInactive and tracker inactivity do not release an input_required context. With start_session_after_expiry: false (required for A2A), the next message on the same contextId resumes the flow without running action_session_start.

Task state lifecycle

After each message is processed, TaskStateMapper derives the A2A task state from the dialogue stack and latest action. Evaluation follows this priority order:

auth_required — orchestrator bearer token missing or invalid
rejected — CannotHandlePatternFlowStackFrame active (outside jurisdiction)
failed — InternalErrorPatternFlowStackFrame active
canceled — CancelPatternFlowStackFrame active or orchestrator tasks/cancel
completed — user-facing flow emitted FlowCompleted this turn; no user flows remain on the stack; bot waits for user input (even when pattern_completed is active at action_listen)
input_required — CompletedPatternFlowStackFrame active; no user flows on stack; include_conversation_repair is true; bot waits for user input; no user flow completed this turn
completed — CompletedPatternFlowStackFrame active and no user flows on stack (including when conversation repair is disabled)
input_required — last action waits for user input (collect step, greet, etc.)
working — otherwise (processing or mid-turn streaming)

A2A task state lifecycle

State reference

State	Meaning for orchestrator	Typical next action
`working`	Turn in progress; `message/stream` may emit intermediate updates	Wait for terminal state
`input_required`	Flow needs more user input; `DataPart` includes `active_flow` and current `slots`	Show `TextPart` to user; send follow-up on same `contextId`
`completed`	Business flow finished; `DataPart` includes `persisted_slots`	Read structured output; may start a new task on same `contextId`
`rejected`	Request outside advertised skills (`reason` in `DataPart`)	Route to a different agent or handle in orchestrator
`failed`	Internal error (`error_type`, `error_info` in `DataPart`)	Retry, escalate, or fall back
`canceled`	Orchestrator or timeout cancelled the task	Clean up orchestrator-side state
`auth_required`	Missing or invalid orchestrator JWT	Refresh token and retry

When include_conversation_repair: false, conversation-repair stack frames map to completed instead of input_required.

Message processing path

A2A message processing path

Blue groups: orchestrator-side request, response, and optional push callback. Purple group: Rasa server processing from JSON-RPC ingress through CALM to task status and push transport.

On each turn:

Optional bearer JWT is validated (orchestrator auth).
Orchestrator slot pre-seeding is applied as SetSlotCommands before dialogue processing.
Rasa executes flows via the standard CALM pipeline.
A2AOutputChannel maps tracker state to A2A task updates, including structured DataPart and TextPart.
Optional push notification POSTs fire on state transitions when enabled.

Scaling constraints (in v3.17)

Until persistent A2A task and message stores ship:

SANIC_WORKERS=1 per replica — idempotency, cancel, and max_contexts are per-worker.
Horizontal scaling — add replicas with load balancer sticky routing keyed on contextId so the same orchestrator context hits the same pod. A2A task caches, deduplication, in-flight queues, and push config are in-memory per replica until persistent stores ship. Derive the routing key from the JSON body contextId, the X-A2A-Context-Id header, or an a2a-context-id cookie set by the ingress. See Multi-replica load balancing.
Hot model reload — PUT /model does not refresh the AgentCard or re-wire the A2A executor; restart after model deploys that change advertised skills.

Not tested for 3.17

Running Rasa as an A2A sub-agent while also invoking external sub-agents (sub_agents/ with protocol: a2a) is not tested for v3.17.

Deployment topology​

The governed agent contract​

What Rasa exposes​

What Rasa keeps internal​

Context and task mapping​

Task state lifecycle​

State reference​

Message processing path​

Scaling constraints (in v3.17)​