A2A Server

New in Rasa Pro 3.17

Rasa can expose your assistant as a native Agent-to-Agent (A2A) sub-agent.

When an a2a_server block is present in endpoints.yml, rasa run registers A2A JSON-RPC routes on the same Sanic port as REST and channel webhooks. External orchestrators discover your assistant's capabilities via AgentCard, send turns over JSON-RPC, and receive A2A task lifecycle updates mapped from Rasa's dialogue state.

This page documents the sub-agent (server) side. For Rasa as an orchestrator that calls external A2A agents, see External Sub Agent and Integrating External Agents via A2A.

Basic Configuration

Add a2a_server to your endpoints.yml file. Only description is required; other fields have sensible defaults.

a2a_server:
  url: "http://localhost:5005"          # optional; inferred from --port / --interface / --ssl-certificate
  description: "My banking sub-agent"
  include_conversation_repair: true     # false → terminal completed instead of repair input_required
  task_timeout_seconds: 600             # per-task safety net; 0 disables
  a2a_message_cache_ttl_seconds: 600    # messageId replay TTL; defaults to task timeout
  max_contexts: 1000                    # in-memory session cap; 0 disables
  # agent_card_path: ./static-card.json # optional static AgentCard; url still resolved at startup
  # auth: …                             # optional JWT bearer auth

Configuration Reference

Key	Type	Default	Required	Description
`description`	string	—	yes	Public description for the AgentCard. Used for auto-generated and placeholder cards. Ignored when `agent_card_path` points to a static AgentCard JSON file.
`url`	string	inferred from `--port`, `--interface`, and `--ssl-certificate`	no	Public base URL of the Rasa server as seen by A2A orchestrators, advertised in the AgentCard (scheme, host, port, path prefix). Set explicitly when the public hostname or port differs from the local bind address (for example behind a reverse proxy or ingress).
`agent_card_path`	string	none (auto-generate from flows)	no	Path to a static AgentCard JSON file. When set, the file is loaded at startup and its `url` is overwritten with the resolved public URL.
`include_conversation_repair`	boolean	`true`	no	When `true`, pattern flows (for example `pattern_completed`) are advertised as skills and map to `input_required` so orchestrators know Rasa handles conversation repair. When `false`, the same stack state maps to `completed` instead.
`task_timeout_seconds`	integer	`600`	no	Maximum seconds each in-flight `task_id` may run before the server auto-cancels that task (safety net; use explicit `tasks/cancel` as the primary path). Timers are per task and are not extended by later `message/send` on the same `contextId`. Set to `0` to disable.
`a2a_message_cache_ttl_seconds`	integer	same as `task_timeout_seconds`	no	TTL in seconds for `messageId` deduplication cache entries keyed by `(contextId, messageId)`. Set to `0` for immediate expiry (effectively disables replay).
`max_contexts`	integer	`1000`	no	Memory guard: maximum distinct `contextId` sessions retained until a terminal task outcome. `input_required` keeps the session reserved. Set to `0` to disable the cap.
`push_notifications_enabled`	boolean	`false`	no	When `true`, the AgentCard advertises push notifications and the server may POST task updates to orchestrator-supplied callback URLs. Disabled by default because callback URLs are client-controlled and can be an SSRF vector.
`push_notification_allowed_hosts`	list of strings	none	no	Optional hostname allowlist for push callback URLs. When set, only `http`/`https` URLs whose host matches an entry (exact or subdomain) are accepted. Loopback and private-network targets are always rejected.
`auth`	object	none (open endpoint)	no	Bearer JWT authentication for orchestrators calling the A2A endpoint. See Authentication.

Prerequisites

Rasa enforces two startup guardrails when a2a_server is configured. Both fail fast with actionable error messages.

Session configuration

When A2A is enabled, domain.session_config.start_session_after_expiry must be false.

Resumed orchestrator contextId values reuse the same Rasa sender_id. If start_session_after_expiry is true, Rasa runs action_session_start after inactivity and can silently reset slot state when the orchestrator resumes the same context.

This is validated when a model is loaded with a2a_server in endpoints.yml (for example during rasa run via load_agent). The error code is validation.a2a_server.incompatible_session_config.

domain.yml
session_config:
  session_expiration_time: 60
  start_session_after_expiry: false  # required when a2a_server is enabled

ConversationInactive does not release an input_required A2A context. With start_session_after_expiry: false, the next message on the same contextId continues the flow after inactivity.

See Session Timer for full session configuration details.

Sanic workers

SANIC_WORKERS=1 is required until a persistent solution for storing A2A tasks and messages ships. Multiple Sanic workers break messageId idempotency, in-flight HTTP 409 handling, orchestrator cancel, and max_contexts enforcement because these are per-worker only.

This is validated when starting the Sanic server (rasa run). The error code is validation.a2a_server.incompatible_sanic_workers.

Scale horizontally with additional replicas and load balancer sticky routing by contextId instead of increasing Sanic workers per pod.

SANIC_WORKERS=1 rasa run -m models/your-model.tar.gz --endpoints endpoints.yml

Multi-replica load balancing

When a2a_server is enabled and you run more than one Rasa replica, configure your ingress or load balancer so all A2A JSON-RPC traffic for a given contextId routes to the same pod.

A2A V1 keeps task state, messageId deduplication caches, in-flight turn queues, push notification callback registration, and max_contexts enforcement in memory on each replica. The shared tracker store (for example PostgreSQL) persists dialogue history, but A2A-specific state is not yet replicated across pods. Without sticky routing, follow-up turns on the same contextId may land on a different pod — the session can appear fresh, idempotency breaks, tasks/cancel may miss in-flight work, and push callbacks registered on one pod are invisible to others.

Requirements for multi-replica A2A:

Requirement	Why
`SANIC_WORKERS=1` on every replica	A2A state is per-worker as well as per-pod.
Sticky routing keyed on `contextId`	Keeps in-memory A2A state coherent for each orchestrator context.
Orchestrator reuses `contextId` across turns	Already required for multi-step flows; stickiness depends on a stable key.

When routing is configured correctly, orchestrator traffic for a given contextId (or connection) consistently hits the same pod — in-memory stores, queues, deduplication, and push config all stay coherent.

Deriving the routing key

Your load balancer must resolve a sticky key from contextId on every A2A POST / request. Use at least one of these sources (in order of typical precedence):

Source	Used for
`contextId` in the JSON-RPC body	`message/send`, `message/stream`, and other methods that include `contextId` in the payload
`X-A2A-Context-Id` request header	`tasks/cancel`, `tasks/get`, and other methods where `contextId` is not in the body
`a2a-context-id` cookie	HTTP clients that received the cookie from a prior response (for example browser-based orchestrators)

Configure consistent hashing or equivalent session affinity on the resolved key — not round-robin alone.

Istio on Kubernetes

Apply two Istio resources when Rasa runs behind an Istio ingress with multiple replicas. Istio consistentHash accepts one hash key per DestinationRule (header, cookie, source IP, or query parameter) — use a gateway EnvoyFilter to normalize contextId into a single internal header.

1. DestinationRule — consistent-hash on that header for the Rasa Service:

apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: rasa-a2a-sticky
  namespace: YOUR_NAMESPACE
spec:
  host: rasa.YOUR_NAMESPACE.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpHeaderName: x-a2a-context-id

2. EnvoyFilter — on the ingress gateway (namespace: istio-system, selector istio: ingressgateway). Insert a Lua HTTP filter before envoy.filters.http.router that:

Applies only to POST requests on your assistant hostname (for example assistant.example.com).
Resolves contextId in order: "contextId" in the JSON body → X-A2A-Context-Id request header → a2a-context-id cookie.
Sets request header x-a2a-context-id to the resolved value (the DestinationRule hash key).
Stores the value in Envoy dynamic metadata during the request phase.
On response, emits Set-Cookie: a2a-context-id=<contextId>; Path=/; Max-Age=3600; HttpOnly; SameSite=Lax so methods like tasks/cancel that omit body contextId still route to the pod that handled prior turns.

Apply both resources with kubectl apply. Adjust spec.host and the hostname guard in the Lua filter to match your release name, namespace, and ingress host.

Verify: With at least two replicas and SANIC_WORKERS=1, send two message/send turns with the same contextId — the second should continue the dialogue, not restart it. Confirm in pod logs that each contextId appears on one pod only. Exercise tasks/cancel with X-A2A-Context-Id or cookie persistence (curl -c / -b).

Other ingress and load balancers

On NGINX Ingress, AWS ALB, HAProxy, or other gateways, implement equivalent behavior:

Inspect POST / JSON bodies for "contextId" where present.
Accept X-A2A-Context-Id as a fallback for methods that only pass task_id.
Optionally set and honor an a2a-context-id cookie for clients that do not send contextId on every request.
Hash-route (or pin sessions) on that key to a single backend pod.

Orchestrator and HTTP client guidance

Reuse the same contextId across turns in a multi-step flow.
For tasks/cancel and similar calls that only pass task_id in the body, send X-A2A-Context-Id: <contextId> or rely on cookie persistence through your ingress.
HTTP test clients should preserve cookies (curl -c / -b) when exercising sticky ingress.

HTTP Surface

A2A routes are registered at the root of the Rasa server (no URL prefix).

Route	Purpose
`POST /`	A2A JSON-RPC (`message/send`, `message/stream`, `tasks/cancel`, `tasks/get`, …)
`GET /.well-known/agent-card.json`	Public `AgentCard` (flow-generated at startup, or static file from `agent_card_path`)
`GET /.well-known/agent.json`	Deprecated alias of the AgentCard

Context and task mapping

contextId in A2A requests maps to Rasa sender_id. One persistent Rasa conversation exists per orchestrator context.
Each orchestrator message gets a new task_id.
input_required keeps the context reserved for follow-up turns on the same contextId.

AgentCard Generation

Orchestrators discover your assistant's capabilities through the public AgentCard.

Auto-generated (default)

When agent_card_path is omitted, Rasa auto-generates the AgentCard from user-facing flows at server startup (after the model loads):

Each user flow becomes an AgentSkill with id, name, and description (from the flow description or readable name).
Pattern/system flows (pattern_*) are included only when include_conversation_repair: true (the default).
name is taken from assistant_id in config.yml, or defaults to "Rasa Agent".
version is derived from the loaded model filename (without .tar.gz suffix), or from model_id, or defaults to "1.0.0".
capabilities.streaming is always true.
capabilities.push_notifications reflects push_notifications_enabled.

Static override

Set agent_card_path to load a static AgentCard JSON file. Rasa still overwrites the url field at startup with the resolved public URL from url or from --port / --interface / --ssl-certificate. The description field in endpoints.yml is ignored when a static card is used.

Refresh timing

At rasa run startup:

Before the model loads, a placeholder AgentCard is served (empty skills array).
After the model loads, the flow-generated or static AgentCard replaces it.

Hot reload via PUT /model does not refresh the AgentCard or re-wire the A2A executor. Restart the server after deploying a new model to update advertised skills and version.

Task States and Structured Output

Rasa maps dialogue stack and tracker state to A2A task states: working, input_required, completed, failed, canceled, rejected, and auth_required.

Terminal and interactive-terminal updates include:

A DataPart with structured state: state, active_flow, current slots, and (when applicable) persisted slot values from flows that declare persisted_slots.
A TextPart carrying the user-visible bot utterance when present, so clients that only read status.message still receive NLG.

message/stream emits working status and artifact deltas during token/chunk streaming custom actions. Blocking message/send returns the final task.

Orchestrator operations

tasks/cancel — idle cancel on an input_required context, or signal in-flight work; publishes canceled and records FlowCancelled when a user flow was active.
messageId deduplication — retries with the same (contextId, messageId) replay the cached terminal task without re-running the turn. A concurrent duplicate while in flight returns HTTP 409 with a JSON-RPC error (retry after the first completes).

Orchestrator Slot Pre-seeding

On every message/send / message/stream turn, Rasa reads slot context from the incoming A2A Message, applies it as SetSlotCommands before dialogue processing, and strips slot markers from the user text so they are not passed to the LLM.

Orchestrators can supply slots in any of these shapes. Later sources override earlier ones on key conflicts within the payload:

Text transcript (lowest precedence) — append to the text part:
- SLOTS: {"slot_name": "value", ...} on its own line, or
- a trailing fenced block: ```json\n{"slot_name": "value"}\n```
message.metadata — {"slots": {"slot_name": "value", ...}} (merged with JSON-RPC request metadata).
DataPart (highest precedence) — {"kind": "data", "data": {"slots": {"slot_name": "value", ...}}}.

Only slots defined in the assistant domain are written; unknown, builtin, and agent-internal slots are ignored. Values are coerced to the slot type; invalid or out-of-range values (for example categorical mismatches) are skipped. String values "none", "null", and "undefined" are stored as null.

Per-turn precedence: if command generation extracts a SetSlotCommand for a slot from the current user message, that value wins over orchestrator metadata for the same slot on that turn. Orchestrator values still pre-fill slots the parser did not set.

Send an orchestrator slot snapshot on each follow-up message for slots the user did not answer in that turn. Do not rely on user-visible text alone to carry slot values when a collect step is active — prefer metadata.slots or a DataPart, or keep slot JSON out of text the user is answering with.

Authentication

When a2a_server.auth is set, every protected request must include Authorization: Bearer <jwt>. Missing or invalid tokens receive HTTP 401 with a WWW-Authenticate: Bearer response header before message processing starts.

This applies to the JSON-RPC endpoint (POST /) and AgentCard discovery (GET /.well-known/agent-card.json, including the deprecated alias). If auth is omitted, behaviour is unchanged and no token is required.

Only type: bearer is supported (API key auth is not available). Auth validates the orchestrator only; end-user identity belongs in A2A message metadata or slot pre-population, not in this JWT.

JWT configuration

Key	Type	Default	Description
`auth.type`	string	—	Must be `bearer`.
`auth.jwt.algorithm`	string	`RS256`	JWT signing algorithm. Supported: `HS256`, `HS512`, `RS256`, `RS512`, `ES256`, `ES512`, `PS256`.
`auth.jwt.public_key_path`	string	none	Path to a PEM public key file. Required for asymmetric algorithms (`RS`, `ES`, `PS*`).
`auth.jwt.secret`	string	none	Shared secret for symmetric algorithms (`HS256`, `HS512`). Must use environment variable interpolation, for example `'${JWT_SECRET}'`. Plaintext secrets are rejected at startup.
`auth.jwt.issuer`	string	none	If set, reject tokens whose `iss` claim does not match.
`auth.jwt.audience`	string	none	If set, reject tokens whose `aud` claim does not match.

Production (RS256)
Local dev (HS256)

a2a_server:
  description: "My domain agent"
  auth:
    type: bearer
    jwt:
      algorithm: RS256
      public_key_path: "/run/secrets/jwt_public_key.pem"
      issuer: "https://auth.example.com"
      audience: "rasa-a2a-agent"

a2a_server:
  description: "My domain agent"
  auth:
    type: bearer
    jwt:
      algorithm: HS256
      secret: "${JWT_SECRET}"

Orchestrators and local test clients must send the same bearer token when fetching the AgentCard and when calling message/send, message/stream, or other A2A JSON-RPC methods.

Push Notifications

Push notifications are disabled by default (push_notifications_enabled: false) because callback URLs are client-controlled and can be an SSRF vector when the A2A endpoint is reachable by untrusted callers.

Enabling push notifications

Set push_notifications_enabled: true in endpoints.yml to advertise the capability in the AgentCard and allow orchestrators to register callback URLs:

a2a_server:
  description: "My banking sub-agent"
  push_notifications_enabled: true
  # Optional: restrict callbacks to known orchestrator hosts
  push_notification_allowed_hosts:
    - "orchestrator.example.com"
    - "hooks.partner.example.com"

When push notifications are enabled:

An orchestrator supplies a pushNotificationConfig.url on message/send, message/stream, or via tasks/pushNotificationConfig/set.
Rasa POSTs task state updates to that callback as the task progresses.
message/stream emits intermediate states (submitted, working) as well as the terminal state.
The public AgentCard advertises the push_notifications capability.
Rasa also POSTs the terminal canceled task when a callback URL is registered, because the a2a-sdk supplied on_cancel_task does not send push notifications.

Redirects are disabled

Rasa disables HTTP redirect following on every outbound push notification POST. URL policy is enforced only on the registered callback URL — redirect targets are not re-validated.

What happens today: if the callback responds with a 3xx redirect (for example 302 with a Location header), httpx does not follow it. Rasa treats the delivery as failed (non-2xx response), logs the error, and does not retry. The A2A task itself still completes normally — a failed push does not change the task outcome returned to the orchestrator over JSON-RPC or SSE. Only a single POST is sent to the registered URL; no second request is made to the redirect target.

Why redirects are disabled: if Rasa followed redirects, an orchestrator could register a callback URL that passes policy checks (public hostname, global IP), then respond with Location: http://127.0.0.1/... or another internal address. Rasa would POST task state — including structured slot data — to that internal target without running URL policy on the redirect destination. That is a classic server-side request forgery (SSRF) vector when the A2A endpoint is reachable by untrusted callers.

Register the final callback URL directly. If the callback endpoint moves, update the registered pushNotificationConfig.url on the orchestrator side.

URL validation

Callback URLs are validated on registration and before each POST:

Only http/https URLs with a hostname are accepted.
Loopback, link-local, and private-network targets are rejected (including hostnames that resolve to those addresses via DNS).
URLs with embedded credentials are rejected.
HTTP redirects are not followed on outbound push POSTs (see note above).
Outbound push POSTs use an explicit httpx timeout (5s connect, 30s overall).
DNS checks are bounded by a 5s application timeout.

Optionally restrict callbacks further with push_notification_allowed_hosts (exact hostname or subdomain match). Rejected URLs surface as InvalidParamsError on the JSON-RPC methods that register them.

DNS rebinding

DNS validation at registration and before each POST does not pin the resolved IP for the outbound connection; httpx performs a separate lookup at connect time (DNS rebinding TOCTOU). Keep the feature disabled unless required, use push_notification_allowed_hosts, prefer HTTPS callbacks, and treat callback registration as a trusted orchestrator action.

TLS

A2A shares the same port as REST, channels, and other Sanic routes. Configure HTTPS with the usual rasa run flags — not under a2a_server in endpoints.yml. A legacy tls block in endpoints.yml is rejected at startup.

Rasa serves HTTPS directly

Clients connect to Rasa over TLS using your certificate and key (the same cert covers REST, channels, and A2A):

SANIC_WORKERS=1 rasa run \
  --ssl-certificate /certs/server.pem \
  --ssl-keyfile /certs/server-key.pem \
  -m models/your-model.tar.gz \
  --endpoints endpoints.yml

When url is omitted, the AgentCard URL scheme follows --ssl-certificate. Set an explicit url: "https://..." when the public hostname or port differs from the local bind address.

HTTPS handled by a reverse proxy or ingress

Set url to the public base URL orchestrators use. Rasa can listen on plain HTTP behind the proxy while the AgentCard advertises HTTPS:

a2a_server:
  description: "My banking sub-agent"
  url: "https://public.example.com"

Monitoring and Troubleshooting

Health checks

Check	Endpoint	Purpose
Model loaded	`GET /status`	Returns `model_file` and `model_id`. Requires the Rasa auth token if configured. Use to confirm the model is loaded before sending A2A messages.
Capability discovery	`GET /.well-known/agent-card.json`	Returns the public AgentCard. Requires bearer JWT when `auth` is configured.

Key log events

Rasa emits structured log events prefixed with a2a_server.:

Event	Meaning
`a2a_server.rasa_a2a_agent_executor.execute.agent_not_ready`	Model not loaded; A2A turn returns `failed`.
`a2a_server.rasa_a2a_agent_executor.execute.context_limit`	`max_contexts` cap exceeded.
`a2a_server.rasa_a2a_agent_executor.task_timeout`	Per-task wall-clock timeout fired; task auto-canceled.
`a2a_server.sanic_app.message_in_flight_conflict`	Duplicate `messageId` while a turn is in flight (HTTP 409).
`a2a_server.sanic_app.auth_rejected`	Missing or invalid bearer token.
`a2a_server.message_adapter.skip_slot.invalid_value`	Orchestrator slot value failed type coercion.
`a2a_server.sanic_app.stream_sse.error`	SSE streaming error during `message/stream` or `tasks/resubscribe`.

Common failure modes

Symptom	Likely cause	Resolution
Startup `ValidationError` on session config	`start_session_after_expiry: true` in domain	Set `start_session_after_expiry: false`.
Startup `ValidationError` on workers	`SANIC_WORKERS > 1`	Set `SANIC_WORKERS=1`; scale with additional replicas and sticky routing by `contextId`.
Follow-up turn starts a new flow or loses slot state (multi-replica)	Load balancer round-robins A2A traffic across pods	Route by `contextId` (body, `X-A2A-Context-Id` header, or `a2a-context-id` cookie); see Multi-replica load balancing.
`tasks/cancel` fails or no-ops (multi-replica)	Cancel request routed to a pod that did not handle the in-flight turn	Send `X-A2A-Context-Id` or preserve the ingress `a2a-context-id` cookie.
Task `failed` with "Agent is not ready to handle messages."	Model not loaded or unloaded	Wait for model load; check `GET /status`.
HTTP 409 on `message/send` retry	Same `messageId` submitted while prior turn is in flight	Retry after the first task reaches a terminal state.
HTTP 401 on JSON-RPC or AgentCard	`auth` enabled; missing or invalid JWT	Send `Authorization: Bearer <jwt>` on all A2A requests.
Sanic bind failure	Port already in use	Change `--port` or free the occupied port.
Orchestrator client protocol errors	`a2a-sdk` version mismatch	Align your orchestrator client with the `a2a-sdk` version bundled with your Rasa release.

Complete Example

action_endpoint:
  url: "http://localhost:5055/webhook"

a2a_server:
  url: "https://rasa.example.com"
  description: "Banking assistant sub-agent for account transfers and appointments"
  include_conversation_repair: true
  task_timeout_seconds: 600
  max_contexts: 1000
  push_notifications_enabled: false
  auth:
    type: bearer
    jwt:
      algorithm: RS256
      public_key_path: "/run/secrets/jwt_public_key.pem"
      issuer: "https://auth.example.com"
      audience: "rasa-a2a-agent"

Out of Scope

Not tested for V1

Running Rasa as an A2A sub-agent while also invoking external sub-agents (sub_agents/ with protocol: a2a) is not tested for V1. Use Rasa in one role at a time: either orchestrator or sub-agent.

Multi-worker A2A support (persistent task and message stores) is planned for a future release. Until then, use SANIC_WORKERS=1 per replica.

Basic Configuration​

Configuration Reference​

Prerequisites​

Session configuration​

Sanic workers​

Multi-replica load balancing​

Deriving the routing key​

Istio on Kubernetes​

Other ingress and load balancers​

Orchestrator and HTTP client guidance​

HTTP Surface​

Context and task mapping​

AgentCard Generation​

Auto-generated (default)​

Static override​

Refresh timing​

Task States and Structured Output​

Orchestrator operations​

Orchestrator Slot Pre-seeding​

Authentication​

JWT configuration​

Push Notifications​

Enabling push notifications​

URL validation​

TLS​

Rasa serves HTTPS directly​

HTTPS handled by a reverse proxy or ingress​

Monitoring and Troubleshooting​

Health checks​

Key log events​

Common failure modes​

Complete Example​

Out of Scope​