Skip to main content

A2A Server

New in Rasa Pro 3.17

Rasa can expose your assistant as a native Agent-to-Agent (A2A) sub-agent.

When an a2a_server block is present in endpoints.yml, rasa run registers A2A JSON-RPC routes on the same Sanic port as REST and channel webhooks. External orchestrators discover your assistant's capabilities via AgentCard, send turns over JSON-RPC, and receive A2A task lifecycle updates mapped from Rasa's dialogue state.

This page documents the sub-agent (server) side. For Rasa as an orchestrator that calls external A2A agents, see External Sub Agent and Integrating External Agents via A2A.

Basic Configuration

Add a2a_server to your endpoints.yml file. Only description is required; other fields have sensible defaults.

a2a_server:
url: "http://localhost:5005" # optional; inferred from --port / --interface / --ssl-certificate
description: "My banking sub-agent"
include_conversation_repair: true # false → terminal completed instead of repair input_required
task_timeout_seconds: 600 # per-task safety net; 0 disables
a2a_message_cache_ttl_seconds: 600 # messageId replay TTL; defaults to task timeout
max_contexts: 1000 # in-memory session cap; 0 disables
# agent_card_path: ./static-card.json # optional static AgentCard; url still resolved at startup
# auth: … # optional JWT bearer auth

Configuration Reference

KeyTypeDefaultRequiredDescription
descriptionstringyesPublic description for the AgentCard. Used for auto-generated and placeholder cards. Ignored when agent_card_path points to a static AgentCard JSON file.
urlstringinferred from --port, --interface, and --ssl-certificatenoPublic base URL of the Rasa server as seen by A2A orchestrators, advertised in the AgentCard (scheme, host, port, path prefix). Set explicitly when the public hostname or port differs from the local bind address (for example behind a reverse proxy or ingress).
agent_card_pathstringnone (auto-generate from flows)noPath to a static AgentCard JSON file. When set, the file is loaded at startup and its url is overwritten with the resolved public URL.
include_conversation_repairbooleantruenoWhen true, pattern flows (for example pattern_completed) are advertised as skills and map to input_required so orchestrators know Rasa handles conversation repair. When false, the same stack state maps to completed instead.
task_timeout_secondsinteger600noMaximum seconds each in-flight task_id may run before the server auto-cancels that task (safety net; use explicit tasks/cancel as the primary path). Timers are per task and are not extended by later message/send on the same contextId. Set to 0 to disable.
a2a_message_cache_ttl_secondsintegersame as task_timeout_secondsnoTTL in seconds for messageId deduplication cache entries keyed by (contextId, messageId). Set to 0 for immediate expiry (effectively disables replay).
max_contextsinteger1000noMemory guard: maximum distinct contextId sessions retained until a terminal task outcome. input_required keeps the session reserved. Set to 0 to disable the cap.
push_notifications_enabledbooleanfalsenoWhen true, the AgentCard advertises push notifications and the server may POST task updates to orchestrator-supplied callback URLs. Disabled by default because callback URLs are client-controlled and can be an SSRF vector.
push_notification_allowed_hostslist of stringsnonenoOptional hostname allowlist for push callback URLs. When set, only http/https URLs whose host matches an entry (exact or subdomain) are accepted. Loopback and private-network targets are always rejected.
authobjectnone (open endpoint)noBearer JWT authentication for orchestrators calling the A2A endpoint. See Authentication.

Prerequisites

Rasa enforces two startup guardrails when a2a_server is configured. Both fail fast with actionable error messages.

Session configuration

When A2A is enabled, domain.session_config.start_session_after_expiry must be false.

Resumed orchestrator contextId values reuse the same Rasa sender_id. If start_session_after_expiry is true, Rasa runs action_session_start after inactivity and can silently reset slot state when the orchestrator resumes the same context.

This is validated when a model is loaded with a2a_server in endpoints.yml (for example during rasa run via load_agent). The error code is validation.a2a_server.incompatible_session_config.

domain.yml
session_config:
session_expiration_time: 60
start_session_after_expiry: false # required when a2a_server is enabled

ConversationInactive does not release an input_required A2A context. With start_session_after_expiry: false, the next message on the same contextId continues the flow after inactivity.

See Session Timer for full session configuration details.

Sanic workers

SANIC_WORKERS=1 is required until a persistent solution for storing A2A tasks and messages ships. Multiple Sanic workers break messageId idempotency, in-flight HTTP 409 handling, orchestrator cancel, and max_contexts enforcement because these are per-worker only.

This is validated when starting the Sanic server (rasa run). The error code is validation.a2a_server.incompatible_sanic_workers.

Scale horizontally with additional replicas and load balancer sticky routing by contextId instead of increasing Sanic workers per pod.

SANIC_WORKERS=1 rasa run -m models/your-model.tar.gz --endpoints endpoints.yml

Multi-replica load balancing

When a2a_server is enabled and you run more than one Rasa replica, configure your ingress or load balancer so all A2A JSON-RPC traffic for a given contextId routes to the same pod.

A2A V1 keeps task state, messageId deduplication caches, in-flight turn queues, push notification callback registration, and max_contexts enforcement in memory on each replica. The shared tracker store (for example PostgreSQL) persists dialogue history, but A2A-specific state is not yet replicated across pods. Without sticky routing, follow-up turns on the same contextId may land on a different pod — the session can appear fresh, idempotency breaks, tasks/cancel may miss in-flight work, and push callbacks registered on one pod are invisible to others.

Requirements for multi-replica A2A:

RequirementWhy
SANIC_WORKERS=1 on every replicaA2A state is per-worker as well as per-pod.
Sticky routing keyed on contextIdKeeps in-memory A2A state coherent for each orchestrator context.
Orchestrator reuses contextId across turnsAlready required for multi-step flows; stickiness depends on a stable key.

When routing is configured correctly, orchestrator traffic for a given contextId (or connection) consistently hits the same pod — in-memory stores, queues, deduplication, and push config all stay coherent.

Deriving the routing key

Your load balancer must resolve a sticky key from contextId on every A2A POST / request. Use at least one of these sources (in order of typical precedence):

SourceUsed for
contextId in the JSON-RPC bodymessage/send, message/stream, and other methods that include contextId in the payload
X-A2A-Context-Id request headertasks/cancel, tasks/get, and other methods where contextId is not in the body
a2a-context-id cookieHTTP clients that received the cookie from a prior response (for example browser-based orchestrators)

Configure consistent hashing or equivalent session affinity on the resolved key — not round-robin alone.

Istio on Kubernetes

Apply two Istio resources when Rasa runs behind an Istio ingress with multiple replicas. Istio consistentHash accepts one hash key per DestinationRule (header, cookie, source IP, or query parameter) — use a gateway EnvoyFilter to normalize contextId into a single internal header.

1. DestinationRule — consistent-hash on that header for the Rasa Service:

apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
name: rasa-a2a-sticky
namespace: YOUR_NAMESPACE
spec:
host: rasa.YOUR_NAMESPACE.svc.cluster.local
trafficPolicy:
loadBalancer:
consistentHash:
httpHeaderName: x-a2a-context-id

2. EnvoyFilter — on the ingress gateway (namespace: istio-system, selector istio: ingressgateway). Insert a Lua HTTP filter before envoy.filters.http.router that:

  • Applies only to POST requests on your assistant hostname (for example assistant.example.com).
  • Resolves contextId in order: "contextId" in the JSON body → X-A2A-Context-Id request header → a2a-context-id cookie.
  • Sets request header x-a2a-context-id to the resolved value (the DestinationRule hash key).
  • Stores the value in Envoy dynamic metadata during the request phase.
  • On response, emits Set-Cookie: a2a-context-id=<contextId>; Path=/; Max-Age=3600; HttpOnly; SameSite=Lax so methods like tasks/cancel that omit body contextId still route to the pod that handled prior turns.

Apply both resources with kubectl apply. Adjust spec.host and the hostname guard in the Lua filter to match your release name, namespace, and ingress host.

Verify: With at least two replicas and SANIC_WORKERS=1, send two message/send turns with the same contextId — the second should continue the dialogue, not restart it. Confirm in pod logs that each contextId appears on one pod only. Exercise tasks/cancel with X-A2A-Context-Id or cookie persistence (curl -c / -b).

Other ingress and load balancers

On NGINX Ingress, AWS ALB, HAProxy, or other gateways, implement equivalent behavior:

  • Inspect POST / JSON bodies for "contextId" where present.
  • Accept X-A2A-Context-Id as a fallback for methods that only pass task_id.
  • Optionally set and honor an a2a-context-id cookie for clients that do not send contextId on every request.
  • Hash-route (or pin sessions) on that key to a single backend pod.

Orchestrator and HTTP client guidance

  • Reuse the same contextId across turns in a multi-step flow.
  • For tasks/cancel and similar calls that only pass task_id in the body, send X-A2A-Context-Id: <contextId> or rely on cookie persistence through your ingress.
  • HTTP test clients should preserve cookies (curl -c / -b) when exercising sticky ingress.

HTTP Surface

A2A routes are registered at the root of the Rasa server (no URL prefix).

RoutePurpose
POST /A2A JSON-RPC (message/send, message/stream, tasks/cancel, tasks/get, …)
GET /.well-known/agent-card.jsonPublic AgentCard (flow-generated at startup, or static file from agent_card_path)
GET /.well-known/agent.jsonDeprecated alias of the AgentCard

Context and task mapping

  • contextId in A2A requests maps to Rasa sender_id. One persistent Rasa conversation exists per orchestrator context.
  • Each orchestrator message gets a new task_id.
  • input_required keeps the context reserved for follow-up turns on the same contextId.

AgentCard Generation

Orchestrators discover your assistant's capabilities through the public AgentCard.

Auto-generated (default)

When agent_card_path is omitted, Rasa auto-generates the AgentCard from user-facing flows at server startup (after the model loads):

  • Each user flow becomes an AgentSkill with id, name, and description (from the flow description or readable name).
  • Pattern/system flows (pattern_*) are included only when include_conversation_repair: true (the default).
  • name is taken from assistant_id in config.yml, or defaults to "Rasa Agent".
  • version is derived from the loaded model filename (without .tar.gz suffix), or from model_id, or defaults to "1.0.0".
  • capabilities.streaming is always true.
  • capabilities.push_notifications reflects push_notifications_enabled.

Static override

Set agent_card_path to load a static AgentCard JSON file. Rasa still overwrites the url field at startup with the resolved public URL from url or from --port / --interface / --ssl-certificate. The description field in endpoints.yml is ignored when a static card is used.

Refresh timing

At rasa run startup:

  1. Before the model loads, a placeholder AgentCard is served (empty skills array).
  2. After the model loads, the flow-generated or static AgentCard replaces it.

Hot reload via PUT /model does not refresh the AgentCard or re-wire the A2A executor. Restart the server after deploying a new model to update advertised skills and version.

Task States and Structured Output

Rasa maps dialogue stack and tracker state to A2A task states: working, input_required, completed, failed, canceled, rejected, and auth_required.

Terminal and interactive-terminal updates include:

  • A DataPart with structured state: state, active_flow, current slots, and (when applicable) persisted slot values from flows that declare persisted_slots.
  • A TextPart carrying the user-visible bot utterance when present, so clients that only read status.message still receive NLG.

message/stream emits working status and artifact deltas during token/chunk streaming custom actions. Blocking message/send returns the final task.

Orchestrator operations

  • tasks/cancel — idle cancel on an input_required context, or signal in-flight work; publishes canceled and records FlowCancelled when a user flow was active.
  • messageId deduplication — retries with the same (contextId, messageId) replay the cached terminal task without re-running the turn. A concurrent duplicate while in flight returns HTTP 409 with a JSON-RPC error (retry after the first completes).

Orchestrator Slot Pre-seeding

On every message/send / message/stream turn, Rasa reads slot context from the incoming A2A Message, applies it as SetSlotCommands before dialogue processing, and strips slot markers from the user text so they are not passed to the LLM.

Orchestrators can supply slots in any of these shapes. Later sources override earlier ones on key conflicts within the payload:

  1. Text transcript (lowest precedence) — append to the text part:
    • SLOTS: {"slot_name": "value", ...} on its own line, or
    • a trailing fenced block: ```json\n{"slot_name": "value"}\n```
  2. message.metadata{"slots": {"slot_name": "value", ...}} (merged with JSON-RPC request metadata).
  3. DataPart (highest precedence) — {"kind": "data", "data": {"slots": {"slot_name": "value", ...}}}.

Only slots defined in the assistant domain are written; unknown, builtin, and agent-internal slots are ignored. Values are coerced to the slot type; invalid or out-of-range values (for example categorical mismatches) are skipped. String values "none", "null", and "undefined" are stored as null.

Per-turn precedence: if command generation extracts a SetSlotCommand for a slot from the current user message, that value wins over orchestrator metadata for the same slot on that turn. Orchestrator values still pre-fill slots the parser did not set.

Send an orchestrator slot snapshot on each follow-up message for slots the user did not answer in that turn. Do not rely on user-visible text alone to carry slot values when a collect step is active — prefer metadata.slots or a DataPart, or keep slot JSON out of text the user is answering with.

Authentication

When a2a_server.auth is set, every protected request must include Authorization: Bearer <jwt>. Missing or invalid tokens receive HTTP 401 with a WWW-Authenticate: Bearer response header before message processing starts.

This applies to the JSON-RPC endpoint (POST /) and AgentCard discovery (GET /.well-known/agent-card.json, including the deprecated alias). If auth is omitted, behaviour is unchanged and no token is required.

Only type: bearer is supported (API key auth is not available). Auth validates the orchestrator only; end-user identity belongs in A2A message metadata or slot pre-population, not in this JWT.

JWT configuration

KeyTypeDefaultDescription
auth.typestringMust be bearer.
auth.jwt.algorithmstringRS256JWT signing algorithm. Supported: HS256, HS512, RS256, RS512, ES256, ES512, PS256.
auth.jwt.public_key_pathstringnonePath to a PEM public key file. Required for asymmetric algorithms (RS*, ES*, PS*).
auth.jwt.secretstringnoneShared secret for symmetric algorithms (HS256, HS512). Must use environment variable interpolation, for example '${JWT_SECRET}'. Plaintext secrets are rejected at startup.
auth.jwt.issuerstringnoneIf set, reject tokens whose iss claim does not match.
auth.jwt.audiencestringnoneIf set, reject tokens whose aud claim does not match.
a2a_server:
description: "My domain agent"
auth:
type: bearer
jwt:
algorithm: RS256
public_key_path: "/run/secrets/jwt_public_key.pem"
issuer: "https://auth.example.com"
audience: "rasa-a2a-agent"

Orchestrators and local test clients must send the same bearer token when fetching the AgentCard and when calling message/send, message/stream, or other A2A JSON-RPC methods.

Push Notifications

Push notifications are disabled by default (push_notifications_enabled: false) because callback URLs are client-controlled and can be an SSRF vector when the A2A endpoint is reachable by untrusted callers.

Enabling push notifications

Set push_notifications_enabled: true in endpoints.yml to advertise the capability in the AgentCard and allow orchestrators to register callback URLs:

a2a_server:
description: "My banking sub-agent"
push_notifications_enabled: true
# Optional: restrict callbacks to known orchestrator hosts
push_notification_allowed_hosts:
- "orchestrator.example.com"
- "hooks.partner.example.com"

When push notifications are enabled:

  • An orchestrator supplies a pushNotificationConfig.url on message/send, message/stream, or via tasks/pushNotificationConfig/set.
  • Rasa POSTs task state updates to that callback as the task progresses.
  • message/stream emits intermediate states (submitted, working) as well as the terminal state.
  • The public AgentCard advertises the push_notifications capability.
  • Rasa also POSTs the terminal canceled task when a callback URL is registered, because the a2a-sdk supplied on_cancel_task does not send push notifications.
Redirects are disabled

Rasa disables HTTP redirect following on every outbound push notification POST. URL policy is enforced only on the registered callback URL — redirect targets are not re-validated.

What happens today: if the callback responds with a 3xx redirect (for example 302 with a Location header), httpx does not follow it. Rasa treats the delivery as failed (non-2xx response), logs the error, and does not retry. The A2A task itself still completes normally — a failed push does not change the task outcome returned to the orchestrator over JSON-RPC or SSE. Only a single POST is sent to the registered URL; no second request is made to the redirect target.

Why redirects are disabled: if Rasa followed redirects, an orchestrator could register a callback URL that passes policy checks (public hostname, global IP), then respond with Location: http://127.0.0.1/... or another internal address. Rasa would POST task state — including structured slot data — to that internal target without running URL policy on the redirect destination. That is a classic server-side request forgery (SSRF) vector when the A2A endpoint is reachable by untrusted callers.

Register the final callback URL directly. If the callback endpoint moves, update the registered pushNotificationConfig.url on the orchestrator side.

URL validation

Callback URLs are validated on registration and before each POST:

  • Only http/https URLs with a hostname are accepted.
  • Loopback, link-local, and private-network targets are rejected (including hostnames that resolve to those addresses via DNS).
  • URLs with embedded credentials are rejected.
  • HTTP redirects are not followed on outbound push POSTs (see note above).
  • Outbound push POSTs use an explicit httpx timeout (5s connect, 30s overall).
  • DNS checks are bounded by a 5s application timeout.

Optionally restrict callbacks further with push_notification_allowed_hosts (exact hostname or subdomain match). Rejected URLs surface as InvalidParamsError on the JSON-RPC methods that register them.

DNS rebinding

DNS validation at registration and before each POST does not pin the resolved IP for the outbound connection; httpx performs a separate lookup at connect time (DNS rebinding TOCTOU). Keep the feature disabled unless required, use push_notification_allowed_hosts, prefer HTTPS callbacks, and treat callback registration as a trusted orchestrator action.

TLS

A2A shares the same port as REST, channels, and other Sanic routes. Configure HTTPS with the usual rasa run flags — not under a2a_server in endpoints.yml. A legacy tls block in endpoints.yml is rejected at startup.

Rasa serves HTTPS directly

Clients connect to Rasa over TLS using your certificate and key (the same cert covers REST, channels, and A2A):

SANIC_WORKERS=1 rasa run \
--ssl-certificate /certs/server.pem \
--ssl-keyfile /certs/server-key.pem \
-m models/your-model.tar.gz \
--endpoints endpoints.yml

When url is omitted, the AgentCard URL scheme follows --ssl-certificate. Set an explicit url: "https://..." when the public hostname or port differs from the local bind address.

HTTPS handled by a reverse proxy or ingress

Set url to the public base URL orchestrators use. Rasa can listen on plain HTTP behind the proxy while the AgentCard advertises HTTPS:

a2a_server:
description: "My banking sub-agent"
url: "https://public.example.com"

Monitoring and Troubleshooting

Health checks

CheckEndpointPurpose
Model loadedGET /statusReturns model_file and model_id. Requires the Rasa auth token if configured. Use to confirm the model is loaded before sending A2A messages.
Capability discoveryGET /.well-known/agent-card.jsonReturns the public AgentCard. Requires bearer JWT when auth is configured.

Key log events

Rasa emits structured log events prefixed with a2a_server.:

EventMeaning
a2a_server.rasa_a2a_agent_executor.execute.agent_not_readyModel not loaded; A2A turn returns failed.
a2a_server.rasa_a2a_agent_executor.execute.context_limitmax_contexts cap exceeded.
a2a_server.rasa_a2a_agent_executor.task_timeoutPer-task wall-clock timeout fired; task auto-canceled.
a2a_server.sanic_app.message_in_flight_conflictDuplicate messageId while a turn is in flight (HTTP 409).
a2a_server.sanic_app.auth_rejectedMissing or invalid bearer token.
a2a_server.message_adapter.skip_slot.invalid_valueOrchestrator slot value failed type coercion.
a2a_server.sanic_app.stream_sse.errorSSE streaming error during message/stream or tasks/resubscribe.

Common failure modes

SymptomLikely causeResolution
Startup ValidationError on session configstart_session_after_expiry: true in domainSet start_session_after_expiry: false.
Startup ValidationError on workersSANIC_WORKERS > 1Set SANIC_WORKERS=1; scale with additional replicas and sticky routing by contextId.
Follow-up turn starts a new flow or loses slot state (multi-replica)Load balancer round-robins A2A traffic across podsRoute by contextId (body, X-A2A-Context-Id header, or a2a-context-id cookie); see Multi-replica load balancing.
tasks/cancel fails or no-ops (multi-replica)Cancel request routed to a pod that did not handle the in-flight turnSend X-A2A-Context-Id or preserve the ingress a2a-context-id cookie.
Task failed with "Agent is not ready to handle messages."Model not loaded or unloadedWait for model load; check GET /status.
HTTP 409 on message/send retrySame messageId submitted while prior turn is in flightRetry after the first task reaches a terminal state.
HTTP 401 on JSON-RPC or AgentCardauth enabled; missing or invalid JWTSend Authorization: Bearer <jwt> on all A2A requests.
Sanic bind failurePort already in useChange --port or free the occupied port.
Orchestrator client protocol errorsa2a-sdk version mismatchAlign your orchestrator client with the a2a-sdk version bundled with your Rasa release.

Complete Example

action_endpoint:
url: "http://localhost:5055/webhook"

a2a_server:
url: "https://rasa.example.com"
description: "Banking assistant sub-agent for account transfers and appointments"
include_conversation_repair: true
task_timeout_seconds: 600
max_contexts: 1000
push_notifications_enabled: false
auth:
type: bearer
jwt:
algorithm: RS256
public_key_path: "/run/secrets/jwt_public_key.pem"
issuer: "https://auth.example.com"
audience: "rasa-a2a-agent"

Out of Scope

Not tested for V1

Running Rasa as an A2A sub-agent while also invoking external sub-agents (sub_agents/ with protocol: a2a) is not tested for V1. Use Rasa in one role at a time: either orchestrator or sub-agent.

Multi-worker A2A support (persistent task and message stores) is planned for a future release. Until then, use SANIC_WORKERS=1 per replica.