A2A Server
Rasa can expose your assistant as a native Agent-to-Agent (A2A) sub-agent.
When an a2a_server block is present in endpoints.yml, rasa run registers A2A JSON-RPC routes on the same Sanic port as REST and channel webhooks.
External orchestrators discover your assistant's capabilities via AgentCard, send turns over JSON-RPC, and receive A2A task lifecycle updates mapped from Rasa's dialogue state.
This page documents the sub-agent (server) side. For Rasa as an orchestrator that calls external A2A agents, see External Sub Agent and Integrating External Agents via A2A.
Basic Configuration
Add a2a_server to your endpoints.yml file. Only description is required; other fields have sensible defaults.
a2a_server:
url: "http://localhost:5005" # optional; inferred from --port / --interface / --ssl-certificate
description: "My banking sub-agent"
include_conversation_repair: true # false → terminal completed instead of repair input_required
task_timeout_seconds: 600 # per-task safety net; 0 disables
a2a_message_cache_ttl_seconds: 600 # messageId replay TTL; defaults to task timeout
max_contexts: 1000 # in-memory session cap; 0 disables
# agent_card_path: ./static-card.json # optional static AgentCard; url still resolved at startup
# auth: … # optional JWT bearer auth
Configuration Reference
| Key | Type | Default | Required | Description |
|---|---|---|---|---|
description | string | — | yes | Public description for the AgentCard. Used for auto-generated and placeholder cards. Ignored when agent_card_path points to a static AgentCard JSON file. |
url | string | inferred from --port, --interface, and --ssl-certificate | no | Public base URL of the Rasa server as seen by A2A orchestrators, advertised in the AgentCard (scheme, host, port, path prefix). Set explicitly when the public hostname or port differs from the local bind address (for example behind a reverse proxy or ingress). |
agent_card_path | string | none (auto-generate from flows) | no | Path to a static AgentCard JSON file. When set, the file is loaded at startup and its url is overwritten with the resolved public URL. |
include_conversation_repair | boolean | true | no | When true, pattern flows (for example pattern_completed) are advertised as skills and map to input_required so orchestrators know Rasa handles conversation repair. When false, the same stack state maps to completed instead. |
task_timeout_seconds | integer | 600 | no | Maximum seconds each in-flight task_id may run before the server auto-cancels that task (safety net; use explicit tasks/cancel as the primary path). Timers are per task and are not extended by later message/send on the same contextId. Set to 0 to disable. |
a2a_message_cache_ttl_seconds | integer | same as task_timeout_seconds | no | TTL in seconds for messageId deduplication cache entries keyed by (contextId, messageId). Set to 0 for immediate expiry (effectively disables replay). |
max_contexts | integer | 1000 | no | Memory guard: maximum distinct contextId sessions retained until a terminal task outcome. input_required keeps the session reserved. Set to 0 to disable the cap. |
push_notifications_enabled | boolean | false | no | When true, the AgentCard advertises push notifications and the server may POST task updates to orchestrator-supplied callback URLs. Disabled by default because callback URLs are client-controlled and can be an SSRF vector. |
push_notification_allowed_hosts | list of strings | none | no | Optional hostname allowlist for push callback URLs. When set, only http/https URLs whose host matches an entry (exact or subdomain) are accepted. Loopback and private-network targets are always rejected. |
auth | object | none (open endpoint) | no | Bearer JWT authentication for orchestrators calling the A2A endpoint. See Authentication. |
Prerequisites
Rasa enforces two startup guardrails when a2a_server is configured. Both fail fast with actionable error messages.
Session configuration
When A2A is enabled, domain.session_config.start_session_after_expiry must be false.
Resumed orchestrator contextId values reuse the same Rasa sender_id. If start_session_after_expiry is true, Rasa runs action_session_start after inactivity and can silently reset slot state when the orchestrator resumes the same context.
This is validated when a model is loaded with a2a_server in endpoints.yml (for example during rasa run via load_agent). The error code is validation.a2a_server.incompatible_session_config.
session_config:
session_expiration_time: 60
start_session_after_expiry: false # required when a2a_server is enabled
ConversationInactive does not release an input_required A2A context. With start_session_after_expiry: false, the next message on the same contextId continues the flow after inactivity.
See Session Timer for full session configuration details.
Sanic workers
SANIC_WORKERS=1 is required until a persistent solution for storing A2A tasks and messages ships.
Multiple Sanic workers break messageId idempotency, in-flight HTTP 409 handling, orchestrator cancel, and max_contexts enforcement because these are per-worker only.
This is validated when starting the Sanic server (rasa run). The error code is validation.a2a_server.incompatible_sanic_workers.
Scale horizontally with additional replicas and load balancer sticky routing by contextId instead of increasing Sanic workers per pod.
SANIC_WORKERS=1 rasa run -m models/your-model.tar.gz --endpoints endpoints.yml
Multi-replica load balancing
When a2a_server is enabled and you run more than one Rasa replica, configure your ingress or load balancer so all A2A JSON-RPC traffic for a given contextId routes to the same pod.
A2A V1 keeps task state, messageId deduplication caches, in-flight turn queues, push notification callback registration, and max_contexts enforcement in memory on each replica. The shared tracker store (for example PostgreSQL) persists dialogue history, but A2A-specific state is not yet replicated across pods. Without sticky routing, follow-up turns on the same contextId may land on a different pod — the session can appear fresh, idempotency breaks, tasks/cancel may miss in-flight work, and push callbacks registered on one pod are invisible to others.
Requirements for multi-replica A2A:
| Requirement | Why |
|---|---|
SANIC_WORKERS=1 on every replica | A2A state is per-worker as well as per-pod. |
Sticky routing keyed on contextId | Keeps in-memory A2A state coherent for each orchestrator context. |
Orchestrator reuses contextId across turns | Already required for multi-step flows; stickiness depends on a stable key. |
When routing is configured correctly, orchestrator traffic for a given contextId (or connection) consistently hits the same pod — in-memory stores, queues, deduplication, and push config all stay coherent.
Deriving the routing key
Your load balancer must resolve a sticky key from contextId on every A2A POST / request. Use at least one of these sources (in order of typical precedence):
| Source | Used for |
|---|---|
contextId in the JSON-RPC body | message/send, message/stream, and other methods that include contextId in the payload |
X-A2A-Context-Id request header | tasks/cancel, tasks/get, and other methods where contextId is not in the body |
a2a-context-id cookie | HTTP clients that received the cookie from a prior response (for example browser-based orchestrators) |
Configure consistent hashing or equivalent session affinity on the resolved key — not round-robin alone.
Istio on Kubernetes
Apply two Istio resources when Rasa runs behind an Istio ingress with multiple replicas. Istio consistentHash accepts one hash key per DestinationRule (header, cookie, source IP, or query parameter) — use a gateway EnvoyFilter to normalize contextId into a single internal header.
1. DestinationRule — consistent-hash on that header for the Rasa Service:
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
name: rasa-a2a-sticky
namespace: YOUR_NAMESPACE
spec:
host: rasa.YOUR_NAMESPACE.svc.cluster.local
trafficPolicy:
loadBalancer:
consistentHash:
httpHeaderName: x-a2a-context-id
2. EnvoyFilter — on the ingress gateway (namespace: istio-system, selector istio: ingressgateway). Insert a Lua HTTP filter before envoy.filters.http.router that:
- Applies only to
POSTrequests on your assistant hostname (for exampleassistant.example.com). - Resolves
contextIdin order:"contextId"in the JSON body →X-A2A-Context-Idrequest header →a2a-context-idcookie. - Sets request header
x-a2a-context-idto the resolved value (the DestinationRule hash key). - Stores the value in Envoy dynamic metadata during the request phase.
- On response, emits
Set-Cookie: a2a-context-id=<contextId>; Path=/; Max-Age=3600; HttpOnly; SameSite=Laxso methods liketasks/cancelthat omit bodycontextIdstill route to the pod that handled prior turns.
Apply both resources with kubectl apply. Adjust spec.host and the hostname guard in the Lua filter to match your release name, namespace, and ingress host.
Verify: With at least two replicas and SANIC_WORKERS=1, send two message/send turns with the same contextId — the second should continue the dialogue, not restart it. Confirm in pod logs that each contextId appears on one pod only. Exercise tasks/cancel with X-A2A-Context-Id or cookie persistence (curl -c / -b).
Other ingress and load balancers
On NGINX Ingress, AWS ALB, HAProxy, or other gateways, implement equivalent behavior:
- Inspect
POST /JSON bodies for"contextId"where present. - Accept
X-A2A-Context-Idas a fallback for methods that only passtask_id. - Optionally set and honor an
a2a-context-idcookie for clients that do not sendcontextIdon every request. - Hash-route (or pin sessions) on that key to a single backend pod.
Orchestrator and HTTP client guidance
- Reuse the same
contextIdacross turns in a multi-step flow. - For
tasks/canceland similar calls that only passtask_idin the body, sendX-A2A-Context-Id: <contextId>or rely on cookie persistence through your ingress. - HTTP test clients should preserve cookies (
curl -c/-b) when exercising sticky ingress.
HTTP Surface
A2A routes are registered at the root of the Rasa server (no URL prefix).
| Route | Purpose |
|---|---|
POST / | A2A JSON-RPC (message/send, message/stream, tasks/cancel, tasks/get, …) |
GET /.well-known/agent-card.json | Public AgentCard (flow-generated at startup, or static file from agent_card_path) |
GET /.well-known/agent.json | Deprecated alias of the AgentCard |
Context and task mapping
contextIdin A2A requests maps to Rasasender_id. One persistent Rasa conversation exists per orchestrator context.- Each orchestrator message gets a new
task_id. input_requiredkeeps the context reserved for follow-up turns on the samecontextId.
AgentCard Generation
Orchestrators discover your assistant's capabilities through the public AgentCard.
Auto-generated (default)
When agent_card_path is omitted, Rasa auto-generates the AgentCard from user-facing flows at server startup (after the model loads):
- Each user flow becomes an
AgentSkillwithid,name, anddescription(from the flow description or readable name). - Pattern/system flows (
pattern_*) are included only wheninclude_conversation_repair: true(the default). nameis taken fromassistant_idinconfig.yml, or defaults to"Rasa Agent".versionis derived from the loaded model filename (without.tar.gzsuffix), or frommodel_id, or defaults to"1.0.0".capabilities.streamingis alwaystrue.capabilities.push_notificationsreflectspush_notifications_enabled.
Static override
Set agent_card_path to load a static AgentCard JSON file. Rasa still overwrites the url field at startup with the resolved public URL from url or from --port / --interface / --ssl-certificate. The description field in endpoints.yml is ignored when a static card is used.
Refresh timing
At rasa run startup:
- Before the model loads, a placeholder AgentCard is served (empty
skillsarray). - After the model loads, the flow-generated or static AgentCard replaces it.
Hot reload via PUT /model does not refresh the AgentCard or re-wire the A2A executor. Restart the server after deploying a new model to update advertised skills and version.
Task States and Structured Output
Rasa maps dialogue stack and tracker state to A2A task states: working, input_required, completed, failed, canceled, rejected, and auth_required.
Terminal and interactive-terminal updates include:
- A
DataPartwith structured state:state,active_flow, currentslots, and (when applicable) persisted slot values from flows that declarepersisted_slots. - A
TextPartcarrying the user-visible bot utterance when present, so clients that only readstatus.messagestill receive NLG.
message/stream emits working status and artifact deltas during token/chunk streaming custom actions. Blocking message/send returns the final task.
Orchestrator operations
tasks/cancel— idle cancel on aninput_requiredcontext, or signal in-flight work; publishescanceledand recordsFlowCancelledwhen a user flow was active.messageIddeduplication — retries with the same(contextId, messageId)replay the cached terminal task without re-running the turn. A concurrent duplicate while in flight returns HTTP 409 with a JSON-RPC error (retry after the first completes).
Orchestrator Slot Pre-seeding
On every message/send / message/stream turn, Rasa reads slot context from the incoming A2A Message, applies it as SetSlotCommands before dialogue processing, and strips slot markers from the user text so they are not passed to the LLM.
Orchestrators can supply slots in any of these shapes. Later sources override earlier ones on key conflicts within the payload:
- Text transcript (lowest precedence) — append to the text part:
SLOTS: {"slot_name": "value", ...}on its own line, or- a trailing fenced block:
```json\n{"slot_name": "value"}\n```
message.metadata—{"slots": {"slot_name": "value", ...}}(merged with JSON-RPC request metadata).DataPart(highest precedence) —{"kind": "data", "data": {"slots": {"slot_name": "value", ...}}}.
Only slots defined in the assistant domain are written; unknown, builtin, and agent-internal slots are ignored.
Values are coerced to the slot type; invalid or out-of-range values (for example categorical mismatches) are skipped.
String values "none", "null", and "undefined" are stored as null.
Per-turn precedence: if command generation extracts a SetSlotCommand for a slot from the current user message, that value wins over orchestrator metadata for the same slot on that turn. Orchestrator values still pre-fill slots the parser did not set.
Send an orchestrator slot snapshot on each follow-up message for slots the user did not answer in that turn.
Do not rely on user-visible text alone to carry slot values when a collect step is active — prefer metadata.slots or a DataPart, or keep slot JSON out of text the user is answering with.
Authentication
When a2a_server.auth is set, every protected request must include Authorization: Bearer <jwt>.
Missing or invalid tokens receive HTTP 401 with a WWW-Authenticate: Bearer response header before message processing starts.
This applies to the JSON-RPC endpoint (POST /) and AgentCard discovery (GET /.well-known/agent-card.json, including the deprecated alias).
If auth is omitted, behaviour is unchanged and no token is required.
Only type: bearer is supported (API key auth is not available).
Auth validates the orchestrator only; end-user identity belongs in A2A message metadata or slot pre-population, not in this JWT.
JWT configuration
| Key | Type | Default | Description |
|---|---|---|---|
auth.type | string | — | Must be bearer. |
auth.jwt.algorithm | string | RS256 | JWT signing algorithm. Supported: HS256, HS512, RS256, RS512, ES256, ES512, PS256. |
auth.jwt.public_key_path | string | none | Path to a PEM public key file. Required for asymmetric algorithms (RS*, ES*, PS*). |
auth.jwt.secret | string | none | Shared secret for symmetric algorithms (HS256, HS512). Must use environment variable interpolation, for example '${JWT_SECRET}'. Plaintext secrets are rejected at startup. |
auth.jwt.issuer | string | none | If set, reject tokens whose iss claim does not match. |
auth.jwt.audience | string | none | If set, reject tokens whose aud claim does not match. |
- Production (RS256)
- Local dev (HS256)
a2a_server:
description: "My domain agent"
auth:
type: bearer
jwt:
algorithm: RS256
public_key_path: "/run/secrets/jwt_public_key.pem"
issuer: "https://auth.example.com"
audience: "rasa-a2a-agent"
a2a_server:
description: "My domain agent"
auth:
type: bearer
jwt:
algorithm: HS256
secret: "${JWT_SECRET}"
Orchestrators and local test clients must send the same bearer token when fetching the AgentCard and when calling message/send, message/stream, or other A2A JSON-RPC methods.
Push Notifications
Push notifications are disabled by default (push_notifications_enabled: false) because callback URLs are client-controlled and can be an SSRF vector when the A2A endpoint is reachable by untrusted callers.
Enabling push notifications
Set push_notifications_enabled: true in endpoints.yml to advertise the capability in the AgentCard and allow orchestrators to register callback URLs:
a2a_server:
description: "My banking sub-agent"
push_notifications_enabled: true
# Optional: restrict callbacks to known orchestrator hosts
push_notification_allowed_hosts:
- "orchestrator.example.com"
- "hooks.partner.example.com"
When push notifications are enabled:
- An orchestrator supplies a
pushNotificationConfig.urlonmessage/send,message/stream, or viatasks/pushNotificationConfig/set. - Rasa POSTs task state updates to that callback as the task progresses.
message/streamemits intermediate states (submitted,working) as well as the terminal state.- The public AgentCard advertises the
push_notificationscapability. - Rasa also POSTs the terminal
canceledtask when a callback URL is registered, because the a2a-sdk suppliedon_cancel_taskdoes not send push notifications.
Rasa disables HTTP redirect following on every outbound push notification POST. URL policy is enforced only on the registered callback URL — redirect targets are not re-validated.
What happens today: if the callback responds with a 3xx redirect (for example 302 with a Location header), httpx does not follow it.
Rasa treats the delivery as failed (non-2xx response), logs the error, and does not retry.
The A2A task itself still completes normally — a failed push does not change the task outcome returned to the orchestrator over JSON-RPC or SSE.
Only a single POST is sent to the registered URL; no second request is made to the redirect target.
Why redirects are disabled: if Rasa followed redirects, an orchestrator could register a callback URL that passes policy checks (public hostname, global IP), then respond with Location: http://127.0.0.1/... or another internal address.
Rasa would POST task state — including structured slot data — to that internal target without running URL policy on the redirect destination.
That is a classic server-side request forgery (SSRF) vector when the A2A endpoint is reachable by untrusted callers.
Register the final callback URL directly.
If the callback endpoint moves, update the registered pushNotificationConfig.url on the orchestrator side.
URL validation
Callback URLs are validated on registration and before each POST:
- Only
http/httpsURLs with a hostname are accepted. - Loopback, link-local, and private-network targets are rejected (including hostnames that resolve to those addresses via DNS).
- URLs with embedded credentials are rejected.
- HTTP redirects are not followed on outbound push POSTs (see note above).
- Outbound push POSTs use an explicit httpx timeout (5s connect, 30s overall).
- DNS checks are bounded by a 5s application timeout.
Optionally restrict callbacks further with push_notification_allowed_hosts (exact hostname or subdomain match).
Rejected URLs surface as InvalidParamsError on the JSON-RPC methods that register them.
DNS validation at registration and before each POST does not pin the resolved IP for the outbound connection; httpx performs a separate lookup at connect time (DNS rebinding TOCTOU).
Keep the feature disabled unless required, use push_notification_allowed_hosts, prefer HTTPS callbacks, and treat callback registration as a trusted orchestrator action.
TLS
A2A shares the same port as REST, channels, and other Sanic routes. Configure HTTPS with the usual rasa run flags — not under a2a_server in endpoints.yml. A legacy tls block in endpoints.yml is rejected at startup.
Rasa serves HTTPS directly
Clients connect to Rasa over TLS using your certificate and key (the same cert covers REST, channels, and A2A):
SANIC_WORKERS=1 rasa run \
--ssl-certificate /certs/server.pem \
--ssl-keyfile /certs/server-key.pem \
-m models/your-model.tar.gz \
--endpoints endpoints.yml
When url is omitted, the AgentCard URL scheme follows --ssl-certificate. Set an explicit url: "https://..." when the public hostname or port differs from the local bind address.
HTTPS handled by a reverse proxy or ingress
Set url to the public base URL orchestrators use. Rasa can listen on plain HTTP behind the proxy while the AgentCard advertises HTTPS:
a2a_server:
description: "My banking sub-agent"
url: "https://public.example.com"
Monitoring and Troubleshooting
Health checks
| Check | Endpoint | Purpose |
|---|---|---|
| Model loaded | GET /status | Returns model_file and model_id. Requires the Rasa auth token if configured. Use to confirm the model is loaded before sending A2A messages. |
| Capability discovery | GET /.well-known/agent-card.json | Returns the public AgentCard. Requires bearer JWT when auth is configured. |
Key log events
Rasa emits structured log events prefixed with a2a_server.:
| Event | Meaning |
|---|---|
a2a_server.rasa_a2a_agent_executor.execute.agent_not_ready | Model not loaded; A2A turn returns failed. |
a2a_server.rasa_a2a_agent_executor.execute.context_limit | max_contexts cap exceeded. |
a2a_server.rasa_a2a_agent_executor.task_timeout | Per-task wall-clock timeout fired; task auto-canceled. |
a2a_server.sanic_app.message_in_flight_conflict | Duplicate messageId while a turn is in flight (HTTP 409). |
a2a_server.sanic_app.auth_rejected | Missing or invalid bearer token. |
a2a_server.message_adapter.skip_slot.invalid_value | Orchestrator slot value failed type coercion. |
a2a_server.sanic_app.stream_sse.error | SSE streaming error during message/stream or tasks/resubscribe. |
Common failure modes
| Symptom | Likely cause | Resolution |
|---|---|---|
Startup ValidationError on session config | start_session_after_expiry: true in domain | Set start_session_after_expiry: false. |
Startup ValidationError on workers | SANIC_WORKERS > 1 | Set SANIC_WORKERS=1; scale with additional replicas and sticky routing by contextId. |
| Follow-up turn starts a new flow or loses slot state (multi-replica) | Load balancer round-robins A2A traffic across pods | Route by contextId (body, X-A2A-Context-Id header, or a2a-context-id cookie); see Multi-replica load balancing. |
tasks/cancel fails or no-ops (multi-replica) | Cancel request routed to a pod that did not handle the in-flight turn | Send X-A2A-Context-Id or preserve the ingress a2a-context-id cookie. |
Task failed with "Agent is not ready to handle messages." | Model not loaded or unloaded | Wait for model load; check GET /status. |
HTTP 409 on message/send retry | Same messageId submitted while prior turn is in flight | Retry after the first task reaches a terminal state. |
| HTTP 401 on JSON-RPC or AgentCard | auth enabled; missing or invalid JWT | Send Authorization: Bearer <jwt> on all A2A requests. |
| Sanic bind failure | Port already in use | Change --port or free the occupied port. |
| Orchestrator client protocol errors | a2a-sdk version mismatch | Align your orchestrator client with the a2a-sdk version bundled with your Rasa release. |
Complete Example
action_endpoint:
url: "http://localhost:5055/webhook"
a2a_server:
url: "https://rasa.example.com"
description: "Banking assistant sub-agent for account transfers and appointments"
include_conversation_repair: true
task_timeout_seconds: 600
max_contexts: 1000
push_notifications_enabled: false
auth:
type: bearer
jwt:
algorithm: RS256
public_key_path: "/run/secrets/jwt_public_key.pem"
issuer: "https://auth.example.com"
audience: "rasa-a2a-agent"
Out of Scope
Running Rasa as an A2A sub-agent while also invoking external sub-agents (sub_agents/ with protocol: a2a) is not tested for V1.
Use Rasa in one role at a time: either orchestrator or sub-agent.
Multi-worker A2A support (persistent task and message stores) is planned for a future release. Until then, use SANIC_WORKERS=1 per replica.