Exposing Rasa as an A2A Sub-Agent

New in Rasa Pro 3.17

Rasa can expose your assistant as a native Agent-to-Agent (A2A) sub-agent.

Overview

In the orchestrator model, Rasa calls external A2A agents from flows. This guide covers the inverse role: your Rasa assistant runs as an A2A sub-agent that an external orchestrator discovers and invokes.

With an a2a_server block in endpoints.yml, rasa run exposes the A2A protocol on the same port as REST and channel webhooks. The orchestrator fetches your AgentCard, sends user turns over JSON-RPC, and receives structured task lifecycle updates mapped from Rasa's dialogue state.

For full configuration reference, see A2A Server.

Prerequisites

Before enabling A2A server mode:

  1. Train a CALM assistant with the user-facing flows you want to expose as skills.
  2. Set the session config in domain.yml. start_session_after_expiry must be false so that resumed orchestrator contexts do not trigger action_session_start and reset slots.
domain.yml
session_config:
  session_expiration_time: 60
  start_session_after_expiry: false # required when a2a_server is enabled
  3. Plan for single-worker deployment: set SANIC_WORKERS=1 until Redis-backed A2A stores ship. Scale with additional replicas and sticky load balancing by contextId instead of multiple Sanic workers per pod.
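
The session requirement above can be checked before deployment. The following is a minimal sketch that assumes the domain file has already been parsed into a dictionary (for example with a YAML loader); check_session_config is a hypothetical helper and not part of Rasa.

```python
def check_session_config(domain: dict) -> list:
    """Return a list of problems with the session_config block (empty if OK)."""
    problems = []
    session = domain.get("session_config") or {}
    # A missing key is reported rather than assumed to be false.
    if "start_session_after_expiry" not in session:
        problems.append("start_session_after_expiry is not set; set it to false explicitly")
    elif session["start_session_after_expiry"] is not False:
        problems.append("start_session_after_expiry must be false when a2a_server is enabled")
    return problems


# Parsed equivalent of the domain.yml snippet shown in the prerequisites.
domain = {
    "session_config": {
        "session_expiration_time": 60,
        "start_session_after_expiry": False,
    }
}
print(check_session_config(domain))  # prints []
```

A check like this could run in CI next to model training so that a misconfigured domain is caught before the A2A endpoint goes live.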

Step 1 — Add a2a_server to endpoints.yml

Only description is required. Add a public url when orchestrators reach Rasa through a load balancer or ingress rather than localhost.

endpoints.yml
a2a_server:
  url: "http://localhost:5005"
  description: "Banking assistant for transfers and appointments"
  include_conversation_repair: true

If you use custom actions, include your action_endpoint in the same file as usual.

Step 2 — Start the server

Run Rasa with a single Sanic worker and your trained model:

SANIC_WORKERS=1 rasa run -m models/your-model.tar.gz --endpoints endpoints.yml

To serve HTTPS directly, pass the same rasa run SSL flags used for REST and channels — not an a2a_server.tls block in endpoints.yml:

SANIC_WORKERS=1 rasa run \
  --ssl-certificate /certs/server.pem \
  --ssl-keyfile /certs/server-key.pem \
  -m models/your-model.tar.gz \
  --endpoints endpoints.yml

When Rasa sits behind a reverse proxy or ingress, set a2a_server.url to the public https://... base URL orchestrators use. See TLS in the A2A server reference.

Multi-replica deployments

If you run more than one Rasa pod, configure your ingress or load balancer to route all A2A traffic for a given contextId to the same replica. Without this, follow-up turns, messageId deduplication, tasks/cancel, and push callbacks can break because A2A state is in-memory per pod.

Configure consistent hashing (or session affinity) on contextId from the JSON body, the X-A2A-Context-Id header, or an a2a-context-id cookie. For Istio, apply a gateway EnvoyFilter plus a DestinationRule — see Istio on Kubernetes in the A2A server reference.
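
To illustrate why hashing on contextId keeps a conversation on one replica, here is a small sketch using rendezvous (highest-random-weight) hashing, which is one simple way to get consistent routing. It is an illustration only: real deployments configure this in the ingress or load balancer, and pick_replica and the replica names are hypothetical.

```python
import hashlib


def pick_replica(context_id: str, replicas: list) -> str:
    """Pick a replica for a contextId using rendezvous hashing."""
    if not replicas:
        raise ValueError("at least one replica is required")

    def weight(replica: str) -> int:
        # Hash the (replica, contextId) pair into a comparable integer.
        digest = hashlib.sha256(f"{replica}|{context_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    # The replica with the highest weight wins. Removing a different
    # replica does not change the winner for this contextId.
    return max(replicas, key=weight)


replicas = ["rasa-0", "rasa-1", "rasa-2"]
first = pick_replica("ctx-user-42", replicas)
second = pick_replica("ctx-user-42", replicas)
print(first == second)  # prints True
```

Because the choice depends only on the contextId and the replica set, every turn, tasks/cancel call, and retry for the same context reaches the pod that holds its in-memory A2A state.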

Full platform guidance: Multi-replica load balancing.

A2A routes are registered at the root of the server:

Route                              Purpose
POST /                             A2A JSON-RPC
GET /.well-known/agent-card.json   Public AgentCard

Step 3 — Verify the server is ready

Confirm the model is loaded and the AgentCard is available:

# Model loaded (no auth configured)
curl http://localhost:5005/status

# Model loaded (Rasa auth token configured)
curl -H "Authorization: Bearer <rasa-auth-token>" http://localhost:5005/status

# Capability discovery (no A2A bearer auth configured)
curl http://localhost:5005/.well-known/agent-card.json

# Capability discovery (A2A bearer auth configured via a2a_server.auth)
curl -H "Authorization: Bearer <jwt>" http://localhost:5005/.well-known/agent-card.json

The AgentCard lists your user flows as skills. After the model loads, the card is generated from flow definitions (or loaded from agent_card_path if you configured a static card).
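
As a rough sketch of what an orchestrator might do with the card, the snippet below lists the skills from a parsed AgentCard. The card body is a hypothetical example; the field names (skills, id, description) follow the A2A AgentCard schema as an assumption, and the card Rasa generates may contain more fields.

```python
import json


def list_skills(agent_card: dict) -> list:
    """Return (skill id, description) pairs from a parsed AgentCard."""
    skills = agent_card.get("skills", [])
    return [(s.get("id", ""), s.get("description", "")) for s in skills]


# Hypothetical card body; a real one comes from GET /.well-known/agent-card.json.
raw_card = """
{
  "name": "Banking assistant",
  "description": "Banking assistant for transfers and appointments",
  "url": "http://localhost:5005",
  "skills": [
    {"id": "transfer_money", "description": "Transfer money between the user's accounts"},
    {"id": "book_appointment", "description": "Book an appointment at a branch"}
  ]
}
"""

for skill_id, description in list_skills(json.loads(raw_card)):
    print(f"{skill_id}: {description}")
```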

Write descriptions with discoverability in mind

The AgentCard is the primary signal an orchestrator uses when deciding which sub-agent to route a task to. Two descriptions feed into it:

  • a2a_server.description in endpoints.yml — describes the agent as a whole. Be specific about domain, user population, and what the agent handles (for example: "Banking assistant for retail customers: account transfers, balance enquiries, and appointment booking."). A vague description like "A helpful bot" gives the orchestrator nothing to match against.
  • Flow description fields in your YAML flows — each flow becomes a skill in the AgentCard. Write them as concise capability statements from the user's perspective (for example: "Transfer money between the user's accounts" rather than "transfer_money flow"). Orchestrators use skill descriptions to select and invoke the right capability.

Poor descriptions lead to missed routing or incorrect skill selection. Treat them as part of the public API contract.

Step 4 — Point your orchestrator at the endpoint

Configure your orchestrator to use the url from the AgentCard. The orchestrator should:

  1. Fetch GET /.well-known/agent-card.json to discover skills and capabilities.
  2. Send user turns via message/send (blocking) or message/stream (SSE streaming).
  3. Reuse the same contextId across turns in a multi-step flow.
  4. Supply a new messageId per turn (retries with the same messageId replay the cached result).
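
The contextId and messageId rules above can be sketched as a small request builder. build_send_request is a hypothetical helper; it only constructs the JSON-RPC body and leaves the HTTP call to whichever client your orchestrator uses.

```python
import json
import uuid
from typing import Optional


def build_send_request(text: str, context_id: str, message_id: Optional[str] = None) -> dict:
    """Build a JSON-RPC 2.0 message/send request body for one user turn."""
    return {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": "message/send",
        "params": {
            "message": {
                "role": "user",
                "parts": [{"kind": "text", "text": text}],
                # New messageId per turn; pass an old one only when retrying.
                "messageId": message_id or str(uuid.uuid4()),
                # Same contextId for every turn of one conversation.
                "contextId": context_id,
            }
        },
    }


turn_1 = build_send_request("I want to book an appointment", "ctx-user-42")
turn_2 = build_send_request("Tomorrow at 10am", "ctx-user-42")
# A retry reuses the messageId of the turn it repeats.
retry_2 = build_send_request(
    "Tomorrow at 10am",
    "ctx-user-42",
    message_id=turn_2["params"]["message"]["messageId"],
)
print(json.dumps(turn_1, indent=2))
```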

Example message/send request

curl -X POST http://localhost:5005/ \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": "req-1",
    "method": "message/send",
    "params": {
      "message": {
        "role": "user",
        "parts": [{"kind": "text", "text": "I want to book an appointment"}],
        "messageId": "msg-1",
        "contextId": "ctx-user-42"
      }
    }
  }'

When a2a_server.auth is configured, add Authorization: Bearer <jwt> to every A2A request:

curl -X POST http://localhost:5005/ \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <jwt>" \
  -d '{
    "jsonrpc": "2.0",
    "id": "req-1",
    "method": "message/send",
    "params": {
      "message": {
        "role": "user",
        "parts": [{"kind": "text", "text": "I want to book an appointment"}],
        "messageId": "msg-1",
        "contextId": "ctx-user-42"
      }
    }
  }'

Step 5 — Read structured results from completed flows

Each orchestrator message creates a new A2A task_id. Rasa maps dialogue state to A2A task states: working, input_required, completed, failed, canceled, rejected, and auth_required.

Terminal and interactive-terminal responses include:

  • A TextPart with the user-visible bot utterance (for clients that only read status.message).
  • A DataPart with structured state your orchestrator can parse programmatically.

input_required — flow needs more user input

The orchestrator should show the TextPart to the end user and send a follow-up message on the same contextId:

{
  "state": "input_required",
  "active_flow": "transfer_money",
  "slots": {
    "account_number": null,
    "amount": null
  }
}

completed — flow finished

Persisted slot values from flows that declare persisted_slots are returned:

{
  "state": "completed",
  "slots": {
    "appointment_id": "APT-12345"
  }
}

Use message/stream when your orchestrator needs working status updates and artifact deltas during streaming custom actions. Blocking message/send returns only the final task.
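
A minimal sketch of how an orchestrator might separate the two parts and branch on the state follows. It assumes the parts arrive as a list of objects with kind set to "text" or "data", and that the DataPart payload sits under a data key; split_parts and next_step are hypothetical helpers, and the exact envelope around the parts is described in the A2A server reference.

```python
def split_parts(parts: list) -> tuple:
    """Return (joined text, merged data) from a list of A2A message parts."""
    texts = [p["text"] for p in parts if p.get("kind") == "text"]
    data = {}
    for p in parts:
        if p.get("kind") == "data":
            data.update(p.get("data", {}))
    return " ".join(texts), data


def next_step(data: dict) -> str:
    """Map the structured state to a coarse orchestrator action."""
    state = data.get("state")
    if state == "input_required":
        return "ask_user"      # show the text, send the reply on the same contextId
    if state == "completed":
        return "read_slots"    # persisted slot values are available
    return "inspect"           # failed, canceled, rejected, auth_required, ...


# Hypothetical parts list mirroring the input_required example above.
parts = [
    {"kind": "text", "text": "Which account should I transfer from?"},
    {"kind": "data", "data": {
        "state": "input_required",
        "active_flow": "transfer_money",
        "slots": {"account_number": None, "amount": None},
    }},
]

text, data = split_parts(parts)
missing = [name for name, value in data.get("slots", {}).items() if value is None]
print(text)
print(next_step(data), missing)
```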

Optional — Pass slots from the orchestrator

On each turn, the orchestrator can pre-seed domain slots before Rasa processes the user message. This is useful when the orchestrator already knows context (user ID, account type, etc.) and should not rely on the LLM parsing slot values from free text.

Supply slots via message.metadata:

{
  "message": {
    "role": "user",
    "parts": [{"kind": "text", "text": "Transfer money to savings"}],
    "messageId": "msg-2",
    "contextId": "ctx-user-42",
    "metadata": {
      "slots": {
        "account_type": "checking"
      }
    }
  }
}

Other supported shapes and precedence rules are documented in Orchestrator slot pre-seeding.
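
On the orchestrator side, attaching slots in the message.metadata shape shown above can be sketched as follows. with_slots is a hypothetical helper; it covers only the metadata.slots shape, not the other supported shapes.

```python
import copy


def with_slots(message: dict, slots: dict) -> dict:
    """Return a copy of an A2A message with slots added under metadata.slots."""
    seeded = copy.deepcopy(message)  # leave the caller's message untouched
    metadata = seeded.setdefault("metadata", {})
    metadata.setdefault("slots", {}).update(slots)
    return seeded


message = {
    "role": "user",
    "parts": [{"kind": "text", "text": "Transfer money to savings"}],
    "messageId": "msg-2",
    "contextId": "ctx-user-42",
}
seeded = with_slots(message, {"account_type": "checking"})
print(seeded["metadata"])  # prints {'slots': {'account_type': 'checking'}}
```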

Optional — Secure the endpoint with JWT

For production, configure bearer JWT auth so only trusted orchestrators can call your sub-agent:

endpoints.yml
a2a_server:
  description: "Banking assistant"
  auth:
    type: bearer
    jwt:
      algorithm: RS256
      public_key_path: "/run/secrets/jwt_public_key.pem"
      issuer: "https://auth.example.com"
      audience: "rasa-a2a-agent"

See Authentication for full options.
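
When a token is rejected, it can help to inspect its claims against the configured issuer and audience. The sketch below is a debugging aid only: it decodes the payload without verifying the signature, so it must never be used to authenticate anything. unverified_claims and claims_match are hypothetical helpers, and the sample token is a placeholder built for the example.

```python
import base64
import json


def unverified_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def claims_match(claims: dict, issuer: str, audience: str) -> bool:
    """Check the iss and aud claims against the values in endpoints.yml."""
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return claims.get("iss") == issuer and audience in audiences


def _b64url(obj: dict) -> str:
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Placeholder token with a fake signature, for illustration only.
sample_token = ".".join([
    _b64url({"alg": "RS256", "typ": "JWT"}),
    _b64url({"iss": "https://auth.example.com", "aud": "rasa-a2a-agent"}),
    "signature-placeholder",
])

claims = unverified_claims(sample_token)
print(claims_match(claims, "https://auth.example.com", "rasa-a2a-agent"))  # prints True
```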

Optional — Enable push notifications

Push notifications are disabled by default. Enable them only when your orchestrator needs HTTP callbacks for task state updates:

endpoints.yml
a2a_server:
  description: "Banking assistant"
  push_notifications_enabled: true
  push_notification_allowed_hosts:
    - "orchestrator.example.com"

The orchestrator registers a pushNotificationConfig.url on message/send, message/stream, or via tasks/pushNotificationConfig/set. See Push notifications for SSRF safeguards and redirect behaviour.
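
To show the idea behind a host allow-list, here is a sketch of an exact-match check on the callback URL's hostname. It is not Rasa's implementation; callback_host_allowed is a hypothetical helper, and the actual matching rules (ports, redirects, IP ranges) are those described in the Push notifications reference.

```python
from urllib.parse import urlsplit


def callback_host_allowed(url: str, allowed_hosts: list) -> bool:
    """Return True if the URL's hostname exactly matches an allowed host."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    # Exact match only: a suffix or substring match would let
    # "orchestrator.example.com.evil.net" through.
    return host in {h.lower() for h in allowed_hosts}


allowed = ["orchestrator.example.com"]
print(callback_host_allowed("https://orchestrator.example.com/callbacks/a2a", allowed))  # prints True
print(callback_host_allowed("https://orchestrator.example.com.evil.net/", allowed))      # prints False
```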

Next steps

Topic                                                Documentation
All a2a_server keys, defaults, and guardrails        A2A Server reference
Architecture, context mapping, and task lifecycle    Rasa as A2A Agent Architecture
Rasa as orchestrator calling external agents         Integrating External Agents via A2A
Monitoring, log events, and troubleshooting          Monitoring and troubleshooting

Out of scope

Not tested for version 3.17

Running Rasa as an A2A sub-agent while also invoking external sub-agents (sub_agents/ with protocol: a2a) is not tested for 3.17. Use Rasa in one role at a time: either orchestrator or sub-agent. In sub-agent mode, we strongly recommend exposing all sub-agents directly to the A2A client or orchestrator rather than chaining them through the Rasa sub-agent.