# Observability Metrics
Metrics are runtime measurements that capture indicators of a service’s availability and performance. Unlike tracing—which helps you understand the sequence of operations for a single request—metrics provide an aggregated statistical view across multiple requests or conversations. Typical examples include average response time, throughput, and CPU/memory consumption. Monitoring these helps you:
- Track the health of your service.
- Quickly detect and alert on outages or anomalies.
- Quantify the impact of code or infrastructure changes on performance.
When combined with tracing, metrics give you a complete view of your deployment behavior, making it easier to debug issues and optimize resource usage.
## How To Use Metrics
### Enabling Metrics in Rasa
Rasa uses an OpenTelemetry (OTEL) Collector to collect metrics and send them to your desired backend (e.g., Prometheus or Datadog).
1. Configure OTEL in your endpoints file (or Helm values):

   `endpoints.yml`:

   ```yaml
   metrics:
     type: otlp
     endpoint: my-otlp-host:4318
     insecure: false
     service_name: rasa
     root_certificates: ./tests/unit/tracing/fixtures/ca.pem
   ```

   - `type: otlp` indicates you are using OpenTelemetry's OTLP format.
   - `endpoint` is the URL of the OTEL Collector or metrics backend.
   - `service_name` is an identifier for your Rasa Pro service.
   - `insecure` / `root_certificates` specify how TLS is handled.
2. Use Tracing for a Complete View (Recommended):

   Metrics become even more powerful when paired with Tracing, because tracing surfaces the sequence of internal method calls while metrics aggregate their performance.
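Before deploying, it can help to sanity-check the fields shown in the `endpoints.yml` example above. The following sketch is illustrative only — the `validate_metrics_config` helper is hypothetical, not part of Rasa's API — and simply encodes the keys from the example:

```python
# Sketch: sanity-check the metrics section of an endpoints file before
# deployment. This helper is hypothetical, not part of Rasa's API.

REQUIRED_KEYS = {"type", "endpoint"}
KNOWN_KEYS = REQUIRED_KEYS | {"insecure", "service_name", "root_certificates"}

def validate_metrics_config(metrics: dict) -> list[str]:
    """Return a list of human-readable problems (empty list means OK)."""
    problems = []
    missing = REQUIRED_KEYS - metrics.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    unknown = metrics.keys() - KNOWN_KEYS
    if unknown:
        problems.append(f"unknown keys: {sorted(unknown)}")
    if metrics.get("type") != "otlp":
        problems.append("type must be 'otlp'")
    # If TLS is enabled (insecure: false), a CA bundle should be provided.
    if metrics.get("insecure") is False and "root_certificates" not in metrics:
        problems.append("insecure: false requires root_certificates")
    return problems

config = {
    "type": "otlp",
    "endpoint": "my-otlp-host:4318",
    "insecure": False,
    "service_name": "rasa",
    "root_certificates": "./tests/unit/tracing/fixtures/ca.pem",
}
print(validate_metrics_config(config))  # → []
```

A check like this catches the common mistake of setting `insecure: false` without pointing `root_certificates` at a CA bundle.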
## Custom Metrics Collected by Rasa
Once configured, Rasa automatically collects several custom metrics relevant to large language model (LLM) usage and overall assistant performance:
- CPU and Memory Usage of any LLM-based command generator (e.g., `CompactLLMCommandGenerator`, `SearchReadyLLMCommandGenerator`) at the time of making an LLM call.
- Prompt Token Usage for LLM-based command generators, provided the `trace_prompt_tokens` config property is enabled.
- Method Call Durations for LLM-specific components, such as:
  - `EnterpriseSearchPolicy`
  - `ContextualResponseRephraser`
  - `CompactLLMCommandGenerator`
  - `SearchReadyLLMCommandGenerator`
- HTTP Request Metrics for the Rasa client:
  - Duration of requests to external services (action server, NLG server, etc.).
  - Request size in bytes.
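The duration metrics above are exported as histograms and aggregated by your metrics backend. As a rough illustration of what that aggregation involves (this is backend-side math, not Rasa code, and the bucket boundaries are invented), here is how a p95 latency can be estimated from cumulative histogram buckets, Prometheus-style:

```python
# Illustration: estimate a quantile from cumulative histogram buckets,
# the way a Prometheus-style backend does. Bucket data is invented.

def quantile_from_buckets(q: float, buckets: list[tuple[float, int]]) -> float:
    """buckets: (upper_bound, cumulative_count) pairs, sorted by bound.
    Returns a linear interpolation of the q-th quantile."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            # Interpolate within the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + fraction * (bound - prev_bound)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Cumulative counts: 60 requests took <= 0.1s, 90 <= 0.25s, 100 <= 0.5s.
buckets = [(0.1, 60), (0.25, 90), (0.5, 100)]
print(quantile_from_buckets(0.95, buckets))
```

The estimate is only as precise as the bucket boundaries, which is why alert thresholds should sit away from bucket edges.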
### Sub-agents (ReAct and A2A)
When a flow uses autonomous steps to hand off to a ReAct or A2A sub-agent, Rasa emits extra OpenTelemetry histograms (in addition to the spans described in Tracing):
- `agent_execution_duration` — Wall time for each sub-agent run (`_call_agent_with_retry` in the agent executor). Useful for end-to-end latency and error rates per sub-agent. Attribute labels include:
  - `agent_name` — Sub-agent id from configuration.
  - `protocol_type` — How the sub-agent is connected: `mcp_open` or `mcp_task` for ReAct-style MCP sub-agents, or `a2a` for A2A sub-agents.
  - `status` — Final status of the sub-agent result.
- `mcp_tool_execution_duration` — Time spent inside an MCP tool call. The same metric name is used in two execution paths; use the `execution_context` label to distinguish them:
  - `execution_context=flow` — Tool invoked from a flow MCP tool step. Labels include `tool_id`, `mcp_server`, and `success`.
  - `execution_context=agent` — Tool invoked while a ReAct MCP sub-agent runs (`_execute_tool_call`). Labels include `tool_name`, `agent_name`, `protocol_type`, and `success`.
- ReAct MCP sub-agent LLM usage — For MCP-based sub-agents, each LLM `send_message` round-trip also records resource-style histograms (CPU and memory sampled like other LLM components, plus estimated prompt size and response duration):
  - `mcp_agent_llm_cpu_usage`
  - `mcp_agent_llm_memory_usage`
  - `mcp_agent_llm_prompt_token_usage`
  - `mcp_agent_llm_response_duration`
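The attribute labels are what make these histograms useful: in your backend you can slice `agent_execution_duration` by `agent_name` and `status` to get per-agent latency and error rates. A small sketch of that kind of grouping (the sample data points are invented for illustration):

```python
# Sketch: slice agent_execution_duration samples by their labels to get a
# per-agent average latency and error rate. Sample points are invented.
from collections import defaultdict

points = [
    {"agent_name": "faq_agent", "protocol_type": "a2a", "status": "success", "duration": 1.2},
    {"agent_name": "faq_agent", "protocol_type": "a2a", "status": "error", "duration": 3.0},
    {"agent_name": "booking_agent", "protocol_type": "mcp_open", "status": "success", "duration": 0.8},
]

by_agent = defaultdict(list)
for p in points:
    by_agent[p["agent_name"]].append(p)

for agent, samples in by_agent.items():
    avg = sum(s["duration"] for s in samples) / len(samples)
    error_rate = sum(s["status"] != "success" for s in samples) / len(samples)
    print(f"{agent}: avg={avg:.2f}s error_rate={error_rate:.0%}")
```

In practice your metrics backend does this grouping for you; the point is that the labels listed above are the dimensions you can group and alert on.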
By collecting these telemetry metrics, you gain robust insights into how your assistant performs under real-world usage. You can proactively detect issues, understand resource consumption, and tailor your assistant’s architecture to provide the best possible experience for your users.