# Observability Metrics
Metrics are runtime measurements that capture indicators of a service’s availability and performance. Unlike tracing—which helps you understand the sequence of operations for a single request—metrics provide an aggregated statistical view across multiple requests or conversations. Typical examples include average response time, throughput, and CPU/memory consumption. Monitoring these helps you:
- Track the health of your service.
- Quickly detect and alert on outages or anomalies.
- Quantify the impact of code or infrastructure changes on performance.
When combined with tracing, metrics give you a complete view of your deployment behavior, making it easier to debug issues and optimize resource usage.
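To make the "aggregated statistical view" concrete, here is a minimal sketch (plain Python, with hypothetical latency values) of the kind of aggregation a metrics backend performs over many requests:

```python
from statistics import mean

# Hypothetical per-request latencies (seconds) observed over a 60-second window.
durations = [0.12, 0.35, 0.08, 0.22, 1.40, 0.18, 0.09, 0.31, 0.27, 0.15]
window_seconds = 60

avg_latency = mean(durations)                 # average response time
max_latency = max(durations)                  # worst-case response time
throughput = len(durations) / window_seconds  # requests per second

print(f"avg={avg_latency:.3f}s max={max_latency:.2f}s rps={throughput:.2f}")
```

A metrics pipeline does the same thing continuously: individual measurements are collapsed into aggregates (averages, percentiles, rates) that are cheap to store and easy to alert on.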
## How To Use Metrics
### Enabling Metrics in Rasa Pro
Rasa Pro uses an OpenTelemetry (OTEL) Collector to collect metrics and send them to your desired backend (e.g., Prometheus or Datadog).
1. Configure OTEL in your endpoints file (or Helm values):

   `endpoints.yml`:

   ```yaml
   metrics:
     type: otlp
     endpoint: my-otlp-host:4318
     insecure: false
     service_name: rasa
     root_certificates: ./tests/unit/tracing/fixtures/ca.pem
   ```

   - `type: otlp` indicates you are using OpenTelemetry's OTLP format.
   - `endpoint` is the URL of the OTEL Collector or metrics backend.
   - `service_name` is an identifier for your Rasa Pro service.
   - `insecure` / `root_certificates` specify how TLS is handled.
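The `endpoint` above assumes an OTLP-capable collector is listening on port 4318. As an illustration only (not part of Rasa Pro itself), a minimal OTEL Collector configuration that receives OTLP over HTTP and exposes the metrics for Prometheus scraping might look like this, assuming a collector distribution that includes the `prometheus` exporter:

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```

Adjust the receiver and exporter endpoints to match your deployment; the collector can fan out the same metrics to multiple backends.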
2. Use Tracing for a complete view (recommended):

   Metrics become even more powerful when paired with Tracing, because tracing surfaces the sequence of internal method calls while metrics aggregate their performance.
### Custom Metrics Collected by Rasa Pro
Once configured, Rasa Pro automatically collects several custom metrics relevant to large language model (LLM) usage and overall assistant performance:
- CPU and Memory Usage of any LLM-based command generator (e.g., `SingleStepLLMCommandGenerator`, `MultiStepLLMCommandGenerator`) at the time of making an LLM call.
- Prompt Token Usage for LLM-based command generators, provided the `trace_prompt_tokens` config property is enabled.
- Method Call Durations for LLM-specific components, such as:
  - `IntentlessPolicy`
  - `EnterpriseSearchPolicy`
  - `ContextualResponseRephraser`
  - `SingleStepLLMCommandGenerator`
  - `MultiStepLLMCommandGenerator`
- HTTP Request Metrics for the Rasa client:
- Duration of requests to external services (action server, NLG server, etc.).
- Request size in bytes.
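The two HTTP metrics above, request duration and request size, can be illustrated with a small sketch. This is not Rasa's actual instrumentation; it is a hypothetical wrapper showing what those measurements capture around an outbound call:

```python
import time

def measure_request(send, payload: bytes):
    """Wrap an outbound call, recording the two HTTP metrics described above:
    request duration (seconds) and request size (bytes)."""
    start = time.perf_counter()
    response = send(payload)
    duration = time.perf_counter() - start
    return response, {
        "http.request.duration": duration,
        "http.request.size": len(payload),
    }

# Hypothetical stand-in for a call to an action server or NLG server.
def fake_action_server(payload: bytes) -> str:
    return "ok"

response, metrics = measure_request(
    fake_action_server, b'{"next_action": "action_greet"}'
)
print(metrics)
```

In a real deployment these per-request measurements are exported via OTLP, where the backend aggregates them into latency distributions and traffic volumes per downstream service.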
By collecting these telemetry metrics, you gain robust insights into how your assistant performs under real-world usage. You can proactively detect issues, understand resource consumption, and tailor your assistant’s architecture to provide the best possible experience for your users.