Observability Metrics

Metrics are runtime measurements that capture indicators of a service’s availability and performance. Unlike tracing—which helps you understand the sequence of operations for a single request—metrics provide an aggregated statistical view across multiple requests or conversations. Typical examples include average response time, throughput, and CPU/memory consumption. Monitoring these helps you:

  • Track the health of your service.
  • Quickly detect and alert on outages or anomalies.
  • Quantify the impact of code or infrastructure changes on performance.

When combined with tracing, metrics give you a complete view of your deployment behavior, making it easier to debug issues and optimize resource usage.

How To Use Metrics

Enabling Metrics in Rasa Pro

Rasa Pro uses an OpenTelemetry (OTEL) Collector to collect metrics and send them to your desired backend (e.g., Prometheus or Datadog).

  1. Configure OTEL in your endpoints file (or Helm values):

    endpoints.yml
    metrics:
      type: otlp
      endpoint: my-otlp-host:4318
      insecure: false
      service_name: rasa
      root_certificates: ./tests/unit/tracing/fixtures/ca.pem

    • type: otlp indicates you are using OpenTelemetry’s OTLP format.
    • endpoint is the address of the OTEL Collector or metrics backend that receives the data.
    • service_name is an identifier for your Rasa Pro service.
    • insecure and root_certificates control how TLS for this connection is handled.

    A minimal Collector configuration that can receive these metrics is sketched after this list.
  2. Use Tracing for a Complete View (Recommended):

    Metrics become even more powerful when paired with Tracing because tracing surfaces the sequence of internal method calls, while metrics aggregate their performance.
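
If you do not already run an OTEL Collector, the sketch below shows one common wiring: the Collector receives OTLP over HTTP on port 4318 (matching the endpoints.yml example above) and exposes the metrics for Prometheus to scrape. The file name, the ports, and the prometheus exporter (shipped with the opentelemetry-collector-contrib distribution) are assumptions for illustration, not part of the Rasa Pro configuration itself:

    otel-collector-config.yml
    receivers:
      otlp:
        protocols:
          http:
            # Matches the endpoint port used in endpoints.yml above
            endpoint: 0.0.0.0:4318

    exporters:
      prometheus:
        # The Collector serves scrapeable metrics on this port (assumed value)
        endpoint: 0.0.0.0:8889

    service:
      pipelines:
        metrics:
          receivers: [otlp]
          exporters: [prometheus]

With this setup, Prometheus scrapes the Collector rather than Rasa Pro directly, e.g. against a target such as otel-collector:8889 (host name assumed).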

Custom Metrics Collected by Rasa Pro

Once configured, Rasa Pro automatically collects several custom metrics relevant to large language model (LLM) usage and overall assistant performance:

  • CPU and Memory Usage of any LLM-based command generator (e.g., SingleStepLLMCommandGenerator, MultiStepLLMCommandGenerator), measured at the time of each LLM call.
  • Prompt Token Usage for LLM-based command generators, provided the trace_prompt_tokens config property is enabled (see the configuration sketch after this list).
  • Method Call Durations for LLM-specific components, such as:
    • IntentlessPolicy
    • EnterpriseSearchPolicy
    • ContextualResponseRephraser
    • SingleStepLLMCommandGenerator
    • MultiStepLLMCommandGenerator
  • HTTP Request Metrics for the Rasa client:
    • Duration of requests to external services (action server, NLG server, etc.).
    • Request size in bytes.
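
Prompt token measurement is opt-in. Below is a minimal sketch of turning it on, assuming the trace_prompt_tokens property is set directly on the command generator’s entry in config.yml and that your assistant already uses SingleStepLLMCommandGenerator; all other component settings are omitted for brevity:

    config.yml
    pipeline:
      - name: SingleStepLLMCommandGenerator
        # Assumption: records prompt token counts for this component's LLM calls
        trace_prompt_tokens: true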

By collecting these metrics, you gain detailed insight into how your assistant performs under real-world usage. You can proactively detect issues, understand resource consumption, and tailor your assistant’s architecture to provide the best possible experience for your users.