


New in 3.8

You can now measure the performance of your CALM assistant with OpenTelemetry-based metrics.

Metrics are measurements of a service captured at runtime that serve as indicators of availability and performance. You can use them to monitor the health of a service, alert on an outage, and understand the impact of changes to a service. Unlike tracing, metrics provide aggregated statistical information, such as average response time or throughput, across multiple messages and conversations.

Configuring Metrics

To enable metrics gathering in Rasa Pro, you must use the OTEL Collector (OpenTelemetry Collector) to collect metrics and then send them to the backend of your choice.

To point Rasa Pro at the OTEL Collector, add a metrics entry to your endpoints, either in your endpoints.yml file or in the relevant section of your Helm values in a deployment.

Set the type of this entry to otlp:

metrics:
  type: otlp
  endpoint: my-otlp-host:4318
  insecure: false
  service_name: rasa
  root_certificates: ./tests/unit/tracing/fixtures/ca.pem
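
On the receiving side, the Collector needs an OTLP receiver listening on the port that the endpoint value points at, plus a metrics pipeline that forwards to your backend. The following is a minimal sketch of a Collector configuration, not a file shipped with Rasa Pro. The file name, the listen addresses, and the choice of the debug exporter (which only prints received metrics to the Collector's own log) are assumptions for illustration. Both OTLP protocols are enabled because this page does not state which one Rasa Pro uses.

```yaml
# Hypothetical otel-collector-config.yaml; adapt ports and exporters to your setup.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # conventional OTLP/gRPC port
      http:
        endpoint: 0.0.0.0:4318   # conventional OTLP/HTTP port

exporters:
  debug:
    verbosity: detailed          # print every received metric data point

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [debug]
```

Once metrics show up in the Collector log, you would typically swap the debug exporter for one that targets your actual metrics backend.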

Note that metrics must be used together with tracing to provide a complete view of your system.
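
As an illustration of running both together, an endpoints file might carry a tracing entry alongside the metrics entry. This sketch assumes the tracing entry accepts the same keys as the metrics configuration shown earlier. Treat that as an assumption and check the tracing documentation for the authoritative options.

```yaml
# Sketch of an endpoints.yml with both entries; the tracing keys are assumed
# to mirror the metrics keys and should be verified against the tracing docs.
tracing:
  type: otlp
  endpoint: my-otlp-host:4318
  insecure: false
  service_name: rasa
  root_certificates: ./tests/unit/tracing/fixtures/ca.pem

metrics:
  type: otlp
  endpoint: my-otlp-host:4318
  insecure: false
  service_name: rasa
  root_certificates: ./tests/unit/tracing/fixtures/ca.pem
```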

Recorded Metrics

Supported custom metrics include:

  • CPU and memory usage of the LLMCommandGenerator component at the time of making an LLM call
  • LLMCommandGenerator prompt token usage (provided the trace_prompt_tokens config property is enabled)
  • method call duration measurements for LLM-specific calls in components such as IntentlessPolicy, EnterpriseSearchPolicy, ContextualResponseRephraser, and LLMCommandGenerator
  • Rasa client HTTP request duration (e.g. to the action server or NLG server)
  • Rasa client HTTP request size in bytes
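
The prompt token metric is only recorded when trace_prompt_tokens is enabled. A minimal sketch of what that might look like, assuming the property is set on the LLMCommandGenerator entry in the pipeline section of config.yml (the placement is an assumption; this page only names the property):

```yaml
# Hypothetical config.yml excerpt; only trace_prompt_tokens comes from this page.
pipeline:
  - name: LLMCommandGenerator
    trace_prompt_tokens: true   # enable prompt token usage measurement
```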