Tracing

Tracing is a key element of observability, providing visibility into how requests flow through your assistant’s distributed components (e.g. the Rasa Pro/CALM runtime, the custom action server, external services).

Observability is the practice of instrumenting systems so you can monitor, troubleshoot, and optimize them in production. In a conversational assistant, observability typically consists of:

  • Metrics: Quantitative measurements (e.g. average response time)
  • Logs: Event logs emitted by various components
  • Tracing: Detailed timelines of requests and their paths across distributed services

Tracing stands out by focusing on the life cycle of each individual request. When a user sends a message, the request might pass through your assistant’s LLM-based command generator, custom action server, flow logic, and external APIs. Tracing captures these hops along with relevant metadata—giving you a precise picture of where bottlenecks or unexpected behaviors may be occurring.

Why Do You Need Tracing in Your Assistant?

When building pro-code conversational AI solutions, teams need to quickly answer questions like:

  • What exactly happened when the assistant processed a user message?

    See which flows were invoked, which custom actions ran, which commands got generated, and why.

  • Why is my assistant slow or occasionally unresponsive?

    Is the slowness coming from an LLM call, a vector store query, or a custom action HTTP request?

  • How can I debug or optimize custom action performance?

    Pinpoint exactly where your code is spending time—e.g. upstream or downstream dependencies in your custom actions.

  • How can I track LLM usage and costs?

    Trace the LLM’s prompt token usage, temperature settings, etc. to monitor usage in real time.

By having trace data in one place, you can correlate user behavior, system logs, and LLM calls to rapidly diagnose and fix issues. In production, distributed tracing is often the fastest path to root-cause analysis and performance tuning.

How to Enable Tracing in Rasa

Enabling tracing in Rasa Pro is straightforward—simply connect it to a supported tracing backend or collector. Once configured, Rasa Pro emits trace spans for conversation processing, LLM calls, flow transitions, custom actions, and more.

1. Choose Your Tracing Backend or Collector

Rasa Pro supports:

  • Jaeger: A popular open source, end-to-end distributed tracing system.
  • OTEL Collector (OpenTelemetry Collector): A vendor-agnostic collector that can forward trace data to Jaeger, Zipkin, or other tracing backends.

2. Configure Tracing in Your Endpoints

In your endpoints.yml or Helm values, add a tracing: block. For example, to configure Jaeger:

endpoints.yml
tracing:
  type: jaeger
  host: localhost
  port: 6831
  service_name: rasa
  sync_export: ~

Or to configure an OTEL collector:

endpoints.yml
tracing:
  type: otlp
  endpoint: my-otlp-host:4318
  insecure: false
  service_name: rasa
  root_certificates: ./path/to/ca.pem

Once you’ve done this, tracing is automatically enabled for both the Rasa Pro runtime and the custom action server. No additional code is needed to start collecting standard spans.

3. (Optional) Instrument Custom Code

If you want deeper insights into custom action performance, you can add custom spans to specific parts of your code. For example, you can retrieve the tracer in your action server and wrap code sections in manual spans. This is especially useful for investigating complex logic or third-party dependencies.
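
As a rough illustration, the sketch below wraps part of a custom action in a manual span using the OpenTelemetry Python API directly. It assumes the action server’s tracer provider has already been set up by the endpoints configuration above; the action name, span name, attributes, and the _fetch_balance helper are purely illustrative, and Rasa’s reference documentation may show a slightly different way to obtain the tracer.

from typing import Any, Dict, List, Text

from opentelemetry import trace
from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher

# Returns the tracer registered with the configured tracer provider,
# or a no-op tracer if tracing has not been enabled.
tracer = trace.get_tracer(__name__)


class ActionCheckBalance(Action):
    def name(self) -> Text:
        return "action_check_balance"

    def run(
        self,
        dispatcher: CollectingDispatcher,
        tracker: Tracker,
        domain: Dict[Text, Any],
    ) -> List[Dict[Text, Any]]:
        # Wrap the expensive part of the action in its own span so it
        # shows up as a separate step in the trace timeline.
        with tracer.start_as_current_span("account.lookup") as span:
            span.set_attribute("sender_id", tracker.sender_id)
            balance = self._fetch_balance(tracker.sender_id)

        dispatcher.utter_message(text=f"Your balance is {balance:.2f}.")
        return []

    def _fetch_balance(self, sender_id: Text) -> float:
        # Hypothetical helper standing in for a real database or HTTP call.
        return 0.0

Because the span is started in the current context, it typically appears nested under the action server’s span for that custom action in the trace view.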

info

For a full list of traced events and code snippets for custom instrumentation, see the Tracing reference documentation.

Best Practices for Tracing

  1. Start Tracing Early

    Instrument your assistant from the beginning of development—waiting until there’s a performance issue might make it harder to diagnose.

  2. Trace Only Where Needed

    While you can trace everything, capturing very verbose details (like prompt token usage) in production can add overhead. Often, it’s better to enable advanced tracing features in test or staging environments, then turn them off once you identify the root cause.

  3. Use a Single Collector

    Sending data to a single OTEL or Jaeger collector is simpler to maintain and ensures all trace spans appear in one place—important for diagnosing end-to-end issues.

  4. Correlate with Logs & Metrics

    Traces alone might not be sufficient. Combine them with logs (e.g. error messages) and metrics (e.g. average token usage per conversation) to get a 360° view of system health.

  5. Leverage the Dialogue Stack

    CALM’s dialogue stack concept means multiple flows can be active at once. Tracing helps you see which flow is top-of-stack at any given time—and why that flow was triggered.

  6. Monitor Token Usage

    If using LLM-based features, especially with enterprise search or multi-step LLM prompts, keep an eye on token usage to manage costs and latency.

  7. Focus on Action Server Performance

    Custom actions are often the biggest source of latency. Use tracing spans around external API calls in your action code to detect slow dependencies, as sketched below.
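
As a loose sketch (not Rasa’s own API), the snippet below times an outbound HTTP call from action code with a manual OpenTelemetry span; the service URL, span name, and attribute names are placeholders.

import requests
from opentelemetry import trace

tracer = trace.get_tracer(__name__)


def fetch_order_status(order_id: str) -> dict:
    # The span covers the full round trip to the downstream service,
    # so a slow response stands out as a long span in the trace view.
    with tracer.start_as_current_span("orders_api.get_status") as span:
        span.set_attribute("order.id", order_id)
        response = requests.get(
            f"https://orders.example.com/api/orders/{order_id}",  # placeholder URL
            timeout=5,
        )
        span.set_attribute("http.status_code", response.status_code)
        response.raise_for_status()
        return response.json()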

Once configured, tracing gives you a powerful lens into how your assistant orchestrates conversations, dialogue flows, LLM calls, and custom actions. It is an essential tool for ensuring your assistant runs reliably in production, and for enabling fast, data-driven debugging when issues arise.