Tracing
Distributed tracing tracks requests as they flow through a distributed system (in this case: a Rasa assistant), sending data about the requests to a tracing backend which collects all trace data and enables inspecting it. Trace data helps you understand the flow of requests through both the components of a single service (Rasa itself), and across different distributed services, for example, your action server.
Supported Tracing Backends/Collectors
To trace requests in Rasa Pro, you can either use Jaeger as a backend, or use the OTEL Collector (OpenTelemetry Collector) to collect traces and then send them to the backend of your choice. See Configuring a Tracing Backend or Collector for instructions.
Rasa Channels
If a request to the REST channel carries trace context that follows the W3C Trace Context Specification, Rasa Pro uses it to continue tracing that request.
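For example, a caller can propagate its own trace context by sending a `traceparent` header with the request. The sketch below builds a spec-compliant header value; the `curl` invocation in the comment assumes a Rasa server on localhost port 5005 (the default).

```python
import secrets

def make_traceparent() -> str:
    """Build a W3C Trace Context `traceparent` header value:
    <version>-<trace-id>-<parent-id>-<trace-flags>."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex chars
    parent_id = secrets.token_hex(8)   # 8 random bytes -> 16 hex chars
    return f"00-{trace_id}-{parent_id}-01"  # version 00, flags 01 (sampled)

header = make_traceparent()

# The header can then be attached to a REST channel request, e.g.:
# curl -X POST http://localhost:5005/webhooks/rest/webhook \
#   -H "Content-Type: application/json" \
#   -H "traceparent: <header>" \
#   -d '{"sender": "user1", "message": "hello"}'
```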
Rasa Inspector
If you have enabled tracing in Rasa Pro and are using the Rasa Inspector debugging tool to try out your assistant, note that in addition to the expected tracing span for the `Agent.handle_message` method call, the tracing backend will collect independent tracing spans for the `MessageProcessor.get_tracker` method calls. This is expected behaviour because the Rasa Inspector tool uses the Rasa HTTP API endpoints to retrieve the conversation tracker required by the Inspector interface.
Action Server
The trace context from Rasa Pro is sent along with requests to the custom action server using the W3C Trace Context Specification and then used to continue tracing the request through the custom action server.
Tracing is continued in the action server by instrumenting the webhook that receives custom actions. See Action server attributes for the attributes captured as part of the trace context.
See traced events for details on what attributes are made available as part of the trace context in Rasa Pro.
Questions Tracing Can Help Answer
Tracing can help troubleshoot issues in development and production, by answering questions such as:
- How does a user message get processed across different components, i.e. dialogue understanding components (NLU, `CommandGenerator`, `CommandProcessorComponent`), policies, and the action server?
- Why has my Rasa assistant decided to execute a certain action?
- Why has my Rasa assistant been slow to respond?
- Why have my custom actions been slow to execute?
- What is my OpenAI prompt token usage?
- What is the performance of my Rasa assistant across different flows?
- What is the performance of my Rasa assistant across different LLM models?
- What is the performance of my Rasa assistant across different vector stores?
Configuring a Tracing Backend or Collector
To configure a tracing backend or collector, add a `tracing` entry to your endpoints, i.e. in your `endpoints.yml` file, or in the relevant section of your Helm values in a deployment.
Jaeger
To configure a Jaeger tracing backend, specify the `type` as `jaeger`.
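A sketch of such an `endpoints.yml` entry, assuming a Jaeger agent running locally (the host, port, and service name are placeholder values to adjust for your setup):

```yaml
tracing:
  type: jaeger
  host: localhost
  port: 6831
  service_name: rasa
```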
tip
If you come across the error "OSError: [Errno 40] Message too long", read the instructions here to resolve it.
OTEL Collector
Collectors are components that collect traces in a vendor-agnostic way and then forward them to various backends. For example, the OpenTelemetry (OTEL) Collector can collect traces from multiple different components and instrumentation libraries, and then export them to multiple different backends, e.g. Jaeger.
To configure an OTEL Collector, specify the `type` as `otlp`.
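A sketch of such an `endpoints.yml` entry; the endpoint address, service name, and certificate path are placeholder values for your own collector deployment:

```yaml
tracing:
  type: otlp
  endpoint: my-otlp-host:4318
  insecure: false
  service_name: rasa
  root_certificates: ./tls_ca_cert.pem
```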
Traced Events
The Rasa service areas that are traceable cover the actions required to:
- train a model (i.e., the training of each graph component)
- handle a message
Model Training
Tracing is enabled for model training by instrumenting the Rasa `GraphTrainer` and `GraphNode` classes.
GraphTrainer Attributes
The following attributes can be inspected during training of the `GraphTrainer`:
- `training_type` of the model configuration: "NLU", "CORE", "BOTH", or "END-TO-END"
- `language` of the model configuration
- `recipe_name` used in the `config.yml` file
- `output_filename`: the location where the packaged model is saved
- `is_finetuning`: boolean argument, if `True` enables incremental training
GraphNode Attributes
The following attributes are captured during the training (as well as prediction during message handling) of every graph node:
- `node_name`
- `component_class`
- `fn_name`: the method of the component class that gets called
Message Handling
The following Rasa classes are instrumented to enable tracing during message handling:
- `Agent`
- `MessageProcessor`
- `TrackerStore`
- `LockStore`
- `SingleStepLLMCommandGenerator`
- `MultiStepLLMCommandGenerator`
- `NLUCommandAdapter`
- `FlowPolicy`
- `IntentlessPolicy`
- `EnterpriseSearchPolicy`
- `InformationRetrieval`
- `EndpointConfig`
In addition, the following Python modules are instrumented to enable tracing during message handling:
- command processor module, i.e. utility functions leveraged by the `CommandProcessorComponent` to pre-process predicted commands
- flow executor module, i.e. utility functions leveraged by `FlowPolicy` to advance flows
Namely, these operations are now traceable:
- receiving a message
- parsing the message
- predicting commands
- pre-processing commands
- predicting the next action
- running the action
- advancing flows
- searching documents in vector stores for enterprise search
- generating LLM answers by policies, e.g. `IntentlessPolicy` and `EnterpriseSearchPolicy`
- tracing prompt token usage
- retrieving and saving the tracker
- locking the conversation
- publishing to the event broker
- making requests to the action server or NLG server
- passing the trace context to the action server
Tracing prompt token usage
New in 3.8
Tracing prompt token usage for OpenAI models is available starting with version 3.8.0.
Tracing prompt token usage is available for the following classes if you're using OpenAI models:
- `SingleStepLLMCommandGenerator`
- `MultiStepLLMCommandGenerator`
- `IntentlessPolicy`
- `EnterpriseSearchPolicy`
- `ContextualResponseRephraser`
The prompt token usage is captured as part of the trace context and can be used to monitor the usage of prompt tokens in the LLM answer generation process.
This is only captured if one of the instrumented classes mentioned above is configured to capture the length of the prompt tokens.
For example, the `SingleStepLLMCommandGenerator` can be configured to trace the length of the prompt tokens by setting the `trace_prompt_tokens` attribute to `true` in the `config.yml` file:
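A sketch of such a `config.yml` entry (the other pipeline components and any LLM settings of the command generator are omitted here):

```yaml
pipeline:
  - name: SingleStepLLMCommandGenerator
    trace_prompt_tokens: true
```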
It is highly recommended to enable tracing of prompt tokens only in development and not in production, because it could increase assistant response latency.
Agent Attributes
Tracing the `Agent` instance handling a message captures the following attributes:
- `input_channel`: the name of the channel connector
- `sender_id`: the conversation id
- `model_id`: a unique identifier for the model
- `model_name`: the model name
MessageProcessor Attributes
The following `MessageProcessor` attributes are extracted during tracing:
- `number_of_events`: the number of events in the tracker
- `action_name`: the name of the predicted and executed action
- `sender_id`: the conversation id of the `DialogueStateTracker` object
- `message_id`: the unique message id
The latter three attributes are also injected in the trace context that gets passed to the requests made to the custom action server.
TrackerStore & LockStore Attributes
Observable `TrackerStore` and `LockStore` attributes include:
- `number_of_streamed_events`: the number of new events to stream
- `broker_class`: the `EventBroker` on which the new events are published
- `lock_store_class`: the name of the lock store used to lock conversations while messages are actively processed
SingleStepLLMCommandGenerator Attributes
New in 3.9
Tracing the described `SingleStepLLMCommandGenerator` attributes is available starting with version 3.9.0.
The following attributes are captured as part of the trace context of the `SingleStepLLMCommandGenerator`:
- `class_name`: the name of the instrumented component class
- `llm_model`: the name of the LLM used
- `llm_type`: the type of LLM used
- `embeddings`: the embeddings used
- `llm_temperature`: the temperature used for LLM answer generation
- `request_timeout`: the timeout for the LLM request
- `llm_engine`: the engine used for LLM answer generation
- `len_prompt_tokens`: the token length of the prompt (optional, only supported for OpenAI models). To enable this attribute, see the instructions in the Tracing prompt token usage section.
MultiStepLLMCommandGenerator Attributes
New in 3.9
Tracing the described `MultiStepLLMCommandGenerator` attributes is available starting with version 3.9.0.
The following attributes are captured as part of the trace context of the `MultiStepLLMCommandGenerator`:
- `class_name`: the name of the instrumented component class
- `llm_model`: the name of the LLM used
- `llm_type`: the type of LLM used
- `embeddings`: the embeddings used
- `llm_temperature`: the temperature used for LLM answer generation
- `request_timeout`: the timeout for the LLM request
- `llm_engine`: the engine used for LLM answer generation
- `len_prompt_tokens`: the token length of the prompt (optional, only supported for OpenAI models). To enable this attribute, see the instructions in the Tracing prompt token usage section.
NLUCommandAdapter Attributes
New in 3.8
Tracing the described `NLUCommandAdapter` attributes is available starting with version 3.8.0.
The following attributes are captured as part of the trace context of the `NLUCommandAdapter`:
- `commands`: the predicted commands
- `intent`: the predicted intent of the user message that the `NLUCommandAdapter` receives as input
Command Processor Module Attributes
New in 3.8
Tracing the described command processor module attributes is available starting with version 3.8.0.
The following attributes are captured as part of the trace context of the command processor module functions:
- `execute_commands` function:
  - `number_of_events`: the number of events in the tracker
  - `sender_id`: the conversation id of the `DialogueStateTracker` object
- `validate_state_of_commands` function:
  - `cleaned_up_commands`: list of cleaned-up commands
- `clean_up_commands` function:
  - `commands`: list of originally parsed commands from the LLM answer
  - `current_context`: the current context of the dialogue stack
- `remove_duplicated_set_slots` function:
  - `resulting_events`: list of events prior to removing duplicated set slot events; note that slot values are removed to prevent PII leakage
Flow Executor Module Attributes
New in 3.8
Tracing the described flow executor module attributes is available starting with version 3.8.0.
The following attributes are captured as part of the trace context of the flow executor module functions:
- `advance_flow` function:
  - `available_actions`: list of available actions
  - `current_context`: the current context of the dialogue stack
- `advance_flows_until_next_action` function:
  - `action_name`: the name of the action to be executed
  - `score`: the score of the executed action
  - `metadata`: the prediction metadata
  - `events`: list of event names if available
- `run_step` function:
  - `step_custom_id`: the custom id of the step if available
  - `step_description`: the description of the step if available
  - `current_flow_id`: the id of the current flow
  - `current_context`: the current context of the dialogue stack
Policy Subclasses Attributes
New in 3.8
Tracing the described `Policy` subclasses' attributes is available starting with version 3.8.0.
The following attributes are captured as part of the trace context of subclasses of the `Policy` interface, e.g. `FlowPolicy`, `IntentlessPolicy`, `EnterpriseSearchPolicy`:
- `priority`: the priority of the policy which made the prediction
- `events`: a list of event names which are applied regardless of whether the policy wins against other policies
- `optional_events`: a list of optional event names if available, else `None`; these events are applied only if the policy wins against other policies
- `is_end_to_end_prediction`: a boolean indicating whether the prediction used the text of the user message instead of the intent
- `is_no_user_prediction`: a boolean indicating whether the prediction uses neither the text of the user message nor the intent
- `diagnostic_data`: intermediate results or other information that is not necessary for Rasa to function, but is intended for debugging and fine-tuning purposes
- `action_metadata`: additional metadata that can be passed by policies
IntentlessPolicy Attributes
New in 3.8
Tracing the described `IntentlessPolicy` attributes is available starting with version 3.8.0.
Depending on the instrumented policy method, the following attributes are captured as part of the trace context of the `IntentlessPolicy`:
- `current_context`: the context of the top dialogue stack frame, received as input by the `IntentlessPolicy.find_closest_response` method
- `ai_response_examples`: the sample responses that fit the current conversation, returned by the `IntentlessPolicy.select_response_examples` method
- `conversation_samples`: the conversation samples returned by the `IntentlessPolicy.select_few_shot_conversations` method
- `ai_responses`: the AI responses extracted from the conversation samples by the `IntentlessPolicy.extract_ai_responses` method
- `llm_response`: the response generated by the LLM call, returned by the `IntentlessPolicy.generate_answer` method
- `action_name`: the name of the action to be executed, received as input by the `IntentlessPolicy._prediction_result` method
- `score`: the score of the executed action, received as input by the `IntentlessPolicy._prediction_result` method

In addition, the `IntentlessPolicy._generate_llm_answer` method captures the same attributes as the `SingleStepLLMCommandGenerator` class.
EnterpriseSearchPolicy Attributes
New in 3.8
Tracing the described `EnterpriseSearchPolicy` attributes is available starting with version 3.8.0.
The `EnterpriseSearchPolicy._generate_llm_answer` method captures the same attributes as the `SingleStepLLMCommandGenerator` class.
InformationRetrieval Attributes
New in 3.8
Tracing the described `InformationRetrieval` subclasses' attributes is available starting with version 3.8.0.
The following attributes are captured as part of the trace context of the `InformationRetrieval` subclasses, e.g. `Milvus_Store`, `Qdrant_Store`:
- `query`: the query used to search the vector store
- `document_metadata`: the metadata of the documents retrieved from the vector store
EndpointConfig Attributes
New in 3.8
Tracing the described `EndpointConfig` attributes is available starting with version 3.8.0.
The following attributes are captured as part of the trace context of the `EndpointConfig`:
- `url`: the url of the endpoint
- `request_body_size_in_bytes`: the size of the request body in bytes
Tracing in the Action Server
API requests are traced as they flow through the action server by instrumenting the webhook that receives custom actions, as well as other classes involved in the execution of custom actions.
New in 3.8
Additional classes are now instrumented to improve tracing in the action server.
The following classes are instrumented:
- `ValidationAction`: the base class for custom actions extracting and validating slots that can be set or updated outside a form context.
- `FormValidationAction`: the base class for custom actions extracting and validating slots that are set only within the context of a form.
- `ActionExecutor`: the class that executes the custom actions.
Webhook Attributes
The following attributes are captured as part of the trace context of the webhook that receives custom actions:
- `http.method`: the HTTP method used to make the request
- `http.route`: the endpoint of the request
- `next_action`: the name of the next action to be executed
- `version`: the Rasa version used
- `sender_id`: the id of the conversation
- `message_id`: the unique message id
Action Executor Attributes
The following attributes are captured as part of the trace context of the action executor:
- `action_name`: the name of the action to be executed
- `sender_id`: the id of the conversation
- `events`: a list of returned events
- `slots`: a list of slots filled by the executed custom action
- `utters`: a list of executed utterances
Slot Validation Action Attributes
The following attributes are captured as part of the trace context of slot validation actions:
- `class_name`: the name of the instrumented component class
- `action_name`: the name of the action to be executed
- `sender_id`: the id of the conversation
- `events`: a list of returned events
- `slots`: a list of slots filled by the executed custom action
- `utters`: a list of executed utterances
- `message_count`: the number of messages
- `slots_to_validate`: a list of recently filled slots to validate
Debugging custom actions performance
New in 3.8
You can now continue tracing the request further along your custom actions code.
It is now possible to debug the performance of your custom actions by tracing specific parts of your custom actions code. This can be achieved by creating spans to trace the execution of these parts.
In order to create more spans, you can retrieve the tracer object from the `ActionExecutorTracerRegister` component. You can then create a span, as documented in the OTEL documentation, that corresponds to a specific part of your custom actions code, by embedding the span-creation code in the `run` method of the custom action.
For example, a complete custom action that implements a custom span is shown below:
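A minimal sketch of such a custom action, assuming a banking-style assistant. The action name, the `get_tracer_provider()` accessor on `ActionExecutorTracerRegister`, the import path of that class, and the `query_balance` helper are illustrative assumptions (check your rasa-sdk version for the exact names); `get_tracer`, `start_as_current_span`, and `set_attribute` are standard OpenTelemetry tracer API.

```python
from typing import Any, Dict, List, Text

from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher
# Assumption: import path may differ across rasa-sdk versions.
from rasa_sdk.tracing.tracer_register import ActionExecutorTracerRegister


class ActionCheckBalance(Action):
    def name(self) -> Text:
        return "action_check_balance"

    def run(
        self,
        dispatcher: CollectingDispatcher,
        tracker: Tracker,
        domain: Dict[Text, Any],
    ) -> List[Dict[Text, Any]]:
        # Assumption: the register exposes the configured tracer provider.
        tracer_provider = ActionExecutorTracerRegister().get_tracer_provider()
        if tracer_provider is not None:
            tracer = tracer_provider.get_tracer(__name__)
            # Wrap the slow part of the action in its own span so it shows
            # up as a child of the action server's span for this action.
            with tracer.start_as_current_span("query_balance") as span:
                span.set_attribute("sender_id", tracker.sender_id)
                balance = query_balance(tracker.sender_id)  # hypothetical helper
        else:
            # Tracing is not configured; run the action without a span.
            balance = query_balance(tracker.sender_id)

        dispatcher.utter_message(text=f"Your balance is {balance}.")
        return []
```

Because the span is only created when a tracer provider is configured, the same action runs unchanged when tracing is disabled.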
Enabling and disabling tracing in the action server is also done in the same way as described below. The same Tracing Backends/Collectors listed above are also supported for the action server. See Configuring a Tracing Backend or Collector for further instructions.
Enabling / Disabling
Tracing is automatically enabled in Rasa Pro by configuring a supported tracing backend. No further action is required to enable tracing.
You can disable tracing by leaving the `tracing:` configuration key empty in your endpoints file.
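For example, this `endpoints.yml` entry leaves the key empty and therefore disables tracing:

```yaml
# No backend or collector configured: tracing is disabled.
tracing:
```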