
Assertions reference


For more information on E2E testing and assertions, see the E2E testing product documentation.

Beta feature

Assertions are a beta feature in Rasa Pro 3.10.0. The feature is subject to change and may not be fully stable. To enable it, set the environment variable RASA_PRO_BETA_E2E_ASSERTIONS to true in your testing environment:

export RASA_PRO_BETA_E2E_ASSERTIONS=true

Installation and Configuration Prerequisites

To evaluate generative assistant responses for relevance and factual accuracy in your end-to-end tests, install the optional dependency mlflow. It enables LLM (Large Language Model) based evaluation, in which a separate LLM assesses another model's output; this evaluator is often called an "LLM-as-Judge" model. In Rasa Pro's use case, the LLM-as-Judge model evaluates whether a generative response is relevant to the provided input, or whether it is factually accurate in relation to the provided or extracted ground-truth text.

You can install the dependency using the following commands:

pip install rasa-pro[mlflow]
# or, if you are using poetry, either of:
poetry add "rasa-pro[mlflow]"
poetry add rasa-pro -E mlflow

Generative Response LLM Judge Configuration

info

This functionality only supports OpenAI models for the LLM Judge model at the moment.

By default, the LLM Judge model is configured to use the OpenAI gpt-4o-mini model to take advantage of its long context window. To use a different model, configure the LLM Judge in the conftest.yml file, a testing configuration file introduced in Rasa Pro 3.10. Rasa Pro discovers it automatically as long as it is placed in the root directory of your assistant project.

conftest.yml
llm_as_judge:
  api_type: openai
  model: "gpt-4-0613"

Assertion Types

Assertions let you check events such as a flow starting, confirm that a generative response is relevant or grounded, and more.

Important

If a user step contains assertions, legacy step types such as bot: ... or utter: ... are ignored within that same step. Use the bot_uttered assertion to check the response instead.
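For illustration, here is the same check written both ways; the response name is a hypothetical placeholder:

```yaml
# Legacy style: the bot response is checked in a separate step.
- user: "I want to book a flight"
- utter: utter_ask_destination

# Assertion style: the check moves into the user step itself;
# a bot: or utter: step attached to this step would be ignored.
- user: "I want to book a flight"
  assertions:
    - bot_uttered:
        utter_name: utter_ask_destination
```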

Below is a comprehensive list of assertion types you can use in your E2E tests. These allow you to verify everything from flow status to the factual grounding of a generative response.

Flow Started Assertion

flow_started checks if the flow with the provided id was started.

tests/e2e_test_cases.yml
test_cases:
  - test_case: flight_booking
    steps:
      - user: "I want to book a flight"
        assertions:
          - flow_started: "flight_booking"

Flow Completed Assertion

flow_completed checks if the flow with the provided id was completed. Optionally, you can specify a flow_step_id if you want to confirm the final flow step.

tests/e2e_test_cases.yml
test_cases:
  - test_case: flight_booking
    steps:
      - user: "What is the average cost of a flight from New York to San Francisco?"
        assertions:
          - flow_completed:
              flow_id: "pattern_search"
              flow_step_id: "action_trigger_search"

Flow Cancelled Assertion

flow_cancelled checks if the flow with the provided id was cancelled. You can also specify a flow_step_id if needed.

tests/e2e_test_cases.yml
test_cases:
  - test_case: flight_booking
    steps:
      ... # other user steps
      - user: "Wait, I changed my mind, I don't want to book a flight."
        assertions:
          - flow_cancelled:
              flow_id: "flight_booking"
              flow_step_id: "make_payment"

Pattern Clarification Contains Assertion

pattern_clarification_contains checks if the clarification (repair) pattern was triggered and returned the expected flow names. This assertion must list all flow names that you expect the pattern to suggest.

tests/e2e_test_cases.yml
test_cases:
  - test_case: flight_booking
    steps:
      - user: "make booking"
        assertions:
          - pattern_clarification_contains:
              - "flight booking"
              - "hotel booking"

Slot Was Set Assertion

slot_was_set checks if the slot(s) with the provided name were filled with the provided value. The value must match the slot's type in your domain (e.g. write booleans, integers, and floats without quotes).

tests/e2e_test_cases.yml
test_cases:
  - test_case: flight_booking
    steps:
      - user: "I want to book a flight from New York to San Francisco"
        assertions:
          - slot_was_set:
              - name: "origin"
                value: "New York"
              - name: "destination"
                value: "San Francisco"

Slot Was Not Set Assertion

slot_was_not_set checks if a slot was not filled. If you specify a value, it checks the slot was not filled with that value.

tests/e2e_test_cases.yml
test_cases:
  - test_case: flight_booking
    steps:
      - user: "I want to book a flight to San Francisco."
        assertions:
          - slot_was_not_set:
              - name: "origin"
          - slot_was_not_set:
              - name: "destination"
                value: "New York"

If only name is provided, the test confirms the slot’s value remains None (or uninitialized).

Action Executed Assertion

action_executed checks if the specified action was triggered.

tests/e2e_test_cases.yml
test_cases:
  - test_case: flight_booking
    steps:
      - user: "Book me a flight from New York to San Francisco tomorrow first thing in the morning."
        assertions:
          - action_executed: "action_book_flight"

Bot Uttered Assertion

bot_uttered checks if the bot’s last utterance matches the provided text pattern, buttons, and/or domain response name. Use text_matches for the utterance text; it accepts a plain string or a regular expression.

tests/e2e_test_cases.yml
test_cases:
  - test_case: flight_booking
    steps:
      - user: "I want to book a flight"
        assertions:
          - bot_uttered:
              utter_name: utter_ask_destination
              text_matches: "Where would you like to fly to?"
              buttons:
                - title: "New York"
                  payload: "/SetSlots(destination=New York)"
                - title: "San Francisco"
                  payload: "/SetSlots(destination=San Francisco)"

When asserting buttons, list them in the same order as defined in your domain file or custom action.

Bot Did Not Utter Assertion

bot_did_not_utter checks that the bot’s utterance does not match the provided pattern, buttons, or domain response name.

tests/e2e_test_cases.yml
test_cases:
  - test_case: flight_booking
    steps:
      - user: "I want to book a flight"
        assertions:
          - bot_did_not_utter:
              utter_name: utter_ask_payment
              text_matches: "How would you like to pay?"
              buttons:
                - title: "Credit Card"
                  payload: "/set_payment_method{'method': 'credit_card'}"
                - title: "PayPal"
                  payload: "/set_payment_method{'method': 'paypal'}"

Generative Response Is Relevant Assertion

generative_response_is_relevant checks if the bot’s generative response is relevant to the user’s message. The threshold (between 0 and 1) sets how strict the check is: the LLM Judge model scores each response on a scale of 1 to 5, and that score is mapped to the 0–1 range before it is compared against the threshold.

tests/e2e_test_cases.yml
test_cases:
  - test_case: flight_booking
    steps:
      - user: "What times are the flights from New York to San Francisco tomorrow?"
        assertions:
          - generative_response_is_relevant:
              threshold: 0.90

You can also specify utter_name if you want to check a specific domain response event:

tests/e2e_test_cases.yml
- user: "Actually, I want to amend flight date to next week."
  assertions:
    - generative_response_is_relevant:
        threshold: 0.90
        utter_name: utter_ask_correction_confirmation

Generative Response Is Grounded Assertion

generative_response_is_grounded checks if the bot’s generative response is factually accurate given a ground-truth reference. Like the relevance check, it uses a 0–1 threshold.

tests/e2e_test_cases.yml
test_cases:
  - test_case: flight_booking
    steps:
      - user: "What is the average cost of a flight from New York to San Francisco?"
        assertions:
          - generative_response_is_grounded:
              threshold: 0.90
              ground_truth: "The average cost of a flight from New York to San Francisco is $500."

If the correct factual source is available in the response metadata (e.g. from an Enterprise Search lookup or a rephrased domain response), the test runner can extract the ground truth automatically when you don't provide ground_truth directly.
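In that case the assertion can be written without a ground_truth key; as a sketch, assuming the metadata for this response carries the factual source:

```yaml
test_cases:
  - test_case: flight_booking
    steps:
      - user: "What is the average cost of a flight from New York to San Francisco?"
        assertions:
          - generative_response_is_grounded:
              # ground_truth omitted: the test runner extracts it
              # from the response metadata when available.
              threshold: 0.90
```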