# Testing Your Assistant
Rasa lets you validate and test your assistant end-to-end.
## Validating Data
Data validation verifies that no mistakes or major inconsistencies appear in your domain, NLU data, or story data. To validate your data, have your CI run the `rasa data validate` command. If you pass a `max_history` value to one or more policies in your `config.yml` file, provide the smallest of those values with the `--max-history <max_history>` flag.
If data validation results in errors, training a model can also fail or yield bad performance, so it's always good to run this check before training a model. By including the `--fail-on-warnings` flag, this step will also fail on warnings that indicate more minor issues.
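Combining these options, the CI step might look like the following sketch (the `max_history` value of 5 is purely illustrative):

```shell
rasa data validate --max-history 5 --fail-on-warnings
```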
To read more about the validator and all of the available options, see the documentation for `rasa data validate`.
## End-To-End Testing
**New in 3.5**: You can now use end-to-end testing to test your assistant as a whole, including dialogue management and custom actions.
End-to-end (E2E) testing is a comprehensive CLI-based testing tool that allows you to test conversation scenarios with different pre-configured contexts, execute custom actions, verify response texts or names, and assert whether slots are filled.
End-to-end testing is not limited to testing only the NLU or the dialogue model and allows you to design effective acceptance or integration tests. The main features of end-to-end testing are:
- integration with the action server: you can execute custom actions in your tests; the prerequisite is to start the action server in the background.
- test parametrization (e.g. different user profiles or other external factors): you can define multiple test fixtures with different pre-filled slots and re-use them in your tests.
- verifying response texts or names: you can assert that the bot response text (including interpolated responses with slot values and conditional response variations) or `utter` name is as expected.
- asserting slot values: you can assert that the bot sets a slot value as expected, that it does not set the slot, or that the value differs from a specified value in the test step.
### How to write test cases
To write test cases, you need to create a YAML file inside the `tests` directory of your project. The name of the file should be `e2e_test_cases.yml`. You can also create a subdirectory inside the `tests` directory and place your test case YAML files there. These files will be automatically discovered and run by Rasa Pro; however, you need to provide the path to the subdirectory as a positional argument to the `rasa test e2e` command.
Each input file must contain the `test_cases` key. The value of this key is a list of test cases. Each test case must include a name given to the `test_case` key and a list of test steps given to the `steps` key.
A step can be either one of the following:

- `user`: refers to a user's message.
- `bot`: denotes a textual response generated by the bot.
- `utter`: refers to a bot response name as defined in the domain file.
- `slot_was_set`: indicates the successful setting of a slot, and depending on how it's defined:
  - if a slot name is provided, e.g. `my_slot`: checks that a slot with the given name is set.
  - if a key-value pair is provided, e.g. `my_slot: value`: checks that a slot with the given name is set with the expected value.
- `slot_was_not_set`: specifies the failure to set a slot, and depending on the definition:
  - if a slot name is provided, e.g. `my_slot`: checks that a slot with the given name is not set.
  - if a key-value pair is provided, e.g. `my_slot: value`: checks that a slot with the given name is not set or that its value differs from the expected value.
The following example illustrates how `user`, `bot`, `utter` and `slot_was_set` steps can be used in a test case:
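A sketch along these lines (the user messages, response names, and slot names are invented for illustration):

```yaml
test_cases:
  - test_case: user_transfers_money
    steps:
      - user: "I want to transfer money."
      - utter: utter_ask_recipient
      - user: "Send 50 dollars to John."
      - slot_was_set:
          - recipient: John
          - amount: 50
      - bot: "You have transferred 50 dollars to John."

  - test_case: user_greets
    steps:
      - user: "hello"
      - utter: utter_greet
      - slot_was_not_set:
          - recipient
```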
In each of the steps after the `user` step, you can specify multiple expected events. Any additional events that are found are ignored and do not cause the test to fail.
### Slots
Slots can be specified as a list of either string values (representing the slot name) or of slot name and slot value pairs. If the slot is specified as a key-value pair, the values are also compared. If the slot step contains only the slot name, it is asserted that the slot was either set (when using a `slot_was_set` step) or not set (when using a `slot_was_not_set` step).

It is not required to specify all slot events in the `slot_was_set` or `slot_was_not_set` steps; you can indicate only a subset of slots that you want to check.
You can think of `slot_was_not_set` as an inverse of `slot_was_set`: when specifying a `slot_was_not_set` step, it looks for the absence of the `SlotSet` event in the tracker store, but only for that particular `user` step, not globally (see Order of test steps and events below).
### Fixtures for Pre-Filled Slots
Using fixtures is an optional feature that enables the pre-filling of slots, ensuring specific context before individual test cases are run.
The `fixtures` key at the top level of your test case configuration consists of a list of fixture names, each of which must be unique. These fixture names correspond to sets of slot key-value pairs. When a particular test case needs predefined slot values, you can reference the fixture name within the test case definition by adding it to the `fixtures` key.
Consider the following example, which includes a test case file with fixtures and two test cases that leverage these fixtures:
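A sketch of such a file (the fixture names, slot names, and responses are invented for illustration):

```yaml
fixtures:
  - premium:
      - membership_type: premium
  - standard:
      - membership_type: standard

test_cases:
  - test_case: premium_booking
    fixtures:
      - premium
    steps:
      - user: "I'd like to book a flight."
      - utter: utter_premium_booking_options

  - test_case: standard_booking
    fixtures:
      - standard
    steps:
      - user: "I'd like to book a flight."
      - utter: utter_standard_booking_options
```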
The slots defined in fixtures are set after the `action_session_start` action and before the first step is executed by the test runner.
### How to run the tests
To run the end-to-end tests locally or in the CI pipeline, use the `rasa test e2e` command. The command takes the following arguments:
- positional argument for the path to the test cases file or directory containing the test cases: `rasa test e2e <path>`. If unspecified, the default path is `tests/e2e_test_cases.yml`.
- optional argument for the trained model: `--model <path>`
- optional argument for retrieving the trained model from remote storage: `--remote-storage <remote-storage-location>`
- optional argument for the `endpoints.yml` file: `--endpoints <path>`
- optional argument for stopping the test run at the first failure: `rasa test e2e --fail-fast`
- optional argument for exporting the test results to an `e2e_results.yml` file: `rasa test e2e -o`
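Putting several of these together, an invocation might look like this sketch (all paths are illustrative):

```shell
rasa test e2e tests/my_tests/ --model models/model.tar.gz --endpoints endpoints.yml --fail-fast -o
```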
### Testing custom actions
To test custom actions, there are three prerequisites:

- The action server must be running in the background.
- The `endpoints.yml` file must contain the action server configuration.
- The action side effects must result in a `SlotSet`, `BotUttered` or `UserUttered` event.
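For reference, the action server configuration in `endpoints.yml` typically looks like this (port 5055 is the Rasa SDK default; adjust to your setup):

```yaml
action_endpoint:
  url: "http://localhost:5055/webhook"
```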
When initiating the E2E test, a check is performed to determine whether the bot configuration specifies an action server endpoint in the `endpoints.yml` file. If an endpoint is defined, a health check is performed by calling the `/health` endpoint. If the action server is not responding, the test run exits with a status code of `1` and an error message.
You can start the action server in the background before running the tests:
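For example, using the standard Rasa CLI (the trailing `&` sends the process to the background in a POSIX shell):

```shell
rasa run actions &
```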
### How to interpret the output
By default, the results are always printed to `stdout`, and the command will exit with exit code `0` (if all tests passed) or `1` (in case of test failures). The output structure is inspired by `pytest`:
- Failed test cases are stacked in the order of their completion, each failed test highlighting the identified mismatches in a way similar to `git diff`:
  - Test steps that passed have no colouring or prefix. (`user` steps are used as anchors for the diff and always pass.)
  - Expected test steps that failed are preceded by a `+` prefix (green), while actual messages are preceded by a `-` prefix (red).
- The short test summary includes a list of every failed test case name and file location, each on a new line.
If the `-o` flag is specified in the command, the results are also written to the `tests/e2e_results.yml` file, which will contain a list of test results with the following keys:

- `name`: the name of the test case
- `pass_status`: the status of the test case, either `True` or `False`
- `expected_steps`: the expected test steps
- `difference`: a list of differences between the expected and actual test steps
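Based on those keys, a results file might look roughly like the following sketch (the test case name and steps are invented):

```yaml
- name: user_transfers_money
  pass_status: True
  expected_steps:
    - user: "I want to transfer money."
    - utter: utter_ask_recipient
  difference: []
```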
### Limitations
End-to-end testing is a powerful tool for conducting thorough assessments of your conversational AI system. However, it does come with certain limitations that are important to consider. These limitations are as follows:
#### Dependency on events and the tracker store
End-to-end testing heavily relies on the availability of specific event types in the tracker store. In particular, it requires the presence of events such as `BotUttered`, `UserUttered`, and `SlotSet` to execute tests effectively. If your test scenario involves actions or events that do not generate these specific events, the testing algorithm is not able to evaluate them.
#### Order of test steps and events
It is essential to structure your test cases to closely mimic real conversations to avoid potential issues. The test runner works by running the `user` steps and capturing the events generated by the bot from the tracker store, after which it compares the events generated by the bot with the expected events, be they `bot` or `slot` test steps. It's best to avoid creating test cases with multiple `user` steps followed by bot events, as only events created from the last `user` step will be evaluated.

Note that the order of the `bot`, `utter` and `slot` steps that follow a `user` step is not important. The test case will pass as long as the `bot`, `utter` and `slot` events occur after the `user` step.
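To keep the evaluation meaningful, interleave each `user` step with the events it is expected to produce, rather than listing several `user` steps in a row (the names below are illustrative):

```yaml
test_cases:
  - test_case: order_pizza
    steps:
      - user: "I'd like to order a pizza."
      - utter: utter_ask_size
      - user: "Large, please."
      - slot_was_set:
          - pizza_size: large
      - utter: utter_confirm_order
```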
#### Testing the start of a conversation
The evaluation of actual events against the defined expected test steps begins after the `action_session_start` action, and it's advisable to start the test with a `user` step. However, it is possible to test events before the first user utterance when `action_session_start` has been customized.
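For example, if a customized `action_session_start` sends a welcome message, a test could begin with that expected response before the first `user` step (the response names are invented):

```yaml
test_cases:
  - test_case: custom_session_start
    steps:
      - utter: utter_welcome
      - user: "hi"
      - utter: utter_greet
```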
#### Testing the accuracy of Enterprise Search
End-to-end testing does not support testing the accuracy of the Enterprise Search Policy at this stage. This is because the test runner matches bot responses against expected responses exactly, so even slight differences in the bot response will result in a test failure.