
Validate Data

Test Domain and Data Files for Mistakes

To verify whether there are any mistakes in your domain file, NLU data, or story data, run the validate script with the following command:

rasa data validate

The script above runs all the validations on your files, except for story structure validation, which is omitted unless you provide the --max-history argument. Here is the list of options to the script:

usage: rasa data validate [-h] [-v] [-vv] [--quiet]
                          [--max-history MAX_HISTORY] [--fail-on-warnings]
                          [-d DOMAIN] [--data DATA]
                          {stories} ...

positional arguments:
  {stories}
    stories             Checks for inconsistencies in the story files.

optional arguments:
  -h, --help            show this help message and exit
  --max-history MAX_HISTORY
                        Number of turns taken into account for story structure
                        validation. (default: None)
  --fail-on-warnings    Fail validation on warnings and errors. If omitted
                        only errors will result in a non zero exit code.
                        (default: False)
  -d DOMAIN, --domain DOMAIN
                        Domain specification (yml file). (default: domain.yml)
  --data DATA           Path to the file or directory containing Rasa data.
                        (default: data)

Python Logging Options:
  -v, --verbose         Be verbose. Sets logging level to INFO. (default:
                        None)
  -vv, --debug          Print lots of debugging statements. Sets logging level
                        to DEBUG. (default: None)
  --quiet               Be quiet! Sets logging level to WARNING. (default:
                        None)
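
For example, to run story structure validation with a maximum history of 5 turns and to treat warnings as failures, you could combine the options above:

rasa data validate --max-history 5 --fail-on-warnings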

By default the validator searches only for errors in the data (e.g. the same example being listed as an example for two intents), but does not report other minor issues (such as unused intents or utterances that are not listed as actions). To also report the latter, use the --debug flag.
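
For example:

rasa data validate --debug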

You can also run these validations through the Python API by importing the Validator class, which has the following methods:

from_files(): Creates the instance from string paths to the necessary files.

verify_intents(): Checks if the intents listed in the domain file are consistent with the NLU data.

verify_example_repetition_in_intents(): Checks that no training examples are duplicated across distinct intents in the NLU data.

verify_intents_in_stories(): Checks that the intents used in the stories are valid.

verify_utterances(): Checks domain file for consistency between responses listed in the responses section and the utterance actions you have defined.

verify_utterances_in_stories(): Checks that the utterances used in the stories are valid.

verify_all(): Runs all verifications above.

verify_domain_validity(): Checks if the domain is valid.

To use these functions, create a Validator object and initialize the logger, as in the following code:

import logging
from rasa import utils
from rasa.core.validator import Validator

logger = logging.getLogger(__name__)

# Enable colored log output at DEBUG level so all validation messages are shown.
utils.configure_colored_logging('DEBUG')

# Build the validator from the domain, NLU data, and story files.
validator = Validator.from_files(domain_file='domain.yml',
                                 nlu_data='data/nlu_data.md',
                                 stories='data/stories.md')

# Run all of the verifications listed above.
validator.verify_all()
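
If you only need a subset of the checks, you can call individual verify_* methods on the same validator instead of verify_all(). A minimal sketch, assuming each verify_* method returns a boolean indicating whether the check passed (this may vary between Rasa versions):

# Run only selected checks on the validator created above.
# Assumption: each verify_* method returns True when its check passes.
intents_ok = validator.verify_intents()
story_utterances_ok = validator.verify_utterances_in_stories()

if not (intents_ok and story_utterances_ok):
    logger.error("Validation found problems in the intents or stories.")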

Test Story Files for Conflicts

In addition to the default tests described above, you can also do a more in-depth structural test of your stories. In particular, you can test whether your stories are inconsistent, i.e. whether different bot actions follow from the same dialogue history. If such conflicts exist, Rasa cannot learn the correct behaviour.

Take, for example, the following two stories:

## Story 1
* greet
  - utter_greet
* inform_happy
  - utter_happy
  - utter_goodbye

## Story 2
* greet
  - utter_greet
* inform_happy
  - utter_goodbye

These two stories are inconsistent: Rasa doesn't know whether it should predict utter_happy or utter_goodbye after inform_happy, because nothing distinguishes the dialogue states at inform_happy in the two stories, yet the subsequent actions differ between Story 1 and Story 2.

This conflict can be automatically identified with our story structure validation tool. To do this, use rasa data validate in the command line, as follows:

rasa data validate stories --max-history 3
> 2019-12-09 09:32:13 INFO     rasa.core.validator  - Story structure validation...
> 2019-12-09 09:32:13 INFO     rasa.core.validator  - Assuming max_history = 3
>   Processed Story Blocks: 100% 2/2 [00:00<00:00, 3237.59it/s, # trackers=1]
> 2019-12-09 09:32:13 WARNING  rasa.core.validator  - CONFLICT after intent 'inform_happy':
>   utter_goodbye predicted in 'Story 2'
>   utter_happy predicted in 'Story 1'

Here we specify a max-history value of 3. This means that the last 3 events (user messages / bot actions) are taken into account for action predictions. The particular setting does not matter for this example, because the conflict exists regardless of how much history is taken into account.
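
Once a conflict like this is reported, one way to resolve it is to make the conflicting continuations identical, or to add something (such as a slot) that distinguishes the two dialogue states. For example, Story 2 could be changed to match Story 1:

## Story 2
* greet
  - utter_greet
* inform_happy
  - utter_happy
  - utter_goodbye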

Warning

The rasa data validate stories script assumes that all your story names are unique. If your stories are in the Markdown format, you can find duplicate names with a command like grep -h "##" data/*.md | sort | uniq -d.