Your assistant's `config.yml` file takes a `policies` key
which you can use to customize the policies your assistant uses.
There are different policies to choose from, and you can include
multiple policies in a single configuration. Here's an example of
what a list of policies might look like:
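The snippet below is an illustrative example (the policy names are the built-in policies described on this page; the parameter values are placeholders, not tuned recommendations):

```yaml
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: RulePolicy
```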
Starting from scratch?
If you don't know which policies to choose, leave out the
`policies` key from your `config.yml` file.
If you do, the Suggested Config
feature will provide default policies for you.
At every turn, each policy defined in your configuration will predict a next action with a certain confidence level. For more information about how each policy makes its decision, read the policy's description below. The assistant's next action is then decided by the policy that predicts with the highest confidence.
By default, your assistant can predict a maximum of 10 next actions
after each user message. To update this value,
you can set the environment variable `MAX_NUMBER_OF_PREDICTIONS`
to the desired maximum number of predictions.
In the case that two policies predict with equal confidence (for example, the Memoization and Rule Policies might both predict with confidence 1), the priority of the policies is considered. Rasa policies have default priorities that are set to ensure the expected outcome in the case of a tie, where higher numbers have higher priority.
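For reference, in Rasa Open Source 2.x the defaults look roughly like this (an approximation from that release; consult the reference documentation for your exact version):
6 - RulePolicy
3 - MemoizationPolicy or AugmentedMemoizationPolicy
1 - TEDPolicy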
In general, it is not recommended to have more than one policy per priority level. If you have 2 policies with the same priority and they predict with the same confidence, the resulting action will be chosen randomly.
If you create your own policy, use these priorities as a guide for figuring out the priority of your policy. If your policy is a machine learning policy, it should most likely have priority 1, the same as the Rasa machine learning policies.
overriding policy priorities
All policy priorities are configurable via the
priority: parameter in the configuration,
but we do not recommend changing them outside of specific cases such as custom policies.
Doing so can lead to unexpected and undesired bot behavior.
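If you do need to override a priority, for example for a custom policy, it is passed like any other policy parameter (a sketch; the class path and value are placeholders):

```yaml
policies:
  - name: "path.to.your.policy.CustomPolicy"
    priority: 2
```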
Machine Learning Policies
The Transformer Embedding Dialogue (TED) Policy is described in our paper.
This policy has a pre-defined architecture, which comprises the following steps:
concatenate user input (user intent and entities), previous system actions, slots and active forms for each time step into an input vector to a pre-transformer embedding layer;
feed it to the transformer;
apply a dense layer to the output of the transformer to get embeddings of a dialogue for each time step;
apply a dense layer to create embeddings for system actions for each time step;
calculate the similarity between the dialogue embedding and embedded system actions. This step is based on the StarSpace idea.
For the recommended featurizer setup for this policy, see Featurization of Conversations.
Configuration parameters can be passed as parameters to the
TEDPolicy within the configuration file.
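For example (a sketch; the values shown are illustrative, not tuned recommendations):

```yaml
policies:
  - name: TEDPolicy
    epochs: 50
    max_history: 8
```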
If you want to adapt your model, start by modifying the following parameters:
epochs: This parameter sets the number of times the algorithm will see the training data (default: `1`). One epoch equals one forward pass and one backward pass of all the training examples. Sometimes the model needs more epochs to properly learn. Sometimes more epochs don't influence the performance. The lower the number of epochs, the faster the model is trained. Pass an appropriate number of epochs, for example 50, to the TEDPolicy, otherwise the policy will be trained only for 1 epoch.
hidden_layers_sizes: This parameter allows you to define the number of feed forward layers and their output dimensions for dialogues and intents (default: `dialogue: [], label: []`). Every entry in the list corresponds to a feed forward layer. For example, if you set `dialogue: [256, 128]`, we will add two feed forward layers in front of the transformer. The vectors of the input tokens (coming from the dialogue) will be passed on to those layers. The first layer will have an output dimension of 256 and the second layer will have an output dimension of 128. If an empty list is used (default behavior), no feed forward layer will be added. Make sure to use only positive integer values. Usually, powers of two are used. Also, it is usual practice to have decreasing values in the list: each value is less than or equal to the one before.
number_of_transformer_layers: This parameter sets the number of transformer layers to use (default:
1). The number of transformer layers corresponds to the transformer blocks to use for the model.
transformer_size: This parameter sets the number of units in the transformer (default: `128`). The vectors coming out of the transformer will have the given `transformer_size`.
weight_sparsity: This parameter defines the fraction of kernel weights that are set to 0 for all feed forward layers in the model (default: `0.8`). The value should be between 0 and 1. If you set `weight_sparsity` to 0, no kernel weights will be set to 0, so the layer acts as a standard feed forward layer. You should not set `weight_sparsity` to 1, as this would result in all kernel weights being 0, i.e. the model would not be able to learn.
speeding up training
The default `max_history` for this policy is `None`, which means it'll use the
FullDialogueTrackerFeaturizer. We recommend setting
`max_history` to a finite value in order to use the
MaxHistoryTrackerFeaturizer for faster training.
See Featurization of Conversations for
details. We also recommend increasing the batch size for the
MaxHistoryTrackerFeaturizer (e.g. `"batch_size": [32, 64]`).
The above configuration parameters are the ones you should configure to fit your model to your data. However, additional parameters exist that can be adapted.
More configurable parameters
maximum_negative_similarity is set to a negative value to mimic the original
starspace algorithm in the case
maximum_negative_similarity = maximum_positive_similarity and
use_maximum_negative_similarity = False. See the starspace paper for details.
Memoization Policy
The MemoizationPolicy remembers the stories from your
training data. It checks if the current conversation matches a story
in the training data. If so, it will predict the next action from the matching
story of your training data with a confidence of
`1.0`. If no matching conversation
is found, the policy predicts
`None` with confidence `0.0`.
When looking for a match in your training data, the policy will take the last
max_history number of turns of the conversation into account.
One “turn” includes the message sent by the user and any actions the
assistant performed before waiting for the next message.
You can configure the number of turns the
MemoizationPolicy should use in your configuration:
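For example (the value 3 is illustrative):

```yaml
policies:
  - name: MemoizationPolicy
    max_history: 3
```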
Augmented Memoization Policy
AugmentedMemoizationPolicy remembers examples from training
stories for up to
`max_history` turns, just like the MemoizationPolicy.
Additionally, it has a forgetting mechanism that will forget a certain amount
of steps in the conversation history and try to find a match in your stories
with the reduced history. It predicts the next action with confidence `1.0`
if a match is found, otherwise it predicts
`None` with confidence `0.0`.
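Its configuration looks the same as for the MemoizationPolicy (the `max_history` value below is illustrative):

```yaml
policies:
  - name: AugmentedMemoizationPolicy
    max_history: 4
```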
If you have dialogues where some slots that are set during prediction time might not be set in training stories (e.g. in training stories starting with a reminder not all previous slots are set), make sure to add the relevant stories without slots to your training data as well.
Rule Policy
RulePolicy is a policy that handles conversation parts that follow
a fixed behavior. It makes predictions based on any
rules you have in your
training data. See the Rules documentation for further information
on how to define rules.
The RulePolicy has the following configuration options:
core_fallback_threshold (default: `0.3`): Please see the fallback documentation for further information.
core_fallback_action_name (default: `action_default_fallback`): Please see the fallback documentation for further information.
enable_fallback_prediction (default: `true`): Please see the fallback documentation for further information.
check_for_contradictions (default: `true`): After training, the RulePolicy will perform a check to make sure that there are no rules that contradict each other, or any stories. The following snippet is an example of two contradicting rules:

```yaml
rules:
  - rule: Chitchat
    steps:
      - intent: chitchat
      - action: utter_chitchat
  - rule: Greet instead of chitchat
    steps:
      - intent: chitchat
      - action: utter_greet  # `utter_greet` contradicts `utter_chitchat` from the rule above
```
restrict_rules (default: `true`): Rules are restricted to one user turn, but there can be multiple bot events, including e.g. a form being filled and its subsequent submission. Change this parameter to
`false` at your own risk. Overusing rules for purposes outside of the recommended use cases will make it very hard to maintain your assistant as the complexity grows.
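Spelled out with all of the above options at their defaults, a RulePolicy entry would look like this sketch:

```yaml
policies:
  - name: RulePolicy
    core_fallback_threshold: 0.3
    core_fallback_action_name: action_default_fallback
    enable_fallback_prediction: true
    check_for_contradictions: true
    restrict_rules: true
```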
One important hyperparameter for Rasa Core policies is the `max_history`.
This controls how much dialogue history the model looks at to decide which
action to take next.
You can set the
max_history by passing it to your policy
in the policy configuration yaml file.
The default value is
`None`, which means that the complete dialogue history since session
restart is taken into account.
The RulePolicy doesn't have a max_history parameter; it always considers the full length
of the provided rules. Please see Rules for further information.
As an example, let's say you have an
out_of_scope intent which
describes off-topic user messages. If your bot sees this intent multiple
times in a row, you might want to tell the user what you can help them
with. So your story might look like this:
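For example (the response names `utter_default` and `utter_help_message` are illustrative; use whatever responses your domain defines):

```yaml
stories:
  - story: utter help message after 2 fallbacks
    steps:
      - intent: out_of_scope
      - action: utter_default
      - intent: out_of_scope
      - action: utter_default
      - intent: out_of_scope
      - action: utter_help_message
```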
For Rasa Core to learn this pattern, the `max_history`
has to be at least 4.
If you increase your
max_history, your model will become bigger and
training will take longer. If you have some information that should
affect the dialogue very far into the future, you should store it as a
slot. Slot information is always available for every featurizer.
When you train a model, by default Rasa Core will create longer stories by randomly gluing together the ones in your stories files. This is because if you have stories like:
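For instance, two short, independent stories like these (intent and response names are illustrative):

```yaml
stories:
  - story: thank
    steps:
      - intent: thankyou
      - action: utter_youarewelcome
  - story: say goodbye
    steps:
      - intent: goodbye
      - action: utter_goodbye
```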
You actually want to teach your policy to ignore the dialogue history when it isn't relevant and just respond with the same action no matter what happened before.
You can alter this behavior with the `--augmentation` flag,
which allows you to set the `augmentation_factor`.
The `augmentation_factor` determines how many augmented stories are
subsampled during training. The augmented stories are subsampled before training
since their number can quickly become very large, and we want to limit it.
The number of sampled stories is 10x the `augmentation_factor`.
By default, augmentation is set to 20, resulting in a maximum of 200 augmented stories.
`--augmentation 0` disables all augmentation behavior.
The memoization based policies are not affected by augmentation
(independent of the
augmentation_factor) and will automatically
ignore all augmented stories.
In order to apply machine learning algorithms to conversational AI, we need to build up vector representations of conversations.
Each story corresponds to a tracker which consists of the states of the conversation just before each action was taken.
Every event in a tracker's history creates a new state (e.g. running a bot action, receiving a user message, setting slots). Featurizing a single state of the tracker takes a couple of steps:
The tracker provides a bag of active features:
features indicating intents and entities, if this is the first state in a turn, e.g. if it's the first action we will take after parsing the user's message;
features indicating which slots are currently defined, e.g.
`slot_location` if the user previously mentioned the area they're searching for restaurants in;
features indicating the results of any API calls stored in slots;
features indicating what the last bot action or bot utterance was;
features indicating if any loop is active and which one.
Convert all the features into numeric vectors:
The SingleStateFeaturizer uses the Rasa NLU pipeline to convert the intent and bot action names or bot utterances into numeric vectors. Please see NLU Model Configuration for details on how to configure the Rasa NLU pipeline.
Entities, slots and active loops are featurized as one-hot encodings to indicate their presence.
If the domain defines a list of possible actions,
4 additional default actions are added to it.
Each target label is then the index of an action in the combined list:
label 0 indicates the default action listen,
label 1 the default restart,
and subsequent labels index the actions defined in your domain (e.g. a greeting and a goodbye).
It's often useful to include a bit more history than just the current state
when predicting an action. The
TrackerFeaturizer iterates over tracker
states and calls a
SingleStateFeaturizer for each state to create numeric
input features for a policy.
The target labels correspond to bot actions or bot utterances
represented as an index in a list of all possible actions.
There are two different tracker featurizers:
1. Full Dialogue
FullDialogueTrackerFeaturizer creates a numerical representation of
stories to feed to a recurrent neural network, where the whole dialogue
is fed to the network and the gradient is backpropagated from all time steps.
Shorter dialogues are padded with
0 for all features.
2. Max History
MaxHistoryTrackerFeaturizer creates an array of previous tracker
states for each bot action or bot utterance, with the parameter
`max_history` defining how many states go into each row of input features.
If `max_history` is not specified, the algorithm takes
the whole length of a dialogue into account.
Shorter dialogues are padded with
0 for all features.
Deduplication is performed to filter out duplicated turns (bot actions
or bot utterances) in terms of their previous states.
For some algorithms a flat feature vector is needed, so input features
should be reshaped to
(num_unique_turns, max_history * num_input_features).
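A featurizer can be configured per policy in the configuration file. The following sketch (with an illustrative `max_history` value) selects the MaxHistoryTrackerFeaturizer with a SingleStateFeaturizer:

```yaml
policies:
  - name: TEDPolicy
    featurizer:
      - name: MaxHistoryTrackerFeaturizer
        max_history: 8
        state_featurizer:
          - name: SingleStateFeaturizer
```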
You can also write custom policies and reference them in your configuration. In the example below, the last two lines show how to use a custom policy class and pass arguments to it.
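A sketch of such a configuration (the module path `path.to.your.policy.MyPolicy` and the `arg1` parameter are hypothetical placeholders for your own policy class and its constructor arguments):

```yaml
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: "path.to.your.policy.MyPolicy"
    arg1: "..."
```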