Policies
Your assistant uses policies to decide which action to take at each step in a conversation. There are machine-learning and rule-based policies that your assistant can use in tandem.
You can customize the policies your assistant uses by specifying the policies key in your project's config.yml.
There are different policies to choose from, and you can include multiple policies in a single configuration. Here's an example of what a list of policies might look like:
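As a rough sketch, such a list could be written as follows. The choice of policies and the parameter values are illustrative assumptions, not recommended settings:

```yaml
# config.yml (illustrative sketch; values are assumptions)
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 200
  - name: RulePolicy
```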
Starting from scratch?
If you don't know which policies to choose, leave out the policies key from your config.yml completely. If you do, the Suggested Config feature will provide default policies for you.
Action Selection
At every turn, each policy defined in your configuration will predict a next action with a certain confidence level. For more information about how each policy makes its decision, read the policy's description below. The policy that predicts with the highest confidence decides the assistant's next action.
Maximum number of predictions
By default, your assistant can predict a maximum of 10 next actions after each user message. To update this value, set the environment variable MAX_NUMBER_OF_PREDICTIONS to the desired maximum number of predictions.
Policy Priority
In the case that two policies predict with equal confidence (for example, the Memoization and Rule Policies might both predict with confidence 1), the priority of the policies is considered. Rasa policies have default priorities that are set to ensure the expected outcome in the case of a tie. They look like this, where higher numbers have higher priority:
6 - RulePolicy
3 - MemoizationPolicy or AugmentedMemoizationPolicy
2 - UnexpecTEDIntentPolicy
1 - TEDPolicy
In general, it is not recommended to have more than one policy per priority level in your configuration. If you have 2 policies with the same priority and they predict with the same confidence, the resulting action will be chosen randomly.
If you create your own policy, use these priorities as a guide for figuring out the priority of your policy. If your policy is a machine learning policy, it should most likely have priority 1, the same as the TEDPolicy.
Overriding policy priorities
All policy priorities are configurable via the priority parameter in the policy's configuration, but we do not recommend changing them outside of specific cases such as custom policies. Doing so can lead to unexpected and undesired bot behavior.
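For the custom-policy case, a minimal sketch of setting the priority parameter might look like this. The module path is a placeholder, and the value 1 assumes a machine learning policy used in place of TEDPolicy:

```yaml
# config.yml (sketch; the custom policy path is a hypothetical placeholder)
policies:
  - name: RulePolicy                   # keeps its default priority
  - name: path.to.your.policy.class    # placeholder for your own policy class
    priority: 1                        # assumed: same level TEDPolicy would use
```

TEDPolicy is left out of this sketch so that no two policies share a priority level.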
Machine Learning Policies
TED Policy
The Transformer Embedding Dialogue (TED) Policy is a multi-task architecture for next action prediction and entity recognition. The architecture consists of several transformer encoders which are shared for both tasks. A sequence of entity labels is predicted through a Conditional Random Field (CRF) tagging layer on top of the user sequence transformer encoder output corresponding to the input sequence of tokens. For the next action prediction, the dialogue transformer encoder output and the system action labels are embedded into a single semantic vector space. We use the dot-product loss to maximize the similarity with the target label and minimize similarities with negative samples.
If you want to learn more about the model, check out our paper and our YouTube channel, where we explain the model architecture in detail.
TED Policy architecture comprises the following steps:
1. Concatenate features for
   - user input (user intent and entities) or user text processed through a user sequence transformer encoder,
   - previous system actions or bot utterances processed through a bot sequence transformer encoder,
   - slots and active forms
   for each time step into an input vector to the embedding layer that precedes the dialogue transformer.
2. Feed the embedding of the input vector into the dialogue transformer encoder.
3. Apply a dense layer to the output of the dialogue transformer to get embeddings of the dialogue for each time step.
4. Apply a dense layer to create embeddings for system actions for each time step.
5. Calculate the similarity between the dialogue embedding and embedded system actions. This step is based on the StarSpace idea.
6. Concatenate the token-level output of the user sequence transformer encoder with the output of the dialogue transformer encoder for each time step.
7. Apply the CRF algorithm to predict contextual entities for each user text input.
Configuration:
You can pass configuration parameters to the TEDPolicy using the config.yml file.
If you want to fine-tune your model, start by modifying the following parameters:
- epochs: This parameter sets the number of times the algorithm will see the training data (default: 1). One epoch equals one forward pass and one backward pass of all the training examples. Sometimes the model needs more epochs to learn properly. Sometimes more epochs don't influence the performance. The lower the number of epochs, the faster the model is trained. Here is what the config would look like:

  config.yml

      policies:
        - name: TEDPolicy
          epochs: 200

- max_history: This parameter controls how much dialogue history the model looks at to decide which action to take next. The default max_history for this policy is None, which means that the complete dialogue history since session restart is taken into account. If you want to limit the model to only see a certain number of previous dialogue turns, you can set max_history to a finite value. Please note that you should pick max_history carefully, so that the model has enough previous dialogue turns to create a correct prediction. See Featurizers for more details. Here is what the config would look like:

  config.yml

      policies:
        - name: TEDPolicy
          max_history: 8

- number_of_transformer_layers: This parameter sets the number of sequence transformer encoder layers to use for the sequential transformer encoders for user, action and action label texts and for the dialogue transformer encoder (defaults: text: 1, action_text: 1, label_action_text: 1, dialogue: 1). The number of sequence transformer encoder layers corresponds to the transformer blocks to use for the model.
- transformer_size: This parameter sets the number of units in the sequence transformer encoder layers to use for the sequential transformer encoders for user, action and action label texts and for the dialogue transformer encoder (defaults: text: 128, action_text: 128, label_action_text: 128, dialogue: 128). The vectors coming out of the transformer encoders will have the given transformer_size.
- connection_density: This parameter defines the fraction of kernel weights that are set to non-zero values for all feed forward layers in the model (default: 0.2). The value should be between 0 and 1. If you set connection_density to 1, no kernel weights will be set to 0 and the layer acts as a standard feed forward layer. You should not set connection_density to 0, as this would result in all kernel weights being 0, i.e. the model is not able to learn.
- split_entities_by_comma: This parameter defines whether adjacent entities separated by a comma should be treated as one, or split. For example, entities with the type ingredients, like "apple, banana", can be split into "apple" and "banana". An entity with type address, like "Schönhauser Allee 175, 10119 Berlin", should be treated as one. It can either be True/False globally:

  config.yml

      policies:
        - name: TEDPolicy
          split_entities_by_comma: True

  or set per entity type, such as:

  config.yml

      policies:
        - name: TEDPolicy
          split_entities_by_comma:
            address: False
            ingredients: True

- constrain_similarities: This parameter, when set to True, applies a sigmoid cross entropy loss over all similarity terms. This helps in keeping similarities between input and negative labels to smaller values. This should help in better generalization of the model to real world test sets.
- model_confidence: This parameter allows the user to configure how confidences are computed during inference. Currently, only one value is supported:
  - softmax: Confidences are in the range [0, 1] (old behavior and current default). Computed similarities are normalized with the softmax activation function.
- use_gpu: This parameter defines whether a GPU (if available) will be used for training. By default, TEDPolicy will be trained on GPU if a GPU is available (i.e. use_gpu is True). To enforce that TEDPolicy uses only the CPU for training, set use_gpu to False.
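To show how the parameters above fit together in one place, here is a sketch of a single TEDPolicy entry. The epochs and max_history values are arbitrary assumptions for illustration; the remaining values repeat the defaults stated above:

```yaml
# config.yml (sketch; epochs and max_history are illustrative assumptions)
policies:
  - name: TEDPolicy
    epochs: 200
    max_history: 8
    number_of_transformer_layers:
      text: 1
      action_text: 1
      label_action_text: 1
      dialogue: 1
    transformer_size:
      text: 128
      action_text: 128
      label_action_text: 128
      dialogue: 128
    connection_density: 0.2
    constrain_similarities: True
    model_confidence: softmax
    use_gpu: False          # force CPU-only training
```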
The above configuration parameters are the ones you should configure to fit your model to your data. However, additional parameters exist that can be adapted.
More configurable parameters
Note
The parameter maximum_negative_similarity is set to a negative value to mimic the original StarSpace algorithm in the case maximum_negative_similarity = maximum_positive_similarity and use_maximum_negative_similarity = False. See the StarSpace paper for details.
Note
In addition to the config parameters above, TEDPolicy prediction performance and training time are affected by the --augmentation argument of the rasa train command. For more information see Data Augmentation.
UnexpecTED Intent Policy
New in 2.8
This feature is experimental. We introduce experimental features to get feedback from our community, so we encourage you to try it out! However, the functionality might be changed or removed in the future. If you have feedback (positive or negative) please share it with us on the Rasa Forum.
UnexpecTEDIntentPolicy helps you review conversations and also allows your bot to react to unlikely user turns. It is an auxiliary policy that should only be used in conjunction with at least one other policy, as the only action that it can trigger is the special action_unlikely_intent action.
UnexpecTEDIntentPolicy has the same model architecture as TEDPolicy. The difference is at a task level. Instead of learning the best action to be triggered next, UnexpecTEDIntentPolicy learns the set of intents that are most likely to be expressed by the user given the conversation context from training stories. It uses the learned information at inference time by checking if the intent predicted by NLU is the most likely intent. If the intent predicted by NLU is indeed likely to occur given the conversation context, UnexpecTEDIntentPolicy does not trigger any action. Otherwise, it triggers an action_unlikely_intent with a confidence of 1.00.
UnexpecTEDIntentPolicy should be viewed as an aid for TEDPolicy. Since TEDPolicy is expected to improve with better coverage in the training data of the unique conversation paths that the assistant is expected to handle, UnexpecTEDIntentPolicy helps to surface these unique conversation paths from past conversations. For example, if you had the following story in your training data:
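A minimal sketch of such a story, assuming a hypothetical restaurant_form that fills a cuisine slot (the story, intent, and action names are illustrative assumptions):

```yaml
# stories.yml (sketch; all names are hypothetical)
stories:
  - story: book_restaurant_form
    steps:
      - intent: request_restaurant
      - action: restaurant_form
      - active_loop: restaurant_form
      - active_loop: null
      - action: utter_slots_values
```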
but an actual conversation might encounter interjections inside the form which you haven't accounted for:
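Written in story format, such a conversation could look roughly like the sketch below. It assumes a hypothetical restaurant_form and utter_ask_cuisine response, and the deny step marks the interjection that no training story covers:

```yaml
# Sketch of a real conversation in story format; all names are hypothetical
stories:
  - story: actual_conversation
    steps:
      - intent: request_restaurant
      - action: restaurant_form
      - active_loop: restaurant_form
      - slot_was_set:
          - requested_slot: cuisine
      - intent: deny                 # interjection not covered by any story
      - action: utter_ask_cuisine    # the form asks for the cuisine slot again
```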
As soon as the deny intent gets triggered, the policy handling the form will keep requesting the cuisine slot to be filled, as the training stories don't say that this case should be treated differently.
To help you identify that a special story that handles the user's deny intent might be missing at this point, UnexpecTEDIntentPolicy can trigger an action_unlikely_intent action right after the deny intent. Subsequently, you can improve your assistant by adding a new training story that handles this particular case.
To reduce false warnings, UnexpecTEDIntentPolicy has two mechanisms in place at inference time:
- UnexpecTEDIntentPolicy's priority is intentionally kept lower than all rule-based policies, since rules may exist for situations that are novel for TEDPolicy or UnexpecTEDIntentPolicy.
- UnexpecTEDIntentPolicy does not predict an action_unlikely_intent if the last predicted intent isn't present in any of the training stories, which might happen if an intent is only used in rules.
Prediction of action_unlikely_intent
UnexpecTEDIntentPolicy is invoked immediately after a user utterance and can either trigger action_unlikely_intent or abstain (in which case other policies will predict actions). To determine if action_unlikely_intent should be triggered, UnexpecTEDIntentPolicy computes a score for the user's intent in the current dialogue context and checks if this score is below a certain threshold score.
This threshold score is computed by collecting the ML model's output on many "negative examples". These negative examples are combinations of dialogue contexts and user intents that are incorrect. UnexpecTEDIntentPolicy generates these negative examples from your training data by picking a random story part and pairing it with a random intent that doesn't occur at this point. For example, if you had just one training story:
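A sketch of what that single story could be (the story name, intents, and responses are assumptions chosen for illustration):

```yaml
# stories.yml (sketch; all names are hypothetical)
version: "3.1"
stories:
  - story: happy path 1
    steps:
      - intent: greet
      - action: utter_greet
      - intent: mood_great
      - action: utter_goodbye
```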
and an intent affirm, then a valid negative example will be:
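Assuming a training story that starts with a hypothetical greet intent followed by utter_greet, one such negative example might be sketched as:

```yaml
# Sketch of a generated negative example; names are hypothetical
stories:
  - story: negative example with affirm
    steps:
      - intent: greet
      - action: utter_greet
      - intent: affirm      # does not occur at this point in any training story
```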
Here, the affirm intent is unexpected as it doesn't occur in this particular conversation context across all training stories. For each intent, UnexpecTEDIntentPolicy uses these negative examples to figure out the range of scores the model predicts. The threshold score is picked from this range of scores in such a way that the predicted score for a certain percentage of negative examples is higher than the threshold score and hence action_unlikely_intent is not triggered for them. This percentage of negative examples can be controlled by the tolerance parameter. The higher the tolerance, the lower the intent's score (the more unlikely the intent) needs to be before UnexpecTEDIntentPolicy triggers the action_unlikely_intent action.
Configuration:
You can pass configuration parameters to the UnexpecTEDIntentPolicy using the config.yml file.
If you want to fine-tune the model's performance, start by modifying the following parameters:
- epochs: This parameter sets the number of times the algorithm will see the training data (default: 1). One epoch equals one forward pass and one backward pass of all the training examples. Sometimes the model needs more epochs to learn properly. Sometimes more epochs don't influence the performance. The lower the number of epochs, the faster the model is trained. Here is what the config would look like:

  config.yml

      policies:
        - name: UnexpecTEDIntentPolicy
          epochs: 200

- max_history: This parameter controls how much dialogue history the model looks at before making an inference. The default max_history for this policy is None, which means that the complete dialogue history since session (re)start is taken into account. If you want to limit the model to only see a certain number of previous dialogue turns, you can set max_history to a finite value. Please note that you should pick max_history carefully, so that the model has enough previous dialogue turns to create a correct prediction. Depending on your dataset, higher values of max_history can result in more frequent prediction of action_unlikely_intent, as the number of unique possible conversation paths increases as more dialogue context is taken into account. Similarly, lowering the value of max_history can result in action_unlikely_intent being triggered less often, but can also be a stronger indicator that the corresponding conversation path is highly unique and hence unexpected. We recommend setting the max_history of UnexpecTEDIntentPolicy equal to that of TEDPolicy. Here is what the config would look like:

  config.yml

      policies:
        - name: UnexpecTEDIntentPolicy
          max_history: 8

- ignore_intents_list: This parameter lets you configure UnexpecTEDIntentPolicy to not predict action_unlikely_intent for a subset of intents. You might want to do this if you come across a certain list of intents for which too many false warnings are generated.
- tolerance: The tolerance parameter is a number that ranges from 0.0 to 1.0 (inclusive). It helps to adjust the threshold score used during prediction of action_unlikely_intent at inference time.
  - 0.0 means that the threshold score will be adjusted in such a way that 0% of negative examples encountered during training are predicted with a score higher than the threshold score. Hence, conversation contexts from all negative examples will trigger an action_unlikely_intent action.
  - A tolerance of 0.1 means that the threshold score will be adjusted in such a way that 10% of negative examples encountered during training are predicted with a score higher than the threshold score.
  - A tolerance of 1.0 means that the threshold score is so low that UnexpecTEDIntentPolicy would not trigger action_unlikely_intent for any of the negative examples that it has encountered during training.
- use_gpu: This parameter defines whether a GPU (if available) will be used for training. By default, UnexpecTEDIntentPolicy will be trained on GPU if a GPU is available (i.e. use_gpu is True). To enforce that UnexpecTEDIntentPolicy uses only the CPU for training, set use_gpu to False.
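As a sketch of how ignore_intents_list and tolerance might be combined in one entry (the listed intents and the tolerance value are assumptions chosen for illustration, not recommendations):

```yaml
# config.yml (sketch; intents and values are illustrative assumptions)
policies:
  - name: UnexpecTEDIntentPolicy
    max_history: 8           # kept equal to TEDPolicy's max_history
    ignore_intents_list:     # hypothetical intents that produced false warnings
      - greet
      - goodbye
    tolerance: 0.1
  - name: TEDPolicy
    max_history: 8
```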
The above configuration parameters are the ones you should try tweaking according to your use case and training data. However, additional parameters exist that you could adapt.
More configurable parameters
Tuning the tolerance parameter
When reviewing real conversations, we encourage you to tune the tolerance parameter in UnexpecTEDIntentPolicy's configuration to reduce the number of false warnings (intents that actually are likely given the conversation context). As you increase the value of tolerance from 0 to 1 in steps of 0.05, the number of false warnings should decrease. However, increasing the tolerance will also result in fewer triggers of action_unlikely_intent, and hence more conversation paths not present in training stories will be missing from the set of flagged conversations. If you change the max_history value and retrain a model, you might have to re-adjust the tolerance value as well.
Memoization Policy
The MemoizationPolicy remembers the stories from your training data. It checks if the current conversation matches the stories in your stories.yml file. If so, it will predict the next action from the matching stories of your training data with a confidence of 1.0. If no matching conversation is found, the policy predicts None with confidence 0.0.
When looking for a match in your training data, the policy will take the last max_history number of turns of the conversation into account. One "turn" includes the message sent by the user and any actions the assistant performed before waiting for the next message.
You can configure the number of turns the MemoizationPolicy should use in your configuration:
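A minimal sketch of that configuration, with 3 chosen as an arbitrary example value:

```yaml
# config.yml (sketch; the max_history value is an illustrative assumption)
policies:
  - name: MemoizationPolicy
    max_history: 3
```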
Augmented Memoization Policy
The AugmentedMemoizationPolicy remembers examples from training stories for up to max_history turns, just like the MemoizationPolicy. Additionally, it has a forgetting mechanism that will forget a certain amount of steps in the conversation history and try to find a match in your stories with the reduced history. It predicts the next action with confidence 1.0 if a match is found, otherwise it predicts None with confidence 0.0.
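Since it takes max_history just like the MemoizationPolicy, a configuration sketch could look like this (the value 4 is an arbitrary assumption):

```yaml
# config.yml (sketch; the max_history value is an illustrative assumption)
policies:
  - name: AugmentedMemoizationPolicy
    max_history: 4
```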
Slots and predictions
If you have dialogues where some slots that are set during prediction time might not be set in training stories (e.g. in training stories starting with a reminder, not all previous slots are set), make sure to add the relevant stories without slots to your training data as well.
Rule-based Policies
Rule Policy
The RulePolicy is a policy that handles conversation parts that follow a fixed behavior (e.g. business logic). It makes predictions based on any rules you have in your training data. See the Rules documentation for further information on how to define rules.
The RulePolicy has the following configuration options:
- core_fallback_threshold (default: 0.3): Please see the fallback documentation for further information.
- core_fallback_action_name (default: action_default_fallback): Please see the fallback documentation for further information.
- enable_fallback_prediction (default: true): Please see the fallback documentation for further information.
- check_for_contradictions (default: true): Before training, the RulePolicy will perform a check to make sure that slots and active loops set by actions are defined consistently for all rules. The following snippet contains an example of an incomplete rule:

      rules:
      - rule: complete rule
        steps:
        - intent: search_venues
        - action: action_search_venues
        - slot_was_set:
          - venues: [{"name": "Big Arena", "reviews": 4.5}]

      - rule: incomplete rule
        steps:
        - intent: search_venues
        - action: action_search_venues

  In the second rule, named incomplete rule, action_search_venues should set the venues slot because it is set in complete rule, but this event is missing. There are several possible ways to fix this rule.

  In the case when action_search_venues can't find a venue and the venues slot should not be set, you should explicitly set the value of the slot to null. In the following rule, RulePolicy will predict utter_venues_not_found only if the slot venues is not set:

      rules:
      - rule: fixes incomplete rule
        steps:
        - intent: search_venues
        - action: action_search_venues
        - slot_was_set:
          - venues: null
        - action: utter_venues_not_found

  If you want the slot setting to be handled by a different rule or story, you should add wait_for_user_input: false to the end of the rule snippet:

      rules:
      - rule: incomplete rule
        steps:
        - intent: search_venues
        - action: action_search_venues
        wait_for_user_input: false

  After training, the RulePolicy will check that none of the rules or stories contradict each other. The following snippet is an example of two contradicting rules:

      rules:
      - rule: Chitchat
        steps:
        - intent: chitchat
        - action: utter_chitchat

      - rule: Greet instead of chitchat
        steps:
        - intent: chitchat
        - action: utter_greet # `utter_greet` contradicts `utter_chitchat` from the rule above

- restrict_rules (default: true): Rules are restricted to one user turn, but there can be multiple bot events, including e.g. a form being filled and its subsequent submission. Changing this parameter to false may result in unexpected behavior.
Overusing rules
Overusing rules for purposes outside of the recommended use cases will make it very hard to maintain your assistant as the complexity grows.
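For reference, the RulePolicy options listed above could be written out in config.yml as in the sketch below. Every value shown is simply the default stated above, so this entry should behave the same as a bare RulePolicy entry:

```yaml
# config.yml (sketch; all values are the documented defaults)
policies:
  - name: RulePolicy
    core_fallback_threshold: 0.3
    core_fallback_action_name: action_default_fallback
    enable_fallback_prediction: true
    check_for_contradictions: true
    restrict_rules: true
```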
Configuring Policies
Max History
One important hyperparameter for Rasa policies is max_history. This controls how much dialogue history the model looks at to decide which action to take next.
You can set max_history by passing it to your policy in the policy configuration in your config.yml. The default value is None, which means that the complete dialogue history since session restart is taken into account.
Note
RulePolicy doesn't have a max_history parameter; it always considers the full length of the provided rules. Please see Rules for further information.
As an example, let's say you have an out_of_scope intent which describes off-topic user messages. If your bot sees this intent multiple times in a row, you might want to tell the user what you can help them with. So your story might look like this:
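A sketch of such a story, assuming hypothetical utter_default and utter_help_message responses:

```yaml
# stories.yml (sketch; story and response names are hypothetical)
stories:
  - story: utter help after 2 fallbacks
    steps:
      - intent: out_of_scope
      - action: utter_default
      - intent: out_of_scope
      - action: utter_default
      - intent: out_of_scope
      - action: utter_help_message
```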
For your model to learn this pattern, the max_history has to be at least 4.
If you increase your max_history, your model will become bigger and training will take longer. If you have some information that should affect the dialogue very far into the future, you should store it as a slot. Slot information is always available for every featurizer.
Data Augmentation
When you train a model, Rasa will create longer stories by randomly combining the ones in your stories files. Take the stories below as an example:
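Two short stories of this kind might be sketched as follows (the intent and response names are illustrative assumptions):

```yaml
# stories.yml (sketch; intent and response names are hypothetical)
stories:
  - story: thank
    steps:
      - intent: thankyou
      - action: utter_youarewelcome
  - story: say goodbye
    steps:
      - intent: goodbye
      - action: utter_goodbye
```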
You actually want to teach your policy to ignore the dialogue history when it isn't relevant and to respond with the same action no matter what happened before. To achieve this, individual stories are concatenated into longer stories. From the example above, data augmentation might produce a story by combining thank with say goodbye and then thank again, equivalent to:
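Assuming a thank story with a hypothetical thankyou intent and a say goodbye story with a goodbye intent, the combined story could be sketched as:

```yaml
# Sketch of an augmented story; intent and response names are hypothetical
stories:
  - story: thank -> say goodbye -> thank
    steps:
      - intent: thankyou
      - action: utter_youarewelcome
      - intent: goodbye
      - action: utter_goodbye
      - intent: thankyou
      - action: utter_youarewelcome
```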
You can alter this behavior with the --augmentation flag, which allows you to set the augmentation_factor. The augmentation_factor determines how many augmented stories are subsampled during training. The augmented stories are subsampled before training since their number can quickly become very large, and you want to limit it. The number of sampled stories is augmentation_factor x 10. By default augmentation_factor is set to 50, resulting in a maximum of 500 augmented stories.
--augmentation 0 disables all augmentation behavior. TEDPolicy is the only policy affected by augmentation. Other policies like MemoizationPolicy or RulePolicy automatically ignore all augmented stories (regardless of the augmentation_factor).
--augmentation is an important parameter when trying to reduce TEDPolicy training time. Reducing the augmentation_factor decreases the size of the training data and subsequently the time to train the policy. However, reducing the amount of data augmentation can also reduce the performance of TEDPolicy. We recommend using a memoization-based policy along with TEDPolicy when reducing the amount of data augmentation to compensate.
Featurizers
In order to apply machine learning algorithms to conversational AI, you need to build up vector representations of conversations.
Each story corresponds to a tracker which consists of the states of the conversation just before each action was taken.
State Featurizers
Every event in a tracker's history creates a new state (e.g. running a bot action, receiving a user message, setting slots). Featurizing a single state of the tracker has two steps:
1. Tracker provides a bag of active features:
   - features indicating intents and entities, if this is the first state in a turn, e.g. it's the first action we will take after parsing the user's message (e.g. [intent_restaurant_search, entity_cuisine])
   - features indicating which slots are currently defined, e.g. slot_location if the user previously mentioned the area they're searching for restaurants
   - features indicating the results of any API calls stored in slots, e.g. slot_matches
   - features indicating what the last bot action or bot utterance was (e.g. prev_action_listen)
   - features indicating if any loop is active and which one
2. Convert all the features into numeric vectors:
   SingleStateFeaturizer uses the Rasa NLU pipeline to convert the intent and bot action names or bot utterances into numeric vectors. See the NLU Model Configuration documentation for details on how to configure the Rasa NLU pipeline. Entities, slots and active loops are featurized as one-hot encodings to indicate their presence.
Note
If the domain defines the possible actions, [ActionGreet, ActionGoodbye], 4 additional default actions are added: [ActionListen(), ActionRestart(), ActionDefaultFallback(), ActionDeactivateForm()]. Therefore, label 0 indicates default action listen, label 1 default restart, label 2 a greeting and 3 indicates goodbye.
Tracker Featurizers
A policy can be trained to learn two kinds of labels:
- The next most appropriate action to be triggered by the assistant. For example, TEDPolicy is trained to do this.
- The next most likely intent that a user can express. For example, UnexpecTEDIntentPolicy is trained to learn this.
Hence, a tracker can be featurized to learn one of the labels mentioned above. Depending on the policy, the target labels correspond to bot actions or bot utterances represented as an index in a list of all possible actions or set of intents represented as an index in a list of all possible intents.
Tracker Featurizers come in three different flavours:
1. Full Dialogue
FullDialogueTrackerFeaturizer creates a numerical representation of stories to feed to a recurrent neural network where the whole dialogue is fed to a network and the gradient is backpropagated from all time steps. The target label is the most appropriate bot action or bot utterance which should be triggered in the context of the conversation. The TrackerFeaturizer iterates over tracker states and calls a SingleStateFeaturizer for each state to create numeric input features for a policy.
2. Max History
MaxHistoryTrackerFeaturizer operates very similarly to FullDialogueTrackerFeaturizer as it creates an array of previous tracker states for each bot action or bot utterance, but with the parameter max_history defining how many states go into each row of input features. If max_history is not specified, the algorithm takes the whole length of a dialogue into account. Deduplication is performed to filter out duplicated turns (bot actions or bot utterances) in terms of their previous states.
For some algorithms a flat feature vector is needed, so input features should be reshaped to (num_unique_turns, max_history * num_input_features).
3. Intent Max History
IntentMaxHistoryTrackerFeaturizer inherits from MaxHistoryTrackerFeaturizer. Since it is used by UnexpecTEDIntentPolicy, the target labels that it creates are the intents that can be expressed by a user in the context of a conversation tracker. Unlike other tracker featurizers, there can be multiple target labels. Hence, it pads the list of target labels with a constant value (-1) on the right to return an equally sized list of target labels for each input conversation tracker.
Just like MaxHistoryTrackerFeaturizer, it also performs deduplication to filter out duplicated turns. However, it yields one featurized tracker per correct intent for the corresponding tracker. For example, if the correct labels for an input conversation tracker have the indices [0, 2, 4], then the featurizer will yield three pairs of featurized trackers and target labels. The featurized trackers will be identical to each other, but the target labels in each pair will be [0, 2, 4], [4, 0, 2], [2, 4, 0].
Custom Policies
New in 3.0
Rasa 3.0 unified the implementation of NLU components and policies. This requires changes to custom policies written for earlier versions of Rasa Open Source. Please see the migration guide for a step-by-step guide for the migration.
You can also write custom policies and reference them in your configuration. In the example below, the last two lines show how to use a custom policy class and pass arguments to it. See the guide on custom graph components for a complete guide on custom policies.
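A sketch of such a configuration follows. The module path and the argument name arg1 are placeholders to be replaced with your own policy class and its parameters, and the other values are illustrative assumptions:

```yaml
# config.yml (sketch; the module path and argument name are placeholders)
policies:
  - name: TEDPolicy
    max_history: 5
    epochs: 200
  - name: RulePolicy
  - name: path.to.your.policy.class
    arg1: "..."
```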