Policies¶
Configuring Policies¶
The rasa.core.policies.Policy class decides which action to take at every step in the conversation. There are different policies to choose from, and you can include multiple policies in a single rasa.core.agent.Agent.
Note
By default, a maximum of 10 next actions can be predicted by the agent after every user message. To update this value, set the environment variable MAX_NUMBER_OF_PREDICTIONS to the desired maximum number of predictions.
Your project's config.yml file takes a policies key which you can use to customize the policies your assistant uses. In the example below, the last two lines show how to use a custom policy class and pass arguments to it.
policies:
  - name: "KerasPolicy"
    featurizer:
    - name: MaxHistoryTrackerFeaturizer
      max_history: 5
      state_featurizer:
        - name: BinarySingleStateFeaturizer
  - name: "MemoizationPolicy"
    max_history: 5
  - name: "FallbackPolicy"
    nlu_threshold: 0.4
    core_threshold: 0.3
    fallback_action_name: "my_fallback_action"
  - name: "path.to.your.policy.class"
    arg1: "..."
Max History¶
One important hyperparameter for Rasa Core policies is the max_history. This controls how much dialogue history the model looks at to decide which action to take next.
You can set the max_history by passing it to your policy's Featurizer in the policy configuration yaml file.
Note
Only the MaxHistoryTrackerFeaturizer uses a max history, whereas the FullDialogueTrackerFeaturizer always looks at the full conversation history. See Featurization for details.
As an example, let's say you have an out_of_scope intent which describes off-topic user messages. If your bot sees this intent multiple times in a row, you might want to tell the user what you can help them with. So your story might look like this:
* out_of_scope
  - utter_default
* out_of_scope
  - utter_default
* out_of_scope
  - utter_help_message
For Rasa Core to learn this pattern, the max_history has to be at least 4.
If you increase your max_history, your model will become bigger and training will take longer. If you have some information that should affect the dialogue very far into the future, you should store it as a slot. Slot information is always available for every featurizer.
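For instance, to capture the story above you could set max_history on the featurizer in your policy configuration. The sketch below simply mirrors the structure of the example at the top of this page; the choice of policy and state featurizer is up to you:

policies:
  - name: "KerasPolicy"
    featurizer:
    - name: MaxHistoryTrackerFeaturizer
      max_history: 4
      state_featurizer:
        - name: BinarySingleStateFeaturizer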
Data Augmentation¶
When you train a model, by default Rasa Core will create longer stories by randomly gluing together the ones in your stories files. This is because if you have stories like:
# thanks
* thankyou
  - utter_youarewelcome

# bye
* goodbye
  - utter_goodbye
You actually want to teach your policy to ignore the dialogue history when it isn’t relevant and just respond with the same action no matter what happened before.
You can alter this behaviour with the --augmentation flag, which allows you to set the augmentation_factor.
The augmentation_factor determines how many augmented stories are subsampled during training. The augmented stories are subsampled before training since their number can quickly become very large, and we want to limit it. The number of sampled stories is augmentation_factor x 10. By default the augmentation_factor is set to 20, resulting in a maximum of 200 augmented stories.
--augmentation 0 disables all augmentation behavior.
The memoization based policies are not affected by augmentation (independent of the augmentation_factor) and will automatically ignore all augmented stories.
Action Selection¶
At every turn, each policy defined in your configuration will predict the next action with a certain confidence level. For more information about how each policy makes its decision, read the policy's description below. The bot's next action is then decided by the policy that predicts with the highest confidence.
In the case that two policies predict with equal confidence (for example, the Memoization and Mapping Policies always predict with confidence of either 0 or 1), the priority of the policies is considered. Rasa policies have default priorities that are set to ensure the expected outcome in the case of a tie. They look like this, where higher numbers have higher priority:
5. FormPolicy
4. FallbackPolicy and TwoStageFallbackPolicy
3. MemoizationPolicy and AugmentedMemoizationPolicy
2. MappingPolicy
1. EmbeddingPolicy, KerasPolicy, and SklearnPolicy
This priority hierarchy ensures that, for example, if there is an intent with a mapped action, but the NLU confidence is not above the nlu_threshold, the bot will still fall back. In general, it is not recommended to have more than one policy per priority level, and some policies on the same priority level, such as the two fallback policies, strictly cannot be used in tandem.
If you create your own policy, use these priorities as a guide for figuring out the priority of your policy. If your policy is a machine learning policy, it should most likely have priority 1, the same as the Rasa machine learning policies.
Warning
All policy priorities are configurable via the priority: parameter in the configuration, but we do not recommend changing them outside of specific cases such as custom policies. Doing so can lead to unexpected and undesired bot behavior.
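For instance, if you write your own machine learning policy, you might give it the same priority as the built-in machine learning policies. A minimal sketch, where the module path is just a placeholder for your own class:

policies:
  - name: "path.to.your.policy.class"  # placeholder module path for your custom policy
    priority: 1  # same priority level as EmbeddingPolicy, KerasPolicy and SklearnPolicy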
Keras Policy¶
The KerasPolicy uses a neural network implemented in Keras to select the next action. The default architecture is based on an LSTM, but you can override the KerasPolicy.model_architecture method to implement your own architecture.
def model_architecture(
    self, input_shape: Tuple[int, int], output_shape: Tuple[int, Optional[int]]
) -> tf.keras.models.Sequential:
    """Build a keras model and return a compiled model."""

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import (
        Masking,
        LSTM,
        Dense,
        TimeDistributed,
        Activation,
    )

    # Build Model
    model = Sequential()

    # the shape of the y vector of the labels,
    # determines which output from rnn will be used
    # to calculate the loss
    if len(output_shape) == 1:
        # y is (num examples, num features) so
        # only the last output from the rnn is used to
        # calculate the loss
        model.add(Masking(mask_value=-1, input_shape=input_shape))
        model.add(LSTM(self.rnn_size, dropout=0.2))
        model.add(Dense(input_dim=self.rnn_size, units=output_shape[-1]))
    elif len(output_shape) == 2:
        # y is (num examples, max_dialogue_len, num features) so
        # all the outputs from the rnn are used to
        # calculate the loss, therefore a sequence is returned and
        # time distributed layer is used

        # the first value in input_shape is max dialogue_len,
        # it is set to None, to allow dynamic_rnn creation
        # during prediction
        model.add(Masking(mask_value=-1, input_shape=(None, input_shape[1])))
        model.add(LSTM(self.rnn_size, return_sequences=True, dropout=0.2))
        model.add(TimeDistributed(Dense(units=output_shape[-1])))
    else:
        raise ValueError(
            "Cannot construct the model because"
            "length of output_shape = {} "
            "should be 1 or 2."
            "".format(len(output_shape))
        )

    model.add(Activation("softmax"))

    model.compile(
        loss="categorical_crossentropy", optimizer="rmsprop", metrics=["accuracy"]
    )

    if obtain_verbosity() > 0:
        model.summary()

    return model
and the training is run here:
def train(
    self,
    training_trackers: List[DialogueStateTracker],
    domain: Domain,
    **kwargs: Any,
) -> None:

    # set numpy random seed
    np.random.seed(self.random_seed)

    training_data = self.featurize_for_training(training_trackers, domain, **kwargs)
    # noinspection PyPep8Naming
    shuffled_X, shuffled_y = training_data.shuffled_X_y()

    self.graph = tf.Graph()
    with self.graph.as_default():
        # set random seed in tf
        tf.set_random_seed(self.random_seed)
        self.session = tf.compat.v1.Session(config=self._tf_config)

        with self.session.as_default():
            if self.model is None:
                self.model = self.model_architecture(
                    shuffled_X.shape[1:], shuffled_y.shape[1:]
                )

            logger.info(
                "Fitting model with {} total samples and a "
                "validation split of {}"
                "".format(training_data.num_examples(), self.validation_split)
            )

            # filter out kwargs that cannot be passed to fit
            self._train_params = self._get_valid_params(
                self.model.fit, **self._train_params
            )

            self.model.fit(
                shuffled_X,
                shuffled_y,
                epochs=self.epochs,
                batch_size=self.batch_size,
                shuffle=False,
                verbose=obtain_verbosity(),
                **self._train_params,
            )
            # the default parameter for epochs in keras fit is 1
            self.current_epoch = self.defaults.get("epochs", 1)

            logger.info("Done fitting keras policy model")
You can implement the model of your choice by overriding these methods, or initialize KerasPolicy with a pre-defined Keras model.
In order to get reproducible training results for the same inputs, you can set the random_seed attribute of the KerasPolicy to any integer.
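For example, the seed can be set directly in the policy configuration. A minimal sketch, assuming the standard KerasPolicy parameters random_seed and epochs; the values are illustrative only:

policies:
  - name: "KerasPolicy"
    random_seed: 42  # any integer makes training reproducible for the same inputs
    epochs: 100      # illustrative value; tune for your data set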
Embedding Policy¶
Transformer Embedding Dialogue Policy (TEDP): a transformer version of the Recurrent Embedding Dialogue Policy (REDP) used in our paper: https://arxiv.org/abs/1811.11707
This policy has a pre-defined architecture, which comprises the following steps:
- concatenate user input (user intent and entities), previous system action, slots and active form for each time step into an input vector for the pre-transformer embedding layer;
- feed it to the transformer;
- apply a dense layer to the output of the transformer to get embeddings of a dialogue for each time step;
- apply a dense layer to create embeddings for system actions for each time step;
- calculate the similarity between the dialogue embedding and embedded system actions. This step is based on the StarSpace idea.
It is recommended to use state_featurizer=LabelTokenizerSingleStateFeaturizer(...) (see Featurization for details).
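In the policy configuration file, that recommendation corresponds roughly to the following sketch. It assumes the featurizer block follows the same structure as the example at the top of this page; adjust it to the featurizer you actually want:

policies:
  - name: EmbeddingPolicy
    featurizer:
    - name: FullDialogueTrackerFeaturizer
      state_featurizer:
        - name: LabelTokenizerSingleStateFeaturizer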
Configuration:

Configuration parameters can be passed as parameters to the EmbeddingPolicy within the policy configuration file.

Warning

Pass an appropriate number of epochs to the EmbeddingPolicy, otherwise the policy will be trained only for 1 epoch.

The algorithm also has hyper-parameters to control:
neural network's architecture:

- hidden_layers_sizes_bot sets a list of hidden layer sizes before the embedding layer for system actions; the number of hidden layers is equal to the length of the list;
- transformer_size sets the number of units in the transformer;
- num_transformer_layers sets the number of transformer layers;
- pos_encoding sets the type of positional encoding in the transformer, it should be either timing or emb;
- max_seq_length sets the maximum sequence length if embedding positional encodings are used;
- num_heads sets the number of heads in multihead attention;

training:

- batch_size sets the number of training examples in one forward/backward pass; the higher the batch size, the more memory space you'll need;
- batch_strategy sets the type of batching strategy, it should be either sequence or balanced;
- epochs sets the number of times the algorithm will see the training data, where one epoch equals one forward pass and one backward pass of all the training examples;
- random_seed if set to any int will get reproducible training results for the same inputs;

embedding:

- embed_dim sets the dimension of embedding space;
- num_neg sets the number of incorrect intent labels; the algorithm will minimize their similarity to the user input during training;
- similarity_type sets the type of the similarity, it should be either auto, cosine or inner; if auto, it will be set depending on loss_type: inner for softmax, cosine for margin;
- loss_type sets the type of the loss function, it should be either softmax or margin;
- mu_pos controls how similar the algorithm should try to make embedding vectors for correct intent labels, used only if loss_type is set to margin;
- mu_neg controls the maximum negative similarity for incorrect intents, used only if loss_type is set to margin;
- use_max_sim_neg if true the algorithm only minimizes maximum similarity over incorrect intent labels, used only if loss_type is set to margin;
- scale_loss if true the algorithm will downscale the loss for examples where the correct label is predicted with high confidence, used only if loss_type is set to softmax;

regularization:

- C2 sets the scale of L2 regularization;
- C_emb sets the scale of how important it is to minimize the maximum similarity between embeddings of different intent labels, used only if loss_type is set to margin;
- droprate_a sets the dropout rate between layers before the embedding layer for user inputs;
- droprate_b sets the dropout rate between layers before the embedding layer for system actions;

train accuracy calculation:

- evaluate_every_num_epochs sets how often to calculate train accuracy; small values may hurt performance;
- evaluate_on_num_examples sets how many examples to use for a hold out validation set to calculate validation accuracy; large values may hurt performance.

Warning
The default max_history for this policy is None, which means it will use the FullDialogueTrackerFeaturizer. We recommend setting max_history to some finite value in order to use the MaxHistoryTrackerFeaturizer for faster training. See Featurization for details. We also recommend increasing batch_size for the MaxHistoryTrackerFeaturizer (e.g. "batch_size": [32, 64]).

Warning
If evaluate_on_num_examples is non-zero, random examples will be picked by stratified split and used as a hold out validation set, so they will be excluded from the training data. We suggest setting it to zero if your data set contains a lot of unique examples of dialogue turns.

Note
The droprate should be between 0 and 1, e.g. droprate=0.1 would drop out 10% of input units.

Note
For cosine similarity, mu_pos and mu_neg should be between -1 and 1.

Note
There is an option to use linearly increasing batch size. The idea comes from https://arxiv.org/abs/1711.00489. In order to do it, pass a list to batch_size, e.g. "batch_size": [8, 32] (default behaviour). If a constant batch_size is required, pass an int, e.g. "batch_size": 8.

These parameters can be specified in the policy configuration file. The default values are defined in EmbeddingPolicy.defaults:

defaults = {
    # nn architecture
    # a list of hidden layers sizes before user embed layer
    # number of hidden layers is equal to the length of this list
    "hidden_layers_sizes_pre_dial": [],
    # a list of hidden layers sizes before bot embed layer
    # number of hidden layers is equal to the length of this list
    "hidden_layers_sizes_bot": [],
    # number of units in transformer
    "transformer_size": 128,
    # number of transformer layers
    "num_transformer_layers": 1,
    # type of positional encoding in transformer
    "pos_encoding": "timing",  # string 'timing' or 'emb'
    # max sequence length if pos_encoding='emb'
    "max_seq_length": 256,
    # number of attention heads in transformer
    "num_heads": 4,
    # training parameters
    # initial and final batch sizes:
    # batch size will be linearly increased for each epoch
    "batch_size": [8, 32],
    # how to create batches
    "batch_strategy": "balanced",  # string 'sequence' or 'balanced'
    # number of epochs
    "epochs": 1,
    # set random seed to any int to get reproducible results
    "random_seed": None,
    # embedding parameters
    # dimension size of embedding vectors
    "embed_dim": 20,
    # the number of incorrect labels, the algorithm will minimize
    # their similarity to the user input during training
    "num_neg": 20,
    # the type of the similarity
    "similarity_type": "auto",  # string 'auto' or 'cosine' or 'inner'
    # the type of the loss function
    "loss_type": "softmax",  # string 'softmax' or 'margin'
    # how similar the algorithm should try
    # to make embedding vectors for correct labels
    "mu_pos": 0.8,  # should be 0.0 < ... < 1.0 for 'cosine'
    # maximum negative similarity for incorrect labels
    "mu_neg": -0.2,  # should be -1.0 < ... < 1.0 for 'cosine'
    # flag if minimize only maximum similarity over incorrect labels
    "use_max_sim_neg": True,
    # scale loss inverse proportionally to confidence of correct prediction
    "scale_loss": True,
    # regularization
    # the scale of L2 regularization
    "C2": 0.001,
    # the scale of how important is to minimize the maximum similarity
    # between embeddings of different labels
    "C_emb": 0.8,
    # dropout rate for dial nn
    "droprate_a": 0.1,
    # dropout rate for bot nn
    "droprate_b": 0.0,
    # visualization of accuracy
    # how often calculate validation accuracy
    "evaluate_every_num_epochs": 20,  # small values may hurt performance
    # how many examples to use for hold out validation set
    "evaluate_on_num_examples": 0,  # large values may hurt performance
}

Note
The parameter mu_neg is set to a negative value to mimic the original starspace algorithm in the case mu_neg = mu_pos and use_max_sim_neg = False. See the starspace paper for details.
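Putting the recommendations above together, a policy configuration might look like the following sketch. The numbers are illustrative, not tuned defaults:

policies:
  - name: EmbeddingPolicy
    max_history: 8        # a finite value switches to the MaxHistoryTrackerFeaturizer
    epochs: 50            # illustrative; the built-in default of 1 is rarely enough
    batch_size: [32, 64]  # linearly increasing batch size, as recommended above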
Mapping Policy¶
The MappingPolicy can be used to directly map intents to actions. The mappings are assigned by giving an intent the property triggers, e.g.:
intents:
  - ask_is_bot:
      triggers: action_is_bot
An intent can only be mapped to at most one action. The bot will run the mapped action once it receives a message of the triggering intent. Afterwards, it will listen for the next message. With the next user message, normal prediction will resume.
If you do not want your intent-action mapping to affect the dialogue history, the mapped action must return a UserUtteranceReverted() event. This will delete the user's latest message, along with any events that happened after it, from the dialogue history. This means you should not include the intent-action interaction in your stories.
For example, if a user asks “Are you a bot?” off-topic in the middle of the flow, you probably want to answer without that interaction affecting the next action prediction. A triggered custom action can do anything, but here’s a simple example that dispatches a bot utterance and then reverts the interaction:
from rasa_sdk import Action
from rasa_sdk.events import UserUtteranceReverted


class ActionIsBot(Action):
    """Revertible mapped action for utter_is_bot"""

    def name(self):
        return "action_is_bot"

    def run(self, dispatcher, tracker, domain):
        dispatcher.utter_template("utter_is_bot", tracker)
        return [UserUtteranceReverted()]
Note
If you use the MappingPolicy to predict bot utterances directly (e.g. triggers: utter_{}), these interactions must go in your stories, as in this case there is no UserUtteranceReverted() and the intent and the mapped utterance will appear in the dialogue history.
Note
The MappingPolicy is also responsible for executing the default actions action_back and action_restart in response to /back and /restart. If it is not included in your policy configuration, these intents will not work.
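To keep those default actions working, make sure the MappingPolicy appears in your policy configuration; a minimal sketch:

policies:
  - name: "MappingPolicy"
  # ... the other policies your assistant uses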
Memoization Policy¶
The MemoizationPolicy just memorizes the conversations in your training data. It predicts the next action with confidence 1.0 if this exact conversation exists in the training data, otherwise it predicts None with confidence 0.0.
Augmented Memoization Policy¶
The AugmentedMemoizationPolicy remembers examples from training stories for up to max_history turns, just like the MemoizationPolicy.
Additionally, it has a forgetting mechanism that will forget a certain number of steps in the conversation history and try to find a match in your stories with the reduced history. It predicts the next action with confidence 1.0 if a match is found, otherwise it predicts None with confidence 0.0.
Note
If you have dialogues where some slots that are set during prediction time might not be set in training stories (e.g. in training stories starting with a reminder not all previous slots are set), make sure to add the relevant stories without slots to your training data as well.
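Swapping it in for the plain MemoizationPolicy is a one-line change in the policy configuration; the max_history value below is illustrative:

policies:
  - name: "AugmentedMemoizationPolicy"
    max_history: 5  # illustrative; number of turns remembered from training stories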
Fallback Policy¶
The FallbackPolicy invokes a fallback action if at least one of the following occurs:

- The intent recognition has a confidence below nlu_threshold.
- The highest ranked intent differs in confidence from the second highest ranked intent by less than ambiguity_threshold.
- None of the dialogue policies predict an action with confidence higher than core_threshold.
Configuration:
The thresholds and fallback action can be adjusted in the policy configuration file as parameters of the FallbackPolicy:

policies:
  - name: "FallbackPolicy"
    nlu_threshold: 0.3
    ambiguity_threshold: 0.1
    core_threshold: 0.3
    fallback_action_name: 'action_default_fallback'

nlu_threshold: Minimum confidence needed to accept an NLU prediction.
ambiguity_threshold: Minimum amount by which the confidence of the top intent must exceed that of the second highest ranked intent.
core_threshold: Minimum confidence needed to accept an action prediction from Rasa Core.
fallback_action_name: Name of the fallback action to be called if the confidence of intent or action is below the respective threshold.

You can also configure the FallbackPolicy in your python code:

from rasa.core.policies.fallback import FallbackPolicy
from rasa.core.policies.keras_policy import KerasPolicy
from rasa.core.agent import Agent

fallback = FallbackPolicy(fallback_action_name="action_default_fallback",
                          core_threshold=0.3,
                          nlu_threshold=0.3,
                          ambiguity_threshold=0.1)

agent = Agent("domain.yml", policies=[KerasPolicy(), fallback])

Note
You can include either the FallbackPolicy or the TwoStageFallbackPolicy in your configuration, but not both.
Two-Stage Fallback Policy¶
The TwoStageFallbackPolicy handles low NLU confidence in multiple stages by trying to disambiguate the user input.
If an NLU prediction has a low confidence score or is not significantly higher than the second highest ranked prediction, the user is asked to affirm the classification of the intent.
- If they affirm, the story continues as if the intent was classified with high confidence from the beginning.
- If they deny, the user is asked to rephrase their message.
Rephrasing
- If the classification of the rephrased intent was confident, the story continues as if the user had this intent from the beginning.
- If the rephrased intent was not classified with high confidence, the user is asked to affirm the classified intent.
Second affirmation
- If the user affirms the intent, the story continues as if the user had this intent from the beginning.
- If the user denies, the original intent is classified as the specified deny_suggestion_intent_name, and an ultimate fallback action is triggered (e.g. a handoff to a human).
Configuration:
To use the TwoStageFallbackPolicy, include the following in your policy configuration:

policies:
  - name: TwoStageFallbackPolicy
    nlu_threshold: 0.3
    ambiguity_threshold: 0.1
    core_threshold: 0.3
    fallback_core_action_name: "action_default_fallback"
    fallback_nlu_action_name: "action_default_fallback"
    deny_suggestion_intent_name: "out_of_scope"
nlu_threshold: Minimum confidence needed to accept an NLU prediction.
ambiguity_threshold: Minimum amount by which the confidence of the top intent must exceed that of the second highest ranked intent.
core_threshold: Minimum confidence needed to accept an action prediction from Rasa Core.
fallback_core_action_name: Name of the fallback action to be called if the confidence of a Rasa Core action prediction is below the core_threshold. This action proposes the recognized intents.
fallback_nlu_action_name: Name of the fallback action to be called if the confidence of Rasa NLU intent classification is below the nlu_threshold. This action is called when the user denies for the second time.
deny_suggestion_intent_name: The name of the intent which is used to detect that the user denies the suggested intents.

Note
You can include either the FallbackPolicy or the TwoStageFallbackPolicy in your configuration, but not both.
Form Policy¶
The FormPolicy is an extension of the MemoizationPolicy which handles the filling of forms. Once a FormAction is called, the FormPolicy will continually predict the FormAction until all required slots in the form are filled. For more information, see Forms.
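To enable it, add the FormPolicy to your policy configuration, for example:

policies:
  - name: FormPolicy
  # ... the other policies your assistant uses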