Policies

Data Augmentation

When you train a model, by default Rasa Core will create longer stories by randomly gluing together the ones in your stories files. This is because if you have stories like:

# thanks
* thankyou
   - utter_youarewelcome

# bye
* goodbye
   - utter_goodbye

You actually want to teach your policy to ignore the dialogue history when it isn’t relevant and just respond with the same action no matter what happened before.

You can alter this behaviour with the --augmentation flag, which allows you to set the augmentation_factor. The augmentation_factor determines how many augmented stories are subsampled during training. The augmented stories are subsampled because their number would otherwise quickly become very large. The number of sampled stories is augmentation_factor x 10. By default the augmentation_factor is set to 20, resulting in a maximum of 200 augmented stories.

--augmentation 0 disables all augmentation behavior. The memoization-based policies are not affected by augmentation (independent of the augmentation_factor) and will automatically ignore all augmented stories.
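
If you train from Python instead of the command line, the same setting is available as the augmentation_factor argument when loading training data. The following is a minimal sketch, assuming the Agent.load_data signature of Rasa 1.x and placeholder file paths:

import asyncio

from rasa.core.agent import Agent
from rasa.core.policies.keras_policy import KerasPolicy
from rasa.core.policies.memoization import MemoizationPolicy


async def train_without_augmentation():
    agent = Agent("domain.yml", policies=[MemoizationPolicy(), KerasPolicy()])
    # augmentation_factor=0 disables augmentation entirely; the default of 20
    # samples at most 20 x 10 = 200 augmented stories
    training_data = await agent.load_data("data/stories.md", augmentation_factor=0)
    agent.train(training_data)
    agent.persist("models/core")


asyncio.get_event_loop().run_until_complete(train_without_augmentation())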

Configuring Policies

The rasa.core.policies.Policy class decides which action to take at every step in the conversation.

There are different policies to choose from, and you can include multiple policies in a single rasa.core.agent.Agent. At every turn, the policy which predicts the next action with the highest confidence will be used. If two policies predict with equal confidence, the policy with the higher priority will be used.

Note

By default, the agent predicts a maximum of 10 next actions after every user message. To update this value, set the environment variable MAX_NUMBER_OF_PREDICTIONS to the desired maximum number of predictions.
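
For example, you could export the variable in your shell before starting the server, or, if you launch Rasa from a Python script, set it before the agent starts handling messages (the value 15 below is purely illustrative):

import os

# must be set before the agent begins processing messages
os.environ["MAX_NUMBER_OF_PREDICTIONS"] = "15"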

Your project’s config.yml file takes a policies key which you can use to customize the policies your assistant uses. In the example below, the last two lines show how to use a custom policy class and pass arguments to it.

policies:
  - name: "KerasPolicy"
    featurizer:
    - name: MaxHistoryTrackerFeaturizer
      max_history: 5
      state_featurizer:
        - name: BinarySingleStateFeaturizer
  - name: "MemoizationPolicy"
    max_history: 5
  - name: "FallbackPolicy"
    nlu_threshold: 0.4
    core_threshold: 0.3
    fallback_action_name: "my_fallback_action"
  - name: "path.to.your.policy.class"
    arg1: "..."
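
For the last entry, path.to.your.policy.class has to point to a subclass of rasa.core.policies.Policy that Rasa can import. Below is a hedged skeleton of what such a class might look like; the class name, module layout and the always-listen behaviour are purely illustrative, and persistence is omitted:

from typing import Any, List

from rasa.core.domain import Domain
from rasa.core.policies.policy import Policy
from rasa.core.trackers import DialogueStateTracker


class MyPolicy(Policy):
    """Toy policy that always proposes to wait for the next user message."""

    def __init__(self, arg1: str = "...", **kwargs: Any) -> None:
        super().__init__(**kwargs)
        # arbitrary arguments from config.yml arrive as keyword arguments
        self.arg1 = arg1

    def train(
        self,
        training_trackers: List[DialogueStateTracker],
        domain: Domain,
        **kwargs: Any
    ) -> None:
        # nothing to learn for this toy policy
        pass

    def predict_action_probabilities(
        self, tracker: DialogueStateTracker, domain: Domain
    ) -> List[float]:
        # put all probability mass on `action_listen`
        result = [0.0] * domain.num_actions
        result[domain.index_for_action("action_listen")] = 1.0
        return result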

Max History

One important hyperparameter for Rasa Core policies is the max_history. This controls how much dialogue history the model looks at to decide which action to take next.

You can set the max_history by passing it to your policy’s Featurizer in the policy configuration yaml file.

Note

Only the MaxHistoryTrackerFeaturizer uses a max history, whereas the FullDialogueTrackerFeaturizer always looks at the full conversation history. See Featurization for details.

As an example, let’s say you have an out_of_scope intent which describes off-topic user messages. If your bot sees this intent multiple times in a row, you might want to tell the user what you can help them with. So your story might look like this:

* out_of_scope
   - utter_default
* out_of_scope
   - utter_default
* out_of_scope
   - utter_help_message

For Rasa Core to learn this pattern, the max_history has to be at least 3.
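
If you build your policies in Python rather than in config.yml, the same max_history is passed to the featurizer directly. A rough sketch, assuming Rasa 1.x's featurizer classes:

from rasa.core.featurizers import (
    BinarySingleStateFeaturizer,
    MaxHistoryTrackerFeaturizer,
)
from rasa.core.policies.keras_policy import KerasPolicy

# max_history=3 is enough to learn the three-in-a-row out_of_scope pattern above
featurizer = MaxHistoryTrackerFeaturizer(BinarySingleStateFeaturizer(), max_history=3)
policy = KerasPolicy(featurizer=featurizer)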

If you increase your max_history, your model will become bigger and training will take longer. If you have some information that should affect the dialogue very far into the future, you should store it as a slot. Slot information is always available for every featurizer.

Keras Policy

The KerasPolicy uses a neural network implemented in Keras to select the next action. The default architecture is based on an LSTM, but you can override the KerasPolicy.model_architecture method to implement your own architecture.

def model_architecture(
    self, input_shape: Tuple[int, int], output_shape: Tuple[int, Optional[int]]
) -> tf.keras.models.Sequential:
    """Build a keras model and return a compiled model."""

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import (
        Masking,
        LSTM,
        Dense,
        TimeDistributed,
        Activation,
    )

    # Build Model
    model = Sequential()

    # the shape of the y vector of the labels,
    # determines which output from rnn will be used
    # to calculate the loss
    if len(output_shape) == 1:
        # y is (num examples, num features) so
        # only the last output from the rnn is used to
        # calculate the loss
        model.add(Masking(mask_value=-1, input_shape=input_shape))
        model.add(LSTM(self.rnn_size, dropout=0.2))
        model.add(Dense(input_dim=self.rnn_size, units=output_shape[-1]))
    elif len(output_shape) == 2:
        # y is (num examples, max_dialogue_len, num features) so
        # all the outputs from the rnn are used to
        # calculate the loss, therefore a sequence is returned and
        # time distributed layer is used

        # the first value in input_shape is max dialogue_len,
        # it is set to None, to allow dynamic_rnn creation
        # during prediction
        model.add(Masking(mask_value=-1, input_shape=(None, input_shape[1])))
        model.add(LSTM(self.rnn_size, return_sequences=True, dropout=0.2))
        model.add(TimeDistributed(Dense(units=output_shape[-1])))
    else:
        raise ValueError(
            "Cannot construct the model because"
            "length of output_shape = {} "
            "should be 1 or 2."
            "".format(len(output_shape))
        )

    model.add(Activation("softmax"))

    model.compile(
        loss="categorical_crossentropy", optimizer="rmsprop", metrics=["accuracy"]
    )

    if obtain_verbosity() > 0:
        model.summary()

    return model

and the training is run here:

def train(
    self,
    training_trackers: List[DialogueStateTracker],
    domain: Domain,
    **kwargs: Any
) -> None:

    # set numpy random seed
    np.random.seed(self.random_seed)

    training_data = self.featurize_for_training(training_trackers, domain, **kwargs)
    # noinspection PyPep8Naming
    shuffled_X, shuffled_y = training_data.shuffled_X_y()

    self.graph = tf.Graph()
    with self.graph.as_default():
        # set random seed in tf
        tf.set_random_seed(self.random_seed)
        self.session = tf.Session(config=self._tf_config)

        with self.session.as_default():
            if self.model is None:
                self.model = self.model_architecture(
                    shuffled_X.shape[1:], shuffled_y.shape[1:]
                )

            logger.info(
                "Fitting model with {} total samples and a "
                "validation split of {}"
                "".format(training_data.num_examples(), self.validation_split)
            )

            # filter out kwargs that cannot be passed to fit
            self._train_params = self._get_valid_params(
                self.model.fit, **self._train_params
            )

            self.model.fit(
                shuffled_X,
                shuffled_y,
                epochs=self.epochs,
                batch_size=self.batch_size,
                shuffle=False,
                verbose=obtain_verbosity(),
                **self._train_params
            )
            # the default parameter for epochs in keras fit is 1
            self.current_epoch = self.defaults.get("epochs", 1)
            logger.info("Done fitting keras policy model")

You can implement the model of your choice by overriding these methods, or initialize KerasPolicy with a pre-defined Keras model.
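
As a hedged illustration, a subclass that swaps the default LSTM for a GRU might look roughly like the sketch below. It only handles the max-history case (where output_shape has length 1), and the class name is made up:

from typing import Optional, Tuple

import tensorflow as tf

from rasa.core.policies.keras_policy import KerasPolicy


class GruPolicy(KerasPolicy):
    """Hypothetical KerasPolicy variant using a GRU instead of an LSTM."""

    def model_architecture(
        self, input_shape: Tuple[int, int], output_shape: Tuple[int, Optional[int]]
    ) -> tf.keras.models.Sequential:
        from tensorflow.keras.layers import Activation, Dense, GRU, Masking
        from tensorflow.keras.models import Sequential

        model = Sequential()
        model.add(Masking(mask_value=-1, input_shape=input_shape))
        model.add(GRU(self.rnn_size, dropout=0.2))
        model.add(Dense(units=output_shape[-1]))
        model.add(Activation("softmax"))
        model.compile(
            loss="categorical_crossentropy",
            optimizer="rmsprop",
            metrics=["accuracy"],
        )
        return model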

In order to get reproducible training results for the same inputs you can set the random_seed attribute of the KerasPolicy to any integer.
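
For example, when constructing the policy in Python (the value 42 is an arbitrary choice):

from rasa.core.policies.keras_policy import KerasPolicy

# any fixed integer seed gives reproducible training for the same inputs
policy = KerasPolicy(random_seed=42)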

Embedding Policy

The EmbeddingPolicy implements the Recurrent Embedding Dialogue Policy (REDP) described in our paper: https://arxiv.org/abs/1811.11707

This policy has a pre-defined architecture, which comprises the following steps:

  • apply dense layers to create embeddings for user intents, entities and system actions, including previous actions and slots;
  • use the embeddings of previous user inputs as a user memory and the embeddings of previous system actions as a system memory;
  • concatenate user input, previous system action and slot embeddings for the current time step into an input vector for the RNN;
  • using the user and previous system action embeddings from the input vector, calculate attention probabilities over the user and system memories (for the system memory, this policy uses the NTM mechanism with attention by location);
  • sum the user embedding and the user attention vector and feed it, together with the slot embeddings, as an input to an LSTM cell;
  • apply a dense layer to the output of the LSTM to get a raw recurrent embedding of the dialogue;
  • sum this raw recurrent embedding of the dialogue with the system attention vector to create a dialogue-level embedding; this step allows the algorithm to repeat a previous system action by copying its embedding vector directly to the output for the current time step;
  • weight previous LSTM states with the system attention probabilities to get the embedding of the previous action that the policy most likely paid attention to;
  • if the similarity between this previous action embedding and the current dialogue embedding is high, overwrite the current LSTM state with the one from the time step when this action happened;
  • for each LSTM time step, calculate the similarity between the dialogue embedding and the embedded system actions. This step is based on the StarSpace idea.

Note

This policy only works with FullDialogueTrackerFeaturizer(state_featurizer).

It is recommended to use state_featurizer=LabelTokenizerSingleStateFeaturizer(...) (see Featurization for details).

Configuration:

Configuration parameters can be passed as parameters to the EmbeddingPolicy within the policy configuration file.

Note

Pass an appropriate number of epochs to the EmbeddingPolicy, otherwise the policy will be trained only for 1 epoch. Since this is an embedding based policy, it requires a large number of epochs, which depends on the complexity of the training data and whether attention is used or not.

The main feature of this policy is an attention mechanism over previous user input and system actions. Attention is turned on by default; in order to turn it off, configure the following parameters:

  • attn_before_rnn if true, the algorithm will use an attention mechanism over previous user input (default: true);
  • attn_after_rnn if true, the algorithm will use an attention mechanism over previous system actions and will be able to copy a previously executed action together with the LSTM's hidden state from its history (default: true);
  • sparse_attention if true, sparsemax will be used instead of softmax for attention probabilities (default: false);
  • attn_shift_range the range of allowed location-based attention shifts for the system memory (attn_after_rnn); see https://arxiv.org/abs/1410.5401 for details.
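
For example, a hedged sketch that switches both attention mechanisms off when constructing the policy in Python; the epoch count is purely illustrative, and in config.yml you would set the same keys under EmbeddingPolicy:

from rasa.core.policies.embedding_policy import EmbeddingPolicy

# illustrative values: attention disabled, far more epochs than the default of 1
policy = EmbeddingPolicy(
    attn_before_rnn=False,
    attn_after_rnn=False,
    epochs=2000,
)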

Note

Attention requires larger values of epochs and takes longer to train. But it can learn more complicated and nonlinear behaviour.

The algorithm also has hyper-parameters to control:

  • neural network’s architecture:

    • hidden_layers_sizes_a sets a list of hidden layers sizes before embedding layer for user inputs, the number of hidden layers is equal to the length of the list;
    • hidden_layers_sizes_b sets a list of hidden layers sizes before embedding layer for system actions, the number of hidden layers is equal to the length of the list;
    • rnn_size sets the number of units in the LSTM cell;
  • training:

    • layer_norm if true layer normalization for lstm cell is turned on, default true;
    • batch_size sets the number of training examples in one forward/backward pass, the higher the batch size, the more memory space you’ll need;
    • epochs sets the number of times the algorithm will see training data, where one epoch equals one forward pass and one backward pass of all the training examples;
    • random_seed if set to any int will get reproducible training results for the same inputs;
  • embedding:

    • embed_dim sets the dimension of embedding space;
    • mu_pos controls how similar the algorithm should try to make embedding vectors for correct action labels;
    • mu_neg controls the maximum negative similarity for incorrect action labels;
    • similarity_type sets the type of the similarity; it should be either cosine or inner;
    • num_neg sets the number of incorrect action labels; the algorithm will minimize their similarity to the user input during training;
    • use_max_sim_neg if true, the algorithm only minimizes the maximum similarity over incorrect action labels;
  • regularization:

    • C2 sets the scale of L2 regularization;
    • C_emb sets the scale of how important it is to minimize the maximum similarity between embeddings of different action labels;
    • droprate_a sets the dropout rate between hidden layers before embedding layer for user inputs;
    • droprate_b sets the dropout rate between hidden layers before embedding layer for system actions;
    • droprate_rnn sets the recurrent dropout rate on the LSTM hidden state https://arxiv.org/abs/1603.05118;
  • train accuracy calculation:

    • evaluate_every_num_epochs sets how often to calculate train accuracy, small values may hurt performance;
    • evaluate_on_num_examples how many examples to use for calculation of train accuracy, large values may hurt performance.

Note

Droprate should be between 0 and 1, e.g. droprate=0.1 would drop out 10% of input units.

Note

For cosine similarity mu_pos and mu_neg should be between -1 and 1.

Note

There is an option to use a linearly increasing batch size. The idea comes from https://arxiv.org/abs/1711.00489. To use it, pass a list to batch_size, e.g. "batch_size": [8, 32] (default behaviour). If a constant batch_size is required, pass an int, e.g. "batch_size": 8.

These parameters can be specified in the policy configuration file. The default values are defined in EmbeddingPolicy.defaults:

defaults = {
    # nn architecture
    # a list of hidden layers sizes before user embed layer
    # number of hidden layers is equal to the length of this list
    "hidden_layers_sizes_a": [],
    # a list of hidden layers sizes before bot embed layer
    # number of hidden layers is equal to the length of this list
    "hidden_layers_sizes_b": [],
    # number of units in rnn cell
    "rnn_size": 64,
    # training parameters
    # flag if to turn on layer normalization for lstm cell
    "layer_norm": True,
    # initial and final batch sizes - batch size will be
    # linearly increased for each epoch
    "batch_size": [8, 32],
    # number of epochs
    "epochs": 1,
    # set random seed to any int to get reproducible results
    "random_seed": None,
    # embedding parameters
    # dimension size of embedding vectors
    "embed_dim": 20,
    # how similar the algorithm should try
    # to make embedding vectors for correct actions
    "mu_pos": 0.8,  # should be 0.0 < ... < 1.0 for 'cosine'
    # maximum negative similarity for incorrect actions
    "mu_neg": -0.2,  # should be -1.0 < ... < 1.0 for 'cosine'
    # the type of the similarity
    "similarity_type": "cosine",  # string 'cosine' or 'inner'
    # the number of incorrect actions, the algorithm will minimize
    # their similarity to the user input during training
    "num_neg": 20,
    # flag if minimize only maximum similarity over incorrect actions
    "use_max_sim_neg": True,  # flag which loss function to use
    # regularization
    # the scale of L2 regularization
    "C2": 0.001,
    # the scale of how important is to minimize the maximum similarity
    # between embeddings of different actions
    "C_emb": 0.8,
    # scale loss with inverse frequency of bot actions
    "scale_loss_by_action_counts": True,
    # dropout rate for user nn
    "droprate_a": 0.0,
    # dropout rate for bot nn
    "droprate_b": 0.0,
    # dropout rate for rnn
    "droprate_rnn": 0.1,
    # attention parameters
    # flag to use attention over user input
    # as an input to rnn
    "attn_before_rnn": True,
    # flag to use attention over prev bot actions
    # and copy it to output bypassing rnn
    "attn_after_rnn": True,
    # flag to use `sparsemax` instead of `softmax` for attention
    "sparse_attention": False,  # flag to use sparsemax for probs
    # the range of allowed location-based attention shifts
    "attn_shift_range": None,  # if None, set to mean dialogue length / 2
    # visualization of accuracy
    # how often calculate train accuracy
    "evaluate_every_num_epochs": 20,  # small values may hurt performance
    # how many examples to use for calculation of train accuracy
    "evaluate_on_num_examples": 100,  # large values may hurt performance
}

Note

Parameter mu_neg is set to a negative value to mimic the original StarSpace algorithm in the case mu_neg = mu_pos and use_max_sim_neg = False. See the StarSpace paper for details.

Memoization Policy

The MemoizationPolicy just memorizes the conversations in your training data. It predicts the next action with confidence 1.0 if this exact conversation exists in the training data, otherwise it predicts None with confidence 0.0.

Mapping Policy

The MappingPolicy can be used to directly map intents to actions. The mappings are assigned by giving an intent the property triggers, e.g.:

intents:
 - ask_is_bot:
     triggers: action_is_bot

An intent can be mapped to at most one action. The bot will run the mapped action once it receives a message of the triggering intent. Afterwards, it will listen for the next message; with the next user message, normal prediction will resume.

If you do not want your intent-action mapping to affect the dialogue history, the mapped action must return a UserUtteranceReverted() event. This will delete the user’s latest message, along with any events that happened after it, from the dialogue history. This means you should not include the intent-action interaction in your stories.

For example, if a user asks “Are you a bot?” off-topic in the middle of the flow, you probably want to answer without that interaction affecting the next action prediction. A triggered custom action can do anything, but here’s a simple example that dispatches a bot utterance and then reverts the interaction:

from rasa_sdk import Action
from rasa_sdk.events import UserUtteranceReverted


class ActionIsBot(Action):
    """Revertible mapped action for utter_is_bot"""

    def name(self):
        return "action_is_bot"

    def run(self, dispatcher, tracker, domain):
        dispatcher.utter_template("utter_is_bot", tracker)
        return [UserUtteranceReverted()]

Note

If you use the MappingPolicy to predict bot utterances directly (e.g. triggers: utter_{}), these interactions must go in your stories, as in this case there is no UserUtteranceReverted() and the intent and the mapped utterance will appear in the dialogue history.

Fallback Policy

The FallbackPolicy invokes a fallback action if at least one of the following occurs:

  1. The intent recognition has a confidence below nlu_threshold.
  2. The highest ranked intent differs in confidence from the second highest ranked intent by less than ambiguity_threshold.
  3. None of the dialogue policies predict an action with confidence higher than core_threshold.

Configuration:

The thresholds and fallback action can be adjusted in the policy configuration file as parameters of the FallbackPolicy:

policies:
  - name: "FallbackPolicy"
    nlu_threshold: 0.3
    ambiguity_threshold: 0.1
    core_threshold: 0.3
    fallback_action_name: 'action_default_fallback'

nlu_threshold: Min confidence needed to accept an NLU prediction
ambiguity_threshold: Min amount by which the confidence of the top intent must exceed that of the second highest ranked intent
core_threshold: Min confidence needed to accept an action prediction from Rasa Core
fallback_action_name: Name of the fallback action to be called if the confidence of intent or action is below the respective threshold

You can also configure the FallbackPolicy in your python code:

from rasa.core.policies.fallback import FallbackPolicy
from rasa.core.policies.keras_policy import KerasPolicy
from rasa.core.agent import Agent

fallback = FallbackPolicy(fallback_action_name="action_default_fallback",
                          core_threshold=0.3,
                          nlu_threshold=0.3,
                          ambiguity_threshold=0.1)

agent = Agent("domain.yml", policies=[KerasPolicy(), fallback])

Note

You can include either the FallbackPolicy or the TwoStageFallbackPolicy in your configuration, but not both.

Two-Stage Fallback Policy

The TwoStageFallbackPolicy handles low NLU confidence in multiple stages by trying to disambiguate the user input.

  • If an NLU prediction has a low confidence score or is not significantly higher than the second highest ranked prediction, the user is asked to affirm the classification of the intent.

    • If they affirm, the story continues as if the intent was classified with high confidence from the beginning.
    • If they deny, the user is asked to rephrase their message.
  • Rephrasing

    • If the classification of the rephrased intent was confident, the story continues as if the user had this intent from the beginning.
    • If the rephrased intent was not classified with high confidence, the user is asked to affirm the classified intent.
  • Second affirmation

    • If the user affirms the intent, the story continues as if the user had this intent from the beginning.
    • If the user denies, the original intent is classified as the specified deny_suggestion_intent_name, and an ultimate fallback action is triggered (e.g. a handoff to a human).

Configuration:

To use the TwoStageFallbackPolicy, include the following in your policy configuration.

policies:
  - name: TwoStageFallbackPolicy
    nlu_threshold: 0.3
    ambiguity_threshold: 0.1
    core_threshold: 0.3
    fallback_core_action_name: "action_default_fallback"
    fallback_nlu_action_name: "action_default_fallback"
    deny_suggestion_intent_name: "out_of_scope"

nlu_threshold: Min confidence needed to accept an NLU prediction
ambiguity_threshold: Min amount by which the confidence of the top intent must exceed that of the second highest ranked intent
core_threshold: Min confidence needed to accept an action prediction from Rasa Core
fallback_core_action_name: Name of the fallback action to be called if the confidence of the Rasa Core action prediction is below the core_threshold
fallback_nlu_action_name: Name of the fallback action to be called if the confidence of the Rasa NLU intent classification is below the nlu_threshold
deny_suggestion_intent_name: Name of the intent used to detect that the user denies the suggested intents
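
Like the FallbackPolicy, the TwoStageFallbackPolicy can also be set up in Python. The sketch below assumes its constructor accepts the same keyword arguments as the configuration keys above:

from rasa.core.agent import Agent
from rasa.core.policies.keras_policy import KerasPolicy
from rasa.core.policies.two_stage_fallback import TwoStageFallbackPolicy

two_stage_fallback = TwoStageFallbackPolicy(
    nlu_threshold=0.3,
    ambiguity_threshold=0.1,
    core_threshold=0.3,
    fallback_core_action_name="action_default_fallback",
    fallback_nlu_action_name="action_default_fallback",
    deny_suggestion_intent_name="out_of_scope",
)

agent = Agent("domain.yml", policies=[KerasPolicy(), two_stage_fallback])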

Note

You can include either the FallbackPolicy or the TwoStageFallbackPolicy in your configuration, but not both.

Form Policy

The FormPolicy is an extension of the MemoizationPolicy which handles the filling of forms. Once a FormAction is called, the FormPolicy will continually predict the FormAction until all required slots in the form are filled. For more information, see Forms.
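
To use it, the FormPolicy has to be included alongside your other policies. A hedged sketch of a Python setup (in config.yml you would simply add - name: FormPolicy to the policies list):

from rasa.core.agent import Agent
from rasa.core.policies.form_policy import FormPolicy
from rasa.core.policies.keras_policy import KerasPolicy
from rasa.core.policies.memoization import MemoizationPolicy

agent = Agent(
    "domain.yml", policies=[FormPolicy(), MemoizationPolicy(), KerasPolicy()]
)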