What is a tracker?
A tracker is a representation of a conversation between a user and your assistant. Trackers are used (among other things) to represent the stories and rules that are used to train policies. Policies pick the next action taken by the assistant. A simplified tracker for an example conversation looks something like this:
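To make this concrete, here is a minimal sketch in Python of a tracker as an ordered list of events. The class and event format here are hypothetical simplifications, not Rasa's actual DialogueStateTracker, which carries much more state (slots, the latest message, and so on):

```python
from dataclasses import dataclass, field


@dataclass
class Tracker:
    # A hypothetical, simplified tracker: just an ordered list of events.
    events: list = field(default_factory=list)

    def copy(self) -> "Tracker":
        # Trackers are cloned before being extended with new events,
        # so extending one tracker never mutates another.
        return Tracker(events=list(self.events))


# The moodbot happy path as a sequence of (speaker, name) events
tracker = Tracker()
for event in [("user", "greet"), ("bot", "utter_greet"),
              ("user", "mood_great"), ("bot", "utter_happy")]:
    tracker.events.append(event)

print(len(tracker.events))  # 4
```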
Data Generation, or how we turn training data into trackers
First, let’s look at the simplest case using moodbot, the assistant created when you run
rasa init on the command line. It asks you how you’re doing, and if you’re sad, will try to cheer you up with a cute picture.
Suppose we have written this story:
It can be represented by the following story step:
Each of the stories in the moodbot training data corresponds to one story step. The story steps look like this:
Happy path | Sad path 1 | Sad path 2
At this point, you might be wondering why we need story steps. Let’s illustrate that with an example.
Suppose you look more closely at the two sad paths, and realize that they actually have quite a few events in common:
You might choose to re-write these stories with the help of checkpoints. Checkpoints connect stories together, so that a story ending with a checkpoint can be extended with another story that begins with that same checkpoint. Checkpoints can help you organise your training data in such a way that it is easier for you to allow different stories to fit together to form an entire conversation. They can also allow you to write fewer stories. However, they should be used with care, because the process of checkpoint resolution slows down training, and they can make it harder to understand example stories. The two stories above can be expressed as:
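As a sketch (using moodbot's standard intents and the 'tried_to_cheer_up' checkpoint name), the checkpointed version could look like this:

```yaml
# stories.yml -- a sketch of the checkpointed rewrite
version: "2.0"
stories:
- story: sad path start
  steps:
  - intent: greet
  - action: utter_greet
  - intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - checkpoint: tried_to_cheer_up
- story: sad path 1 continued
  steps:
  - checkpoint: tried_to_cheer_up
  - intent: affirm
  - action: utter_happy
- story: sad path 2 continued
  steps:
  - checkpoint: tried_to_cheer_up
  - intent: deny
  - action: utter_goodbye
```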
These sets of stories are equivalent, here meaning that the set of two stories in the first example and the set of three stories in the second will be represented by the same two trackers.
First, the stories are transformed into story steps. Let’s take a closer look at one of the story steps:
Each story step has a list of start checkpoints (the checkpoint at the beginning of the story step), and end checkpoints (the checkpoint at the end of the story step). For this story step, the end checkpoint is the one we defined (‘tried_to_cheer_up’). There’s also a special STORY_START checkpoint. It’s a default checkpoint that’s added to a story step’s start checkpoints whenever no start checkpoint is defined. There’s no equivalent default end checkpoint: if we haven’t defined an end checkpoint for a story step, the list of end checkpoints is simply empty.
Now that we’ve introduced checkpoints, we can give a new definition of a story step. A story step represents the events between a start checkpoint, and an end checkpoint, or the end of a story.
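Under this definition, a story step might be sketched in code like this (hypothetical simplified classes, not Rasa's implementation):

```python
from dataclasses import dataclass, field

STORY_START = "STORY_START"


@dataclass
class StoryStep:
    # Checkpoints this step can be stitched onto; the special STORY_START
    # checkpoint is used whenever no start checkpoint is defined.
    start_checkpoints: list = field(default_factory=lambda: [STORY_START])
    # Checkpoints other steps can stitch onto; empty means the story ends here.
    end_checkpoints: list = field(default_factory=list)
    events: list = field(default_factory=list)


# The shared sad-path prefix: no explicit start checkpoint, ends in a checkpoint
prefix = StoryStep(
    end_checkpoints=["tried_to_cheer_up"],
    events=["greet", "utter_greet", "mood_unhappy",
            "utter_cheer_up", "utter_did_that_help"],
)
print(prefix.start_checkpoints)  # ['STORY_START']
print(prefix.end_checkpoints)    # ['tried_to_cheer_up']
```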
The other two sad path story steps look like this:
And the original happy path story step looks like this:
Story steps are stitched together in a process known as data generation. Don’t be misled by the name: no new data is generated (there is no natural language generation happening, nor creation of conversation flows not in the original training data). Instead, “generation” refers to the trackers being generated from the stories.
A data generation phase consists of one pass over all the story steps. They are processed one by one.
In each generation phase, we maintain a dictionary of active trackers. These are the trackers that are considered for stitching in that phase. The dictionary maps from a checkpoint to trackers that end in this checkpoint. At the beginning of the first phase, this dictionary contains a special initial tracker. It has no events, and is associated with the STORY_START checkpoint.
The active trackers dictionary looks like this, with the STORY_START checkpoint mapping to the initial tracker:
Now we can begin to process the story steps. Suppose we start with the following story step:
We first check whether the start checkpoints of the story step are present in the active trackers. In our case, the start checkpoint is STORY_START, which is present in active trackers: it’s the initial tracker. The tracker it maps to is added to a list of incoming trackers for this story step.
Next, we process any incoming trackers with the story step. The incoming trackers are extended with the events from the story step to produce new trackers. The initial tracker, which has no events, combined with the story step’s events produces this tracker:
After we’ve processed the story step with all its incoming trackers, we update the active trackers dictionary accordingly. If the story step we just processed has an end checkpoint, we add the trackers that were just produced to the active trackers. Otherwise, the produced trackers are considered story end trackers, and we add them to a list for later. Remember that the active trackers dictionary maps from checkpoints to trackers that end in this checkpoint.
This story step had no end checkpoint, so the tracker is added to the list of story end trackers.
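Putting the last few steps together, processing a single story step can be sketched as follows. The names and structures here are hypothetical simplifications (trackers are plain lists of events, a story step is a dict), not Rasa's actual code:

```python
from collections import defaultdict

STORY_START = "STORY_START"


def process_story_step(step, active_trackers, story_end_trackers):
    """Stitch a story step onto any trackers ending in its start checkpoints.

    `step` is a dict with 'start' and 'end' (lists of checkpoint names,
    'end' may be None) and 'events'; trackers are plain lists of events.
    """
    incoming = []
    for checkpoint in step["start"]:
        incoming.extend(active_trackers.get(checkpoint, []))
    # extend each incoming tracker with the story step's events
    produced = [tracker + step["events"] for tracker in incoming]
    if step["end"]:
        # steps ending in a checkpoint feed the active trackers
        for checkpoint in step["end"]:
            active_trackers[checkpoint].extend(produced)
    else:
        # no end checkpoint: these trackers are complete stories
        story_end_trackers.extend(produced)


active = defaultdict(list, {STORY_START: [[]]})  # the empty initial tracker
ends = []
happy_path = {"start": [STORY_START], "end": None,
              "events": ["greet", "utter_greet", "mood_great", "utter_happy"]}
process_story_step(happy_path, active, ends)
print(ends)  # [['greet', 'utter_greet', 'mood_great', 'utter_happy']]
```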
Then, we can process the next story step:
We first check whether the start checkpoints of the story step (STORY_START, for this step) are in active trackers. Again, there is one tracker that corresponds to STORY_START: the empty initial tracker. We process this tracker together with the story step to produce:
The story step we processed had an end checkpoint, “tried_to_cheer_up”, so we update active trackers:
Then, we can process the next story step:
Again, we first check whether the start checkpoint in the story step is present in the active tracker keys.
In this case, the start checkpoint is “tried_to_cheer_up”, which is present in active trackers. The tracker is an incoming tracker for this story step.
We process the incoming trackers with the story step, extending the incoming tracker’s events with those of the story step, to produce this tracker:
Since the story step does not have any end checkpoints, this tracker is not added to active trackers. It is considered a story end tracker.
The last story step is processed in the same way as the previous one, which produces this tracker:
When we are done processing the story steps, the active trackers are cleaned up. We remove any trackers with used checkpoints, here meaning checkpoints that were used to stitch trackers together. In this case, both STORY_START and ‘tried_to_cheer_up’ have been used. We remove them both from the active trackers dictionary, which is now empty.
OR statements are handled in the same way as checkpoints. An OR statement enables you to write one story but allow for multiple different intents or slot-filling events at a step in the story. For example, you might deploy moodbot and find that even though you expected people to respond to utter_did_that_help with affirm, sometimes they say something like “I am feeling great!” (mood_great). This should be treated as though they had said “yes” (affirm). You can use an OR statement to express that affirm and mood_great should be treated the same way when they are observed at this point in the conversation.
```yaml
# stories.yml
version: "2.0"
stories:
- story: sad path 1
  steps:
  - intent: greet
  - action: utter_greet
  - intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - or:
    - intent: affirm
    - intent: mood_great
  - action: utter_happy
```
Internally, this is handled by generating a checkpoint for the OR statement. The story above is equivalent to writing three stories that can be stitched together with a checkpoint:
```yaml
# stories.yml
version: "2.0"
stories:
- story: sad path 1 - affirm
  steps:
  - intent: greet
  - action: utter_greet
  - intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - intent: affirm
  - checkpoint: helped
- story: sad path 1 - mood_great
  steps:
  - intent: greet
  - action: utter_greet
  - intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - intent: mood_great
  - checkpoint: helped
- story: sad path 1 - helped
  steps:
  - checkpoint: helped
  - action: utter_happy
```
This is also equivalent to writing out the two stories, like so:
```yaml
# stories.yml
version: "2.0"
stories:
- story: sad path 1 - affirm
  steps:
  - intent: greet
  - action: utter_greet
  - intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - intent: affirm
  - action: utter_happy
- story: sad path 1 - mood_great
  steps:
  - intent: greet
  - action: utter_greet
  - intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - intent: mood_great
  - action: utter_happy
```
Either way, whether you use checkpoints, OR statements, or write out the stories for each scenario, two trackers will be produced:
Scenario 1 | Scenario 2
Checkpoints and OR statements are expanded in the same way during testing, but be careful when you use them in test stories. You want to test with representative conversations, and your users will have no idea what checkpoints even are. Test conversations should be as close as possible to real user behavior, because that’s the end result you care about.
The process of data generation is also how we convert rules into trackers (a rule is very similar to a story without an explicit checkpoint). However, stories and rules are never handled at the same time. We process all stories, and then all rules. This means that you won’t see stories being stitched with rules or vice versa!
Digging into the details
Data generation is a series of phases, where a phase is one pass over all story blocks.
We keep track of two sets of checkpoints: used and unused.
used_checkpoints - checkpoints that have been processed, where processed means that they have been used to stitch a story block with another.
unused_checkpoints - checkpoints that have been seen, but not yet processed, where seen means we’ve stepped through a story block that starts with this checkpoint.
Keeping track of checkpoints allows us to stop data generation in the event that we can’t use all active trackers for stitching. Normally, data generation ends when the active trackers dictionary is empty. But trackers are only removed from active trackers if they’ve been used for stitching. Sometimes, a tracker can’t be used for stitching if you have ended a story with a particular checkpoint, but didn’t create a story snippet that begins with that checkpoint, or vice versa. Note that a story with an unused checkpoint is not the same as a story without any checkpoints. Every story without any checkpoints implicitly starts with the STORY_START checkpoint, and these can always be stitched to the initial, empty tracker. In the event of unused checkpoints, we will eventually reach a stage where the unused checkpoints don’t change from phase to phase. If this happens, we break off data generation, and issue a warning that some checkpoints were left unused.
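The stopping condition can be sketched like this (a hypothetical simplification of the phase loop, not Rasa's code):

```python
def generation_finished(active_trackers, unused_checkpoints, previous_unused):
    """Decide whether data generation should stop after a phase.

    Stops when nothing is left to stitch, or when the set of unused
    checkpoints has not changed since the previous phase (meaning the
    remaining trackers can never be stitched).
    """
    if not active_trackers:
        # nothing left to stitch: the normal way generation ends
        return True
    if unused_checkpoints == previous_unused:
        # no progress since the last phase: break off and warn that
        # some checkpoints were left unused
        return True
    return False


# A story ended with checkpoint 'helped', but no story starts with it:
# the unused set stops changing between phases, so generation halts.
print(generation_finished({"helped": ["tracker"]}, {"helped"}, {"helped"}))  # True
```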
Any trackers that don’t feed into active trackers can be divided into two categories: story end, and finished trackers.
story_end_trackers - trackers that represent a story from start to end, i.e. they do not end in a checkpoint and their events cannot be extended with the events of another story block.
For example, this story would produce a story end tracker:
```yaml
- story: happy path
  steps:
  - intent: greet
  - action: utter_greet
  - intent: mood_great
  - action: utter_happy
  - intent: goodbye
  - action: utter_goodbye
```
Because it doesn’t have an end checkpoint, the story ends after the assistant says goodbye.
finished_trackers - trackers that do not represent a story to the end, either because they couldn’t be stitched, or because they are snippets of events leading up to an ActionReverted, UserUtteranceReverted, or Restarted event.
The differentiation between these two types of trackers isn’t important in data generation, but becomes important during data augmentation.
The output of data generation is the union of story_end_trackers and finished_trackers. The following pseudocode summarizes most of what we’ve gone over so far.
There is one more implementation detail which may be interesting for those debugging, and that is how trackers are deduplicated. If we’ve set remove_duplicates to true for the TrainingDataGenerator (on by default), then the incoming trackers will be deduplicated before we process them with the current story step. There are two ways in which they can be deduplicated.

The first is if we’ve set unique_last_num_states to an integer for the TrainingDataGenerator (None by default). In that case, the trackers will be deduplicated based on the last unique_last_num_states states of each tracker. We keep the duplicates based on the last states in the end_trackers, which are added to finished_trackers.

The second is if the value of unique_last_num_states is None. In this case, the incoming trackers are deduplicated based on their full events. Duplicate trackers are not fed into end_trackers in this case.

We always deduplicate story_end_trackers after processing a story step based on the entire history, regardless of the values of remove_duplicates and unique_last_num_states defined for the TrainingDataGenerator.
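A minimal sketch of the two deduplication modes (a hypothetical helper; trackers are plain lists of events, and set membership on tuples stands in for hashing):

```python
def deduplicate(incoming_trackers, unique_last_num_states=None):
    """Deduplicate trackers (lists of events) for one story step.

    If unique_last_num_states is set, trackers are compared on their last
    n events only; otherwise on their full event history.
    """
    seen = set()
    unique = []
    for tracker in incoming_trackers:
        key = tuple(tracker)
        if unique_last_num_states is not None:
            # compare only the tail of the history
            key = tuple(tracker[-unique_last_num_states:])
        if key not in seen:
            seen.add(key)
            unique.append(tracker)
    return unique


trackers = [["greet", "utter_greet", "mood_great"],
            ["mood_unhappy", "utter_cheer_up", "mood_great"]]
# Full histories differ, but the last state is the same:
print(len(deduplicate(trackers)))                            # 2
print(len(deduplicate(trackers, unique_last_num_states=1)))  # 1
```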
```
# INITIALISATION
init_tracker ← an empty tracker with no events
active_trackers[START_CHECKPOINT] ← init_tracker
story_end_trackers ← []
finished_trackers ← []

# DATA GENERATION
while not everything_reachable_is_reached
    # each pass is a single data generation phase
    for each story_step
        # FIND INCOMING TRACKERS
        for start_checkpoint in story_step's start checkpoints
            if start_checkpoint is in active_trackers
                incoming_trackers ← active_trackers[start_checkpoint]
                add start_checkpoint to used_checkpoints
            elif start_checkpoint not in used_checkpoints
                add start_checkpoint to unused_checkpoints
        trackers ← []
        end_trackers ← []

        # DEDUPLICATE INCOMING TRACKERS
        if remove_duplicates
            hashes_for_story_step ← set()
            unique_trackers ← []
            for tracker in incoming_trackers
                # continue with a tracker only if we haven't seen its
                # events before for this story step
                if hash(tracker.events) not in hashes_for_story_step
                    if unique_last_num_states defined
                        # check if we've seen the truncated events
                        # before for this story step
                        truncated ← truncate tracker.events by unique_last_num_states
                        if hash(truncated) not in hashes_for_story_step
                            # haven't seen the truncated events before
                            hashes_for_story_step += hash(truncated)
                            unique_trackers += tracker
                        elif hash(tracker.events) not in self.hashes
                            # have seen the truncated events before for this
                            # story step, but not the full tracker
                            end_trackers += tracker
                    else
                        # unique_last_num_states is not defined
                        unique_trackers += tracker
                        hashes_for_story_step += hash(tracker.events)
            # end deduplication
            finished_trackers += end_trackers
            incoming_trackers ← unique_trackers

        # PROCESS INCOMING TRACKERS AND STORY STEP
        for incoming_tracker in incoming_trackers
            new_tracker ← clone of incoming_tracker
            for event in story_step.events
                if event is ActionReverted, UserUtteranceReverted, or Restarted
                    end_trackers += clone of new_tracker
                append event to new_tracker
            trackers += new_tracker

        # CLEANUP PRODUCED TRACKERS
        finished_trackers += end_trackers
        for end_checkpoint in story_step's end checkpoints
            active_trackers[end_checkpoint] ← trackers
            if end_checkpoint not in used_checkpoints
                unused_checkpoints += end_checkpoint
        if story_step has no end checkpoints
            unique_ends ← deduplicated trackers
            story_end_trackers += unique_ends
    # end pass over each story step == end of this phase

    # CLEANUP CHECKPOINTS AND ACTIVE TRACKERS
    unused_checkpoints += checkpoints in active_trackers and not in used_checkpoints
    active_trackers ← active_trackers items where the key is in unused_checkpoints

    # STOPPING CONDITION
    # stop if active_trackers is empty, or we haven't seen any new
    # unused_checkpoints since the last phase
    if active_trackers is empty or unused_checkpoints == previous unused_checkpoints
        everything_reachable_is_reached ← True
end while

if everything_reachable_is_reached
    for end_checkpoint, trackers in active_trackers
        if end_checkpoint in unused_checkpoints
            # add trackers that couldn't be stitched to finished_trackers
            finished_trackers += trackers

perform data augmentation OR return story_end_trackers + finished_trackers
```
After data generation has finished, we optionally perform data augmentation. Data augmentation is a process by which stories are combined to create longer stories. This creates additional training data that should help TEDPolicy ignore irrelevant context. We’ll illustrate how this additional training data helps with another example.
Data augmentation is performed if the value of the augmentation factor is greater than zero. By default, it is set to 50. As of writing, it can be set using the command line flag --augmentation (for example, --augmentation 0 turns augmentation off), but there is an open issue to move the configuration of this parameter to TEDPolicy.
Note that data augmentation only benefits TEDPolicy and UnexpecTEDIntentPolicy; all other policies (like RulePolicy, or AugmentedMemoizationPolicy) ignore the trackers created through data augmentation. It may seem surprising that AugmentedMemoizationPolicy ignores augmented trackers, but the “augmented” in its name refers to the forgetting mechanism that enables the policy to match the memorized stories to conversations with a reduced history.
Data augmentation is run very similarly to data generation. It consists of data augmentation phases, where each data augmentation phase consists of one pass over all the story steps. They are processed one by one. However, whereas data generation doesn’t specify a number of phases and continues until there are no more trackers for stitching, augmentation always ends after the third phase.
During a data augmentation phase, we again keep track of active trackers in a dictionary from checkpoints to the trackers that end in them. In augmentation, the active trackers are trackers that are considered for stitching, like they were during generation. However, this time, the active trackers map is populated with the story end trackers which were produced during data generation.
In data generation, two types of trackers can be produced: story end trackers, and finished trackers. Story end trackers are trackers that represent a complete story, whereas finished trackers are trackers that did not get to the “end” of a story: either because they end in a checkpoint that we couldn’t stitch, or because they are sequences of events leading up to an ActionReverted, UserUtteranceReverted, or Restarted event.
To start, we populate active trackers with the story end trackers, all mapped from the STORY_START checkpoint.
This is also the first time the augmentation factor comes in. We cap the total number of trackers considered for active trackers at 10 times this value. By default, the augmentation factor is 50, so we allow for no more than 50 x 10 = 500 active trackers. If there are more than 500 story end trackers, we randomly subsample 500 of them to fill the active trackers.
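The cap works roughly like this (a hypothetical helper name; the subsampling is random):

```python
import random


def cap_trackers(trackers, augmentation_factor=50):
    """Cap the trackers considered for augmentation at 10x the factor."""
    max_number_of_augmented_trackers = augmentation_factor * 10
    if len(trackers) <= max_number_of_augmented_trackers:
        return list(trackers)
    # over the cap: keep a random subsample
    return random.sample(trackers, max_number_of_augmented_trackers)


print(len(cap_trackers(list(range(3)))))    # 3 (under the 500 cap)
print(len(cap_trackers(list(range(900)))))  # 500
```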
For data augmentation, let’s consider a different example than moodbot. Suppose we have built an assistant that helps people order from a restaurant. They can also check their order status, as well as change their order. Note that this is a simplified example: if you were really trying to help people order from a restaurant, you would probably use a form to collect various details about their order, and you would need a way for people to identify the order they want to change or get the status of.
The three stories are represented by the story steps below:
```yaml
# stories.yml
version: "2.0"
stories:
- story: order food
  steps:
  - intent: make_order
  - action: utter_confirm
  - intent: affirm
  - action: utter_order_success
- story: check order status
  steps:
  - intent: check_order_status
  - action: utter_order_details
- story: change order
  steps:
  - intent: change_order
  - action: utter_confirm
  - intent: affirm
  - action: utter_change_order_success
```
Order food | Check order status | Change order
In the data generation phase, we would have produced three trackers, one for each story. They are all story end trackers, so we use them to populate the active trackers for this round:
Now, we can process each story step. This part is very similar to the data generation phase. For each story step, we first check whether the start checkpoint of the story step is in active trackers.
We first process the story step below:
All of the active trackers end in the checkpoint our story step starts in (STORY_START), so they are added to incoming trackers. The trackers are processed with the story step, and we end up with the following trackers:
Now, it should be a bit clearer how the additional stories produced can help TEDPolicy to ignore irrelevant context. The point is that it doesn’t matter what happened before the user requested to check their order status.
Since the trackers produced don’t end in a checkpoint, they are not added to the active trackers for the next story step. Instead, they’re added to the list of story end trackers.
The other two story steps are processed in the same way. They both begin with STORY_START checkpoints, so all of the active trackers can be stitched to both of the story steps.
We end the phase with nine augmented story end trackers:
Unlike in data generation, we don’t clean up active trackers at the end of the phase. Instead, we extend the active trackers with any story end trackers that were produced in the phase. In our case, the three original trackers in active trackers are extended with the nine new ones we produced this phase, for twelve total. Then we check whether the active trackers exceed the cap of max_number_of_augmented_trackers; if so, we would subsample randomly. However, at twelve total, we don’t exceed this cap, so all twelve trackers are considered in the next phase.
We mark any tracker produced in the augmentation rounds as “augmented”. This is an attribute of the tracker, and allows us to differentiate between augmented and original trackers when policies are trained. TEDPolicy and UnexpecTEDIntentPolicy are the only policies that are trained on both original and augmented trackers. All other policies are trained only on the original trackers, which were produced in the data generation round.
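Conceptually, the split during policy training looks like this (the flag name here is a simplification for this sketch, not necessarily Rasa's attribute name):

```python
from dataclasses import dataclass


@dataclass
class Tracker:
    events: list
    is_augmented: bool = False  # hypothetical flag name for this sketch


def trackers_for_policy(trackers, uses_augmented: bool):
    """TED-style policies train on everything; others drop augmented trackers."""
    if uses_augmented:
        return list(trackers)
    return [t for t in trackers if not t.is_augmented]


trackers = [Tracker(["greet"]),
            Tracker(["greet", "make_order"], is_augmented=True)]
print(len(trackers_for_policy(trackers, uses_augmented=True)))   # 2
print(len(trackers_for_policy(trackers, uses_augmented=False)))  # 1
```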
There are three total augmentation phases. At the end of the augmentation phases, we cap the number of augmented trackers by max_number_of_augmented_trackers one last time. The augmented and original trackers are now ready to use for training.
Rules are never augmented, and never used to augment stories. This is because augmented rules would lead to unintended behavior, and RulePolicy would not benefit from augmentation, as it is not a machine learning policy that generalizes.
We also don’t augment during testing. This is because augmented stories are specifically supposed to help TEDPolicy learn to ignore irrelevant context, so it would not be appropriate to expect TEDPolicy to correctly predict the next action from unrelated story snippets.
Digging into the details
In data augmentation, we start with active trackers from the data generation stage. We subsample these so that there are no more trackers than the value defined by 10 x augmentation_factor. This value is known as max_number_of_augmented_trackers.
The way active trackers are populated is the key difference between augmentation and generation. We also cap the number of active trackers and the number of incoming trackers by max_number_of_augmented_trackers any time we process a story step. Finally, we also cap the number of augmented trackers returned by max_number_of_augmented_trackers. Whenever the number of trackers is capped, we subsample randomly.
We also don’t need to track unused checkpoints in this round. If there are checkpoints that couldn’t be used for stitching, this issue will have been raised in the data generation round.
```
# INITIALISATION
story_end_trackers ← from data generation
finished_trackers ← from data generation
active_trackers[START_CHECKPOINT] ← randomly subsample story_end_trackers
                                    with length max_num_of_augmented_trackers

# DATA AUGMENTATION
for three phases
    # each pass is a single data augmentation phase
    for each story_step
        # FIND INCOMING TRACKERS
        for start_checkpoint in story_step's start checkpoints
            if start_checkpoint is in active_trackers
                incoming_trackers ← active_trackers[start_checkpoint]
        trackers ← []
        end_trackers ← []

        # DEDUPLICATE INCOMING TRACKERS
        if remove_duplicates
            ... deduplicate incoming_trackers exactly as in data generation ...
            finished_trackers += end_trackers
            incoming_trackers ← unique_trackers

        # CAP INCOMING TRACKERS
        if len(incoming_trackers) > max_num_of_augmented_trackers
            incoming_trackers ← randomly subsample incoming_trackers
                                with length max_num_of_augmented_trackers

        # PROCESS INCOMING TRACKERS AND STORY STEP
        for incoming_tracker in incoming_trackers
            new_tracker ← clone of incoming_tracker
            for event in story_step.events
                if event is ActionReverted, UserUtteranceReverted, or Restarted
                    end_trackers += clone of new_tracker
                append event to new_tracker
            trackers += new_tracker

        # CLEANUP PRODUCED TRACKERS
        finished_trackers += end_trackers
        for end_checkpoint in story_step's end checkpoints
            active_trackers[end_checkpoint] ← trackers
        if story_step has no end checkpoints
            unique_ends ← deduplicated trackers
            story_end_trackers += unique_ends
    # end pass over each story step == end of this phase

    # CAP ACTIVE TRACKERS
    active_trackers ← randomly subsample story_end_trackers
                      with length max_num_of_augmented_trackers
end for

if number of augmented trackers in finished_trackers > max_num_of_augmented_trackers
    finished_trackers ← original trackers + randomly subsample augmented trackers
                        with length max_num_of_augmented_trackers

return story_end_trackers + finished_trackers
```
In this blog post, we’ve walked you through how Rasa handles tracker loading, checkpoint and OR statement resolution, and story augmentation.
Tracker loading is the process by which training and testing data is turned into the format that Rasa uses for training and testing policies. It includes the data generation step, which is the process by which the stories and rules for an assistant are turned into trackers. Tracker loading can optionally include the data augmentation step, which is the process by which additional training trackers are produced. These additional training trackers are intended to help TEDPolicy and UnexpecTEDIntentPolicy learn to ignore irrelevant context.
We also walked through checkpoint resolution, the process by which stories with checkpoints are turned into trackers, as well as how OR statements are handled.