How did we get here?
Back in Rasa Open Source version 0.x, the rasa_core and rasa_nlu portions of the Rasa codebase lived in separate repositories. Later, in Rasa Open Source 1.0, the two were merged into the same repository, which we now know as Rasa Open Source. However, even though the code was merged into a single repository, the two codebases themselves are still largely separate.
Originally this was "fine" because there was a clear separation of concerns between what the NLU part of the pipeline and the policies in the Core module should do during training.

The NLU system was trained completely independently of the policies in the Core system and resembled a linear pipeline of serially connected components. When an assistant is live, the predictions from the NLU system feed into the policies, but during training the two systems are independent of each other.
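To make the "linear pipeline" idea concrete, here is a minimal sketch of serially connected components in plain Python. The component functions are made up for illustration and are not how Rasa implements its components; the only point is that each step consumes the output of the step before it.

```python
# A minimal sketch of a linear NLU pipeline: every component consumes the
# output of the previous one. The functions are illustrative, not Rasa code.
from typing import Any, Callable, Dict, List

def tokenize(text: str) -> Dict[str, Any]:
    return {"text": text, "tokens": text.split()}

def featurize(message: Dict[str, Any]) -> Dict[str, Any]:
    message["features"] = [len(token) for token in message["tokens"]]
    return message

def classify(message: Dict[str, Any]) -> Dict[str, Any]:
    message["intent"] = "greet" if "hello" in message["tokens"] else "other"
    return message

pipeline: List[Callable[[Any], Any]] = [tokenize, featurize, classify]

message: Any = "hello there"
for component in pipeline:  # strictly serial: no skipping, no reordering
    message = component(message)

print(message["intent"])  # -> "greet"
```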
This changed in Rasa Open Source 2.2 with the introduction of end-to-end learning.

In the end-to-end scenario, TED is also able to use the featurizers in the NLU pipeline to make predictions for texts that don't perfectly fit an intent. This is a feature that we need in order to move conversational AI forward, and it makes you wonder whether it really makes sense to think of the machine learning system as two distinct NLU/Core components.
So let's see if we can "redraw" the components and their relationship during training.

Instead of looking at the NLU pipeline as a linear sequence of steps, you can also represent it as a graph of computation. Specifically, this type of graph is referred to as a DAG (Directed Acyclic Graph) because its edges are directed and it contains no cycles. When you represent the NLU pipeline as a graph, it suddenly also integrates nicely with the policy algorithms.
Each node in this graph can still represent a component like before, but the edges now represent dependencies. For example, when you look at this graph it becomes clear that the DIETClassifier doesn't need the tokens that leave the Tokenizer directly, but it does need the numeric features from the CountVectorizers that are generated for each token. It also becomes clear that the RegexEntityExtractor only needs the tokens in order to extract entities.
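To make this more tangible, here is a minimal sketch of such a dependency graph in plain Python. The node names mirror the components discussed above, but this is illustrative only and not the schema format Rasa actually uses.

```python
# A minimal sketch of the pipeline as a DAG: each node maps to the nodes it
# depends on. Illustrative only; not Rasa's actual graph schema.
graph = {
    "Tokenizer": [],
    "CountVectorizer_word": ["Tokenizer"],
    "CountVectorizer_char": ["Tokenizer"],
    "RegexEntityExtractor": ["Tokenizer"],
    "DIETClassifier": ["CountVectorizer_word", "CountVectorizer_char"],
    "TED": ["CountVectorizer_word", "CountVectorizer_char"],
    "PolicyEnsemble": ["DIETClassifier", "TED"],
}

def dependencies(node: str) -> set:
    """All direct and indirect dependencies of a node."""
    deps = set(graph[node])
    for parent in graph[node]:
        deps |= dependencies(parent)
    return deps

print(dependencies("RegexEntityExtractor"))  # only needs the Tokenizer
print(dependencies("DIETClassifier"))        # needs the featurizers and, indirectly, the Tokenizer
```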
There are a few benefits that you get when you think about machine learning pipelines this way.
- This is much more customizable. Do you want to send the count vector features to DIET but not to the policies? Conceptually, this feature is a lot easier to consider now that we have a graph in our minds.
- There's room for parallelism too. For example, the regex entity extractor does not need the count vectors, so while featurization is taking place the regex entity extractor can already start doing its inference (see the sketch after this list).
- Instead of worrying about whether a component is an NLU component or a policy component, a DAG lets you think of everything as "just another component". Maybe in the future we'll want to predict entities based on the conversation so far and the current message. Implementing this in the Rasa 2.x codebase would be a huge undertaking, while in the DAG setting it'd be an easy experiment.
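As a concrete illustration of the parallelism point, independent branches of a graph can be scheduled concurrently. The sketch below uses a plain thread pool and made-up component functions; it is not how Rasa schedules its graph, it only shows why the structure allows it.

```python
# A minimal sketch of running independent DAG branches in parallel.
# Component functions are illustrative, not Rasa internals.
from concurrent.futures import ThreadPoolExecutor
import time

def tokenize(text):
    return text.split()

def count_vectorize(tokens):
    time.sleep(0.5)  # pretend featurization is expensive
    return [len(token) for token in tokens]

def regex_entity_extract(tokens):
    time.sleep(0.1)  # only needs the tokens, so it does not have to wait for features
    return [token for token in tokens if token.isdigit()]

tokens = tokenize("book a table for 2 people")

# Both nodes depend only on the tokenizer, so they can run at the same time.
with ThreadPoolExecutor() as pool:
    features_future = pool.submit(count_vectorize, tokens)
    entities_future = pool.submit(regex_entity_extract, tokens)
    print(entities_future.result())  # ready well before featurization finishes
    print(features_future.result())
```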
There's also another big benefit that we can now consider: caching.
DAGs with Caches
In Rasa 2.x, a single change to the NLU pipeline file would trigger retraining of all the components in it. The pipeline was fingerprinted as a whole, which could cause Rasa to assume that every component needed retraining even if only a small change had occurred.
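Roughly speaking, and glossing over how Rasa actually computes fingerprints, fingerprinting "as a whole" amounts to hashing the entire configuration, so even a tiny edit produces a completely new fingerprint:

```python
# A rough sketch of whole-pipeline fingerprinting: hash the entire config so
# that any change, however small, yields a new fingerprint and a full retrain.
# This is an illustration, not Rasa's actual fingerprinting code.
import hashlib
import json

config = {
    "pipeline": [
        {"name": "WhitespaceTokenizer"},
        {"name": "CountVectorsFeaturizer"},
        {"name": "DIETClassifier", "epochs": 100},
    ]
}

def fingerprint(obj) -> str:
    return hashlib.md5(json.dumps(obj, sort_keys=True).encode()).hexdigest()

print(fingerprint(config))             # fingerprint of the whole pipeline
config["pipeline"][2]["epochs"] = 200  # one small change ...
print(fingerprint(config))             # ... and the fingerprint is completely different
```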
But if you represent the computational backend as a DAG, you're suddenly open to more options. Let's suppose we start out with the graph shown below. This graph represents a Rasa pipeline with end-to-end enabled.

Let's now imagine that we change a hyperparameter of the second CountVectorizer. This will impact the DIETClassifier, TED, and the Policy Ensemble. But the rest of the pipeline doesn't need retraining.

If we're able to cache already trained components on disk, we would only need to train the red components and keep the green components as is. This can save a whole lot of computation when you're trying out new settings for your assistant.
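Here is a minimal sketch of that idea, again not Rasa's actual caching code: given the dependency graph and the node that changed, everything downstream of the change needs retraining, while everything else can be served from the cache.

```python
# A minimal sketch of cache invalidation in a DAG: when one node changes, only
# that node and everything downstream of it needs retraining.
# Node names are illustrative, not Rasa's actual graph schema.
graph = {
    "Tokenizer": [],
    "CountVectorizer_1": ["Tokenizer"],
    "CountVectorizer_2": ["Tokenizer"],
    "RegexEntityExtractor": ["Tokenizer"],
    "DIETClassifier": ["CountVectorizer_1", "CountVectorizer_2"],
    "TED": ["CountVectorizer_1", "CountVectorizer_2"],
    "PolicyEnsemble": ["DIETClassifier", "TED"],
}

def needs_retraining(changed: str) -> set:
    """The changed node plus every node that (transitively) depends on it."""
    stale = {changed}
    while True:
        downstream = {
            node for node, deps in graph.items()
            if any(dep in stale for dep in deps)
        }
        if downstream <= stale:
            return stale
        stale |= downstream

print(needs_retraining("CountVectorizer_2"))
# -> {'CountVectorizer_2', 'DIETClassifier', 'TED', 'PolicyEnsemble'}
# Everything else (the "green" nodes) can be loaded straight from the cache.
```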
An Example
If you install the main branch of Rasa Open Source today, you can already get a glimpse of what the caching experience may be like.
# Install the main branch of Rasa via pip.
python -m pip install --upgrade "rasa @ git+https://github.com/RasaHQ/rasa.git"
When you call rasa init with this version, you'll be able to start a quick demo project which you can also immediately train. The training experience is similar to before, but now you'll notice that a .rasa folder appears as a side effect. This folder contains a SQLite file called cache.db as well as files and folders that represent cached results in the training graph.
/workspace/rasa-examples/example/.rasa/cache
├── tmp3qs8_jkv
├── tmpdbttxq00
│   ├── featurizer.json (160 bytes)
│   └── memorized_turns.json (1.9 kB)
├── tmperp36t42
│   └── feature_to_idx_dict.pkl (784 bytes)
├── tmpfc04vxlq
│   ├── checkpoint (131 bytes)
│   ├── DIETClassifierGraphComponent.data_example.pkl (2.0 kB)
│   ├── DIETClassifierGraphComponent.entity_tag_specs.json (2 bytes)
│   ├── DIETClassifierGraphComponent.index_label_id_mapping.json (120 bytes)
│   ├── DIETClassifierGraphComponent.label_data.pkl (1.8 kB)
│   ├── DIETClassifierGraphComponent.sparse_feature_sizes.pkl (75 bytes)
│   ├── DIETClassifierGraphComponent.tf_model.data-00000-of-00001 (28.3 MB)
│   └── DIETClassifierGraphComponent.tf_model.index (11.5 kB)
├── tmpmj913uyf
│   ├── featurizer.json (160 bytes)
│   ├── rule_only_data.json (52 bytes)
│   └── rule_policy.json (1.6 kB)
├── tmpmn1t44si
│   └── patterns.pkl (2 bytes)
├── tmptwx0omrg
├── tmpukyg7ozw
│   ├── checkpoint (95 bytes)
│   ├── featurizer.json (1.0 kB)
│   ├── ted_policy.data_example.pkl (1.2 kB)
│   ├── ted_policy.entity_tag_specs.json (2 bytes)
│   ├── ted_policy.fake_features.pkl (817 bytes)
│   ├── ted_policy.label_data.pkl (2.0 kB)
│   ├── ted_policy.meta.pkl (2.0 kB)
│   ├── ted_policy.priority.pkl (1 byte)
│   ├── ted_policy.tf_model.data-00000-of-00001 (3.5 MB)
│   └── ted_policy.tf_model.index (8.3 kB)
├── tmpyt5kdd6p
│   ├── oov_words.json (2 bytes)
│   └── vocabularies.pkl (1.7 kB)
├── tmpzmwofb5s
│   ├── oov_words.json (2 bytes)
│   └── vocabularies.pkl (8.5 kB)
└── cache.db (16.4 kB)
You can inspect the files to confirm that component data has been cached. You can also confirm that certain components don't retrain when you change your config.yml file. If you add a RegexEntityExtractor to the config.yml file and configure your domain.yml file to recognize a new entity, you'll be able to confirm from the logs that the DIETClassifier does not retrain. In Rasa Open Source 2.x the only cached state was the last trained model, while now every component is cached separately in a global cache.
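If you are curious, you can also peek at the cache database itself. The snippet below is a small inspection sketch using Python's built-in sqlite3 module; it assumes you run it from your project's root directory and makes no assumptions about the schema beyond listing the tables.

```python
# Peek inside the cache database that Rasa writes to the .rasa folder.
# Assumes this runs from the project root where `rasa train` was executed.
import sqlite3

connection = sqlite3.connect(".rasa/cache.db")
tables = connection.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables)  # whatever table(s) the cache uses internally
connection.close()
```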
Changes for Components
The graph in Rasa Open Source 3.0 is designed such that, as a user, you won't directly interact with it. As shown in the previous example, you'll still interact with a config.yml file and don't need to worry about the graph schema; it is generated on your behalf.
That said, there is one consequence of our new computational backend: if you have custom components, they will need to be migrated. The new architecture unifies many existing concepts in Rasa and this leads to changes. One of the main changes is that you'll need to register your components so that the graph is aware of them. We're currently working on a migration guide which we will share with the community soon.
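To give a rough feel for what "registering" a component can mean, here is a purely hypothetical sketch: the decorator, registry, and class names below are made up and are not the actual Rasa API, which will be covered in the migration guide.

```python
# A purely hypothetical sketch of decorator-based component registration.
# None of these names are the real Rasa API; they only illustrate the idea that
# the graph needs to know which components exist and how to build them.
COMPONENT_REGISTRY = {}

def register_component(cls):
    """Make a component class discoverable by name when the graph is assembled."""
    COMPONENT_REGISTRY[cls.__name__] = cls
    return cls

@register_component
class MySentimentAnalyzer:
    def train(self, training_data):
        ...

    def process(self, message):
        ...

# A graph builder could then look the component up by the name used in config.yml.
print(COMPONENT_REGISTRY["MySentimentAnalyzer"])
```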
Conclusion
We hope this document helps explain the motivation behind the new computational backend in Rasa Open Source 3.x. We skipped over a lot of implementation details, but we hope it's clear that this new backend will introduce many benefits in the long run. You don't need to worry about big changes to your config.yml file though. While the computational backend is changing, the config.yml file will remain your main interface with machine learning in Rasa.





