Continuous Integration (CI) is the practice of merging code changes frequently and automatically testing those changes as they are committed. Continuous Deployment (CD) means automatically deploying integrated changes to a staging or production environment. Together, they allow you to make more frequent improvements to your assistant and to test and deploy those changes efficiently.
This guide will cover what should go in a CI/CD pipeline, specific to a Rasa project. How you implement that pipeline is up to you. There are many CI/CD tools out there, such as GitHub Actions, GitLab CI/CD, Jenkins, and CircleCI. We recommend choosing a tool that integrates with whatever Git repository you use.
Continuous Integration (CI)
The best way to improve an assistant is with frequent incremental updates. No matter how small a change is, you want to be sure that it doesn't introduce new problems or negatively impact the performance of your assistant.
It is usually best to run CI checks on merge / pull requests or on commit. Most tests are quick enough to run on every change. However, you can choose to run more resource-intensive tests only when certain files have been changed or when some other indicator is present. For example, if your code is hosted on GitHub, you can make a test run only if the pull request has a certain label (e.g. “NLU testing required”).
Validating Data and Stories
Data validation verifies that no mistakes or major inconsistencies appear in your domain, NLU data, or story data. To validate your data, have your CI run the rasa data validate command.
If you pass a max_history value to one or more policies in your config.yml file, provide the smallest of those values as the --max-history argument of the validation command.
If data validation results in errors, training a model can also fail or yield bad performance, so it's always good to run this check before training a model. If you include the --fail-on-warnings flag, this step will also fail on warnings that indicate more minor issues.
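The options described above can be assembled by a small script in your CI job. The following is a minimal sketch rather than part of Rasa: it assumes the policies list from config.yml has already been parsed into Python dictionaries, and the helper names smallest_max_history and build_validate_command are hypothetical.

```python
from typing import Dict, List, Optional


def smallest_max_history(policies: List[Dict]) -> Optional[int]:
    """Return the smallest max_history set on any policy, or None if none is set."""
    values = [p["max_history"] for p in policies if p.get("max_history") is not None]
    return min(values) if values else None


def build_validate_command(policies: List[Dict], fail_on_warnings: bool = True) -> List[str]:
    """Build the argument list for `rasa data validate`."""
    command = ["rasa", "data", "validate"]
    if fail_on_warnings:
        command.append("--fail-on-warnings")
    max_history = smallest_max_history(policies)
    if max_history is not None:
        # Only pass --max-history when at least one policy defines it.
        command += ["--max-history", str(max_history)]
    return command


# Example policies, as they might appear under `policies:` in config.yml.
example_policies = [
    {"name": "MemoizationPolicy", "max_history": 3},
    {"name": "TEDPolicy", "max_history": 5},
    {"name": "RulePolicy"},
]
print(" ".join(build_validate_command(example_policies)))
# prints: rasa data validate --fail-on-warnings --max-history 3
```

The resulting list can be passed to subprocess.run, whose return code your CI tool can use to pass or fail the step.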
rasa data validate does not test if your rules are consistent with your stories. However, during training, the RulePolicy checks for conflicts between rules and stories. Any such conflict will abort training.
To read more about the validator and all of the available options, see the documentation for rasa data validate.
Training a Model
You can train a model with the rasa train command.
Training a model verifies that your NLU pipeline and policy configurations are valid and trainable, and provides a model to use for testing your assistant. If the model passes the CI tests, then you can also upload the trained model to your server as part of the continuous deployment process.
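If a later pipeline step needs the model that was just trained, it has to locate the archive first. The sketch below assumes the model was written as a .tar.gz archive to a models directory (the default output location of rasa train); the helper name latest_model is hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_model(models_dir: str = "models") -> Optional[Path]:
    """Return the most recently modified .tar.gz archive in models_dir, if any."""
    archives = list(Path(models_dir).glob("*.tar.gz"))
    if not archives:
        return None
    # The newest archive is assumed to be the one produced by this CI run.
    return max(archives, key=lambda path: path.stat().st_mtime)
```

A deployment step could call latest_model() and skip the upload when it returns None.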
Testing Your Assistant
Testing your trained model on test stories is the best way to have confidence in how your assistant will act in certain situations. Written in a modified story format, test stories allow you to provide entire conversations and test that, given certain user input, your model will behave in the expected manner. This is especially important as you start introducing more complicated stories from user conversations.
To run test stories, use the rasa test command.
By default, the command will run tests on stories from any files with names starting with test_. You can also provide a specific test stories file or directory with the --stories argument. The --fail-on-prediction-errors flag ensures the test will fail if any test story fails.
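The validation, training, and testing steps described so far are typically chained so that a failure stops the pipeline early. Most CI tools do this for you; the following is only a sketch of the same idea as a standalone script, where run_steps and CI_STEPS are hypothetical names and the exact flags should be adapted to your project.

```python
import subprocess
from typing import List, Sequence

# The three CI steps discussed above, in the order they should run.
CI_STEPS: List[List[str]] = [
    ["rasa", "data", "validate", "--fail-on-warnings"],
    ["rasa", "train"],
    ["rasa", "test", "--fail-on-prediction-errors"],
]


def run_steps(steps: Sequence[Sequence[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for step in steps:
        print("Running:", " ".join(step))
        result = subprocess.run(list(step))
        if result.returncode != 0:
            # Stop at the first failing step so later steps never see bad input.
            return result.returncode
    return 0


# In a CI job you would end the script with:
#     raise SystemExit(run_steps(CI_STEPS))
```

Returning the failing step's exit code lets the CI tool mark the job as failed without any extra configuration.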
End-to-end conversation testing is only as thorough and accurate as the test cases you include, so you should continue to grow your set of test stories as you make improvements to your assistant. A good rule of thumb to follow is that you should aim for your test stories to be representative of the true distribution of real conversations. Rasa X makes it easy to add test conversations based on real conversations.
Running test stories does not execute your action code. You will need to test your action code in a separate step.
Comparing NLU Performance
If you've made significant changes to your NLU training data (e.g. splitting an intent into two intents or adding a lot of training examples), you should run a full NLU evaluation. You'll want to compare the performance of the NLU model without your changes to an NLU model with your changes.
You can do this by running NLU testing in cross-validation mode with rasa test nlu --cross-validation.
You could also train a model on a training set and test it on a test set. If you use the train-test set approach, it is best to shuffle and split your data using rasa data split as part of this CI step, as opposed to using a static NLU test set, which can easily become outdated.
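The two evaluation approaches translate into different command sequences. The sketch below lists them side by side. It assumes that rasa data split nlu writes training_data.yml and test_data.yml to a train_test_split directory, which may differ depending on your Rasa version and options; the function name nlu_evaluation_commands is hypothetical.

```python
from typing import List


def nlu_evaluation_commands(mode: str = "cross-validation") -> List[List[str]]:
    """Return the commands for the chosen NLU evaluation approach."""
    if mode == "cross-validation":
        return [["rasa", "test", "nlu", "--cross-validation"]]
    if mode == "train-test":
        return [
            # Shuffle and split the NLU data freshly on every run.
            ["rasa", "data", "split", "nlu"],
            # Assumed output paths of the split step; adjust to your setup.
            ["rasa", "train", "nlu", "--nlu", "train_test_split/training_data.yml"],
            ["rasa", "test", "nlu", "--nlu", "train_test_split/test_data.yml"],
        ]
    raise ValueError(f"Unknown evaluation mode: {mode!r}")
```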
Because this test doesn't result in a pass/fail exit code, it's best to make the results visible so that you can interpret them. For example, this GitHub Actions Workflow includes commenting on a PR with a results table that shows which intents are confused with others.
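One way to make the results visible is to turn the intent report into a table that your CI tool can post or print. The following is a sketch under stated assumptions: it expects a dictionary shaped like the intent report JSON that NLU testing writes, with one entry per intent containing an f1-score and a confused_with mapping. The exact report layout can vary between Rasa versions, and confusion_table is a hypothetical helper.

```python
from typing import Dict


def confusion_table(report: Dict[str, object]) -> str:
    """Render a Markdown table of per-intent F1 scores and confusions."""
    lines = ["| intent | f1-score | confused with |", "| --- | --- | --- |"]
    for intent, metrics in sorted(report.items()):
        # Skip summary entries (for example an overall accuracy value)
        # that do not carry per-intent confusion data.
        if not isinstance(metrics, dict) or "confused_with" not in metrics:
            continue
        confused = ", ".join(
            f"{other} ({count})" for other, count in metrics["confused_with"].items()
        ) or "-"
        lines.append(f"| {intent} | {metrics['f1-score']:.2f} | {confused} |")
    return "\n".join(lines)


# Hand-written sample data for illustration only, not real results.
sample_report = {
    "greet": {"f1-score": 0.95, "confused_with": {"goodbye": 1}},
    "goodbye": {"f1-score": 1.0, "confused_with": {}},
    "accuracy": 0.95,
}
print(confusion_table(sample_report))
```

In practice you would load the report with json.load and run the function once for the baseline model and once for the changed model, so the two tables can be compared.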
Since NLU comparison can be a fairly resource-intensive test, you may choose to run this test only when certain conditions are met. Conditions might include the presence of a manual label (e.g. “NLU testing required”), changes to NLU data, or changes to the NLU pipeline.
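Most CI tools can express such conditions natively, but the decision logic is simple enough to sketch in a few lines. The example below assumes that NLU data lives under paths starting with data/nlu and that the pipeline is defined in config.yml; both paths and the name should_run_nlu_comparison are assumptions for illustration.

```python
from fnmatch import fnmatch
from typing import Iterable

NLU_LABEL = "NLU testing required"
# Assumed locations of NLU data and the pipeline configuration.
# Note that config.yml also holds policies, so policy-only changes match too.
NLU_PATTERNS = ("data/nlu*", "config.yml")


def should_run_nlu_comparison(changed_files: Iterable[str], labels: Iterable[str]) -> bool:
    """Return True if a label or a changed file calls for NLU comparison."""
    if NLU_LABEL in set(labels):
        return True
    return any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in NLU_PATTERNS
    )
```

For example, should_run_nlu_comparison(["data/nlu.yml"], []) returns True, while a change that only touches actions/actions.py with no label returns False.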
Testing Action Code
The approach used to test your action code will depend on how it is implemented. For example, if you connect to external APIs, you should write integration tests to ensure that those APIs respond as expected to common inputs. However you test your action code, you should include these tests in your CI pipeline so that they run each time you make changes.
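Alongside integration tests against the real API, fast unit tests with a stubbed client can cover the logic in your action code. The sketch below uses only the standard library. The helper describe_order_status and the client's get_order method are entirely hypothetical stand-ins for whatever your actions call.

```python
from unittest.mock import Mock


def describe_order_status(client, order_id: str) -> str:
    """Hypothetical helper that a custom action might call."""
    response = client.get_order(order_id)
    if response is None:
        return f"I could not find order {order_id}."
    return f"Order {order_id} is {response['status']}."


def test_known_order():
    # Stub the external API so the test does not need network access.
    client = Mock()
    client.get_order.return_value = {"status": "shipped"}
    assert describe_order_status(client, "42") == "Order 42 is shipped."
    client.get_order.assert_called_once_with("42")


def test_unknown_order():
    client = Mock()
    client.get_order.return_value = None
    assert describe_order_status(client, "7") == "I could not find order 7."
```

Tests written this way can be collected by a runner such as pytest and added as another step in the CI pipeline.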
Continuous Deployment (CD)
To get improvements out to your users frequently, you will want to automate as much of the deployment process as possible.
CD steps usually run on push or merge to a certain branch, once CI checks have succeeded.
Deploying Your Rasa Model
If you ran test stories in your CI pipeline, you'll already have a trained model. You can set up your CD pipeline to upload the trained model to your Rasa server if the CI results are satisfactory. For example, to upload a model to Rasa X:
updates to action code
If your update includes changes to both your model and your action code, and these changes depend on each other in any way, you should not automatically tag the model as production. You will first need to build and deploy your updated action server, so that the new model won't e.g. call actions that don't exist in the pre-update action server.
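A simple guard against this situation is to compare the custom actions the new model expects with the actions the deployed action server provides before promoting the model. The sketch below only shows the comparison. How you obtain the two lists depends on your setup, and the function names are hypothetical.

```python
from typing import Iterable, List


def missing_actions(model_actions: Iterable[str], server_actions: Iterable[str]) -> List[str]:
    """Return actions the model may call that the action server does not provide."""
    return sorted(set(model_actions) - set(server_actions))


def safe_to_promote(model_actions: Iterable[str], server_actions: Iterable[str]) -> bool:
    """True only if every action the model expects exists on the action server."""
    return not missing_actions(model_actions, server_actions)
```

If safe_to_promote returns False, the pipeline can deploy the updated action server first and promote the model afterwards.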
Deploying Your Action Server
You can automate building and uploading a new image for your action server to an image repository for each update to your action code. As noted above, be careful with automatically deploying a new image tag to production if the action server would be incompatible with the current production model.
Example CI/CD pipelines
These examples are just two of many possibilities. If you have a CI/CD setup you like, please share it with the Rasa community on the forum.