Version: 2.0.x

Setting up CI/CD

Even though developing a contextual assistant is different from developing traditional software, you should still follow software development best practices. Setting up a Continuous Integration (CI) and Continuous Deployment (CD) pipeline ensures that incremental updates to your bot are improving it, not harming it.


Continuous Integration (CI) is the practice of merging in code changes frequently and automatically testing changes as they are committed. Continuous Deployment (CD) means automatically deploying integrated changes to a staging or production environment. Together, they allow you to make more frequent improvements to your assistant and efficiently test and deploy those changes.

This guide will cover what should go in a CI/CD pipeline, specific to a Rasa project. How you implement that pipeline is up to you. There are many CI/CD tools out there, such as GitHub Actions, GitLab CI/CD, Jenkins, and CircleCI. We recommend choosing a tool that integrates with whatever Git repository you use.

Continuous Integration (CI)

The best way to improve an assistant is with frequent incremental updates. No matter how small a change is, you want to be sure that it doesn't introduce new problems or negatively impact the performance of your assistant.

It is usually best to run CI checks on merge / pull requests or on commit. Most tests are quick enough to run on every change. However, you can choose to run more resource-intensive tests only when certain files have been changed or when some other indicator is present. For example, if your code is hosted on GitHub, you can make a test run only if the pull request has a certain label (e.g. “NLU testing required”).

Validating Data and Stories

Data validation verifies that there are no mistakes or major inconsistencies in your domain, NLU data, or conversation data. To validate your data, have your CI run this command:

rasa data validate --fail-on-warnings --max-history <max_history>

If you pass a max_history value to a Memoization policy in your config.yml file, provide the same value in the above validator command. Otherwise, provide the default value of 5.

If data validation results in errors, training a model will also fail, so it's always good to run this check before training a model. With the --fail-on-warnings flag, this step also fails on warnings, which indicate more minor issues.
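As a minimal sketch of how a CI script might assemble this command, the helper below falls back to the default of 5 when no max_history is supplied. The function name `build_validate_command` and the constant are illustrative, not part of Rasa; only the command and its two flags come from this guide.

```python
from typing import List, Optional

# Default named in this guide for projects that set no max_history
# on a Memoization policy.
DEFAULT_MAX_HISTORY = 5


def build_validate_command(max_history: Optional[int] = None) -> List[str]:
    """Return the argument list for the data validation step (hypothetical helper)."""
    history = DEFAULT_MAX_HISTORY if max_history is None else max_history
    if history < 1:
        raise ValueError("max_history must be a positive integer")
    return [
        "rasa", "data", "validate",
        "--fail-on-warnings",
        "--max-history", str(history),
    ]


# Show the command a project with max_history: 3 in config.yml would run.
print(" ".join(build_validate_command(3)))
# rasa data validate --fail-on-warnings --max-history 3
```

In a CI job, the returned list could be passed to `subprocess.run(..., check=True)` so that a non-zero exit code from the validator fails the step.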

To read more about the validator and all of the available options, see the documentation for rasa data validate.

Training a Model

You can train a model with the following command:

rasa train

Training a model verifies that your NLU pipeline and policy configurations are valid and trainable, and provides a model to use for testing your assistant. If the model passes the CI tests, then you can also upload the trained model to your server as part of the continuous deployment process.

Testing Your Assistant

Testing your trained model on test stories is the best way to have confidence in how your assistant will act in certain situations. Written in a modified story format, test stories allow you to provide entire conversations and test that, given certain user input, your model will behave in the expected manner. This is especially important as you start introducing more complicated stories from user conversations.

The command to run test stories is:

rasa test --fail-on-prediction-errors

By default, the command will run tests on stories from any files with names starting with test_. You can also provide a specific test stories file or directory with the --stories argument.

The --fail-on-prediction-errors flag ensures the test will fail if any test story fails.
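The validation, training, and testing commands covered so far can be chained so that the first failing step stops the run. The sketch below is one possible way to do that from a Python script, assuming `rasa` is on the PATH and the default max_history of 5 applies; `run_ci` and `CI_STEPS` are illustrative names, and most CI tools can express the same ordering natively in their own configuration.

```python
import subprocess
from typing import Callable, List, Sequence

# The three CI commands from this guide, in the order they depend on each other.
CI_STEPS: List[List[str]] = [
    ["rasa", "data", "validate", "--fail-on-warnings", "--max-history", "5"],
    ["rasa", "train"],
    ["rasa", "test", "--fail-on-prediction-errors"],
]


def _run(command: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(command)).returncode


def run_ci(steps: Sequence[Sequence[str]] = CI_STEPS,
           run: Callable[[Sequence[str]], int] = _run) -> int:
    """Run steps in order; return the first non-zero exit code, or 0 if all pass."""
    for command in steps:
        print("running:", " ".join(command))
        code = run(command)
        if code != 0:
            # Later steps depend on earlier ones, so stop here.
            return code
    return 0
```

A CI entry point could call `sys.exit(run_ci())` so that the job's status mirrors the first failing step. The `run` parameter exists only so the ordering logic can be exercised without invoking Rasa.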

End-to-end conversation testing is only as thorough and accurate as the test cases you include, so you should continue to grow your set of test stories as you make improvements to your assistant. A good rule of thumb to follow is that you should aim for your test stories to be representative of the true distribution of real conversations. Rasa X makes it easy to add test conversations based on real conversations.

Note: Running test stories does not execute your action code. You will need to test your action code in a separate step.

Comparing NLU Performance

If you've made significant changes to your NLU training data (e.g. splitting an intent into two intents or adding a lot of training examples), you should run a full NLU evaluation. You'll want to compare the performance of the NLU model without your changes to an NLU model with your changes.

You can do this by running NLU testing in cross-validation mode:

rasa test nlu --cross-validation

You could also train a model on a training set and test it on a test set. If you use the train-test set approach, it is best to shuffle and split your data using rasa data split as part of this CI step, as opposed to using a static NLU test set, which can easily become outdated.

Because this test doesn't result in a pass/fail exit code, it's best to make the results visible so that you can interpret them. For example, this GitHub Actions Workflow includes commenting on a PR with a results table that shows which intents are confused with others.
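One way to make the results visible is to turn the intent report into a small table that a CI job can post as a comment. The sketch below assumes the report has already been loaded into a dictionary that maps each intent name to an entry with an `f1-score` number and a `confused_with` mapping of other intents to counts; that shape, and the function name `format_intent_table`, are assumptions made for illustration, so check them against the report files your Rasa version actually writes.

```python
from typing import Any, Dict, List


def format_intent_table(report: Dict[str, Any]) -> str:
    """Render per-intent scores and confusions as a Markdown table."""
    rows: List[str] = [
        "| intent | f1-score | confused with |",
        "| --- | --- | --- |",
    ]
    for intent in sorted(report):
        entry = report[intent]
        # Skip summary entries (e.g. an overall accuracy number or averages),
        # which are assumed not to carry a "confused_with" mapping.
        if not isinstance(entry, dict) or "confused_with" not in entry:
            continue
        confused = ", ".join(
            f"{other} ({count})"
            for other, count in sorted(entry["confused_with"].items())
        ) or "-"
        rows.append(f"| {intent} | {entry['f1-score']:.2f} | {confused} |")
    return "\n".join(rows)
```

Generating the same table for the model with and without your changes gives reviewers a side-by-side view, which matters because this step cannot fail the build on its own.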

Since NLU comparison can be a fairly resource-intensive test, you may choose to run this test only when certain conditions are met. Conditions might include the presence of a manual label (e.g. “NLU testing required”), changes to NLU data, or changes to the NLU pipeline.
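These conditions can be expressed as a small predicate over the changed files and labels of a pull request. The sketch below assumes NLU data lives under `data/nlu` and that the pipeline is defined in `config.yml`; both paths and the function name are illustrative, and how you obtain the changed files and labels depends on your CI tool.

```python
from typing import Iterable

NLU_LABEL = "NLU testing required"  # manual label named in this guide
NLU_DATA_PREFIX = "data/nlu"        # assumed location of NLU training data
CONFIG_FILE = "config.yml"          # assumed to hold the NLU pipeline


def should_run_nlu_comparison(changed_files: Iterable[str],
                              labels: Iterable[str]) -> bool:
    """Return True if the NLU comparison step should run for this change."""
    # A manual label always forces the comparison.
    if NLU_LABEL in set(labels):
        return True
    # Otherwise run only when NLU data or the pipeline config was touched.
    return any(
        path == CONFIG_FILE or path.startswith(NLU_DATA_PREFIX)
        for path in changed_files
    )
```

Because `config.yml` also contains policy settings, this check errs on the side of running the comparison for any change to that file.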

Testing Action Code

The approach used to test your action code will depend on how it is implemented. For example, if you connect to external APIs, you should write integration tests to ensure that those APIs respond as expected to common inputs. However you test your action code, you should include these tests in your CI pipeline so that they run each time you make changes.
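As an illustration of a check that is cheap enough to run on every change, the sketch below tests a hypothetical helper that a custom action might call, using a stubbed API call in place of the real service. Everything here (`order_status_message`, the payload shape, the messages) is invented for the example; it is a unit-style test that complements, rather than replaces, integration tests against the real API.

```python
import unittest
from typing import Callable, Dict


def order_status_message(order_id: str,
                         fetch: Callable[[str], Dict[str, str]]) -> str:
    """Hypothetical helper a custom action might call to build its reply."""
    try:
        payload = fetch(order_id)
    except ConnectionError:
        return "Sorry, I can't reach the order system right now."
    return f"Order {order_id} is {payload['status']}."


class OrderStatusMessageTest(unittest.TestCase):
    def test_common_input(self):
        # Stub standing in for the external API's response to a common input.
        message = order_status_message("42", lambda _id: {"status": "shipped"})
        self.assertEqual(message, "Order 42 is shipped.")

    def test_api_unreachable(self):
        def failing_fetch(_id: str) -> Dict[str, str]:
            raise ConnectionError("service down")

        self.assertIn("can't reach", order_status_message("42", failing_fetch))
```

Tests written this way can be run with `python -m unittest` as one more CI step alongside the Rasa commands.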

Continuous Deployment (CD)

To get improvements out to your users frequently, you will want to automate as much of the deployment process as possible.

CD steps usually run on push or merge to a certain branch, once CI checks have succeeded.

Deploying Your Rasa Model

If you ran test stories in your CI pipeline, you'll already have a trained model. You can set up your CD pipeline to upload the trained model to your Rasa server if the CI results are satisfactory. For example, to upload a model to Rasa X:

curl -k -F "model=@models/my_model.tar.gz" "{your_api_token}"

If you are using Rasa X, you can also tag the uploaded model as production (or whichever deployment you want to tag if using multiple deployment environments):

curl -X PUT ""

Caution (updates to action code): If your update includes changes to both your model and your action code, and these changes depend on each other in any way, you should not automatically tag the model as production. You will first need to build and deploy your updated action server, so that the new model won't, for example, call actions that don't exist in the pre-update action server.
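A CD pipeline can guard against this case with a conservative check on the files an update touches. The sketch below assumes action code lives under `actions/` and that the model is built from `data/`, `domain.yml`, and `config.yml`; those paths and the function name are illustrative. Since a script cannot tell whether the changes actually depend on each other, it treats any update that touches both sides as unsafe to tag automatically.

```python
from typing import Iterable

ACTION_PREFIX = "actions/"                            # assumed action code location
MODEL_INPUTS = ("data/", "domain.yml", "config.yml")  # assumed model inputs


def can_auto_tag_production(changed_files: Iterable[str]) -> bool:
    """Return False when both the model inputs and the action code changed."""
    files = list(changed_files)
    actions_changed = any(f.startswith(ACTION_PREFIX) for f in files)
    model_changed = any(f.startswith(MODEL_INPUTS) for f in files)
    # Only block automatic tagging when both sides changed together.
    return not (actions_changed and model_changed)
```

When the check returns False, the pipeline can still upload the model but leave tagging as a manual step after the updated action server is deployed.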

Deploying Your Action Server

You can automate building and uploading a new image for your action server to an image repository for each update to your action code. As noted above, be careful with automatically deploying a new image tag to production if the action server would be incompatible with the current production model.

Example CI/CD pipelines

As examples, see the CI/CD pipelines for Sara, the Rasa assistant that you can talk to in the Rasa Docs, and Carbon Bot. Both use GitHub Actions as a CI/CD tool.

These examples are just two of many possibilities. If you have a CI/CD setup you like, please share it with the Rasa community on the forum.