
January 16th, 2020

Integrated Version Control: Linking Rasa X with Git-based Development Workflows

    Rasa

Update: A lot has changed since this post was written. Rasa X, the freemium companion tool to Rasa Open Source, is no longer supported or maintained, and we are currently focused on the development of the Rasa Enterprise platform. To learn more, you can check out this blog post.

As of Rasa X version 0.23.0, we've added a new feature that allows developers to version training data by connecting Rasa X with a Git repository on a remote server. Integrated Version Control creates a bridge between Rasa X and downstream workflows like continuous integration and continuous deployment (CI/CD).

For the past several weeks, the community has had access to Integrated Version Control as an experimental feature. We've collected feedback from the product teams and developers using it and incorporated that feedback into our latest stable version: Rasa X 0.24.1. We now encourage all Rasa X users to activate Integrated Version Control and begin taking advantage of enhanced development workflows.

Integrated Version Control performs a two-way sync to check for differences between the training data in Rasa X and the remote repository on the Git server. If there are changes in Rasa X that haven't been pushed to the remote repository, Integrated Version Control lets you commit those changes to an existing branch or create a new one. This streamlines the process of using Rasa X with CI/CD tools like Jenkins, CircleCI, and Travis CI.

We believe building contextual assistants should be approached the same way technical product teams develop software: using version control, branch-based development, and CI/CD. Integrated Version Control makes it easy to keep a record of modifications to training data, giving developers the option to roll back changes and providing transparency into the way updates influence model behavior. It allows teams to build AI assistants collaboratively, committing changes to a common development branch.

But most importantly, Integrated Version Control supports a larger strategy of continually improving assistants by generating training data from real conversations and feeding it back into the model. By connecting Rasa X to a continuous delivery pipeline, developers can exercise greater control over changes, iterate quickly, and automate QA to continuously improve their AI assistants.

Maintain Version History

Integrated Version Control securely connects to Git-based code hosting services like GitHub, Bitbucket, and GitLab using SSH. Once the connection has been established, the latest data is pulled into Rasa X from the remote repository. A green status indicator signals that Rasa X is up to date with the Git server. As changes to training data are made in Rasa X, either by annotating past conversations and messages or by adding stories through interactive learning, the green status becomes orange, indicating that updates can be committed and pushed to the remote repository.
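To make the status indicator idea concrete, here is a minimal Python sketch of one way such a check could work: compare content digests of the training data on both sides and report green when they match, orange when they don't. This is an illustration only, not how Rasa X implements its sync. The function names and file paths are hypothetical, and the sketch doesn't distinguish which side holds the newer changes.

```python
import hashlib
from typing import Dict, List


def digest(files: Dict[str, str]) -> Dict[str, str]:
    """Map each file path to a SHA-256 digest of its contents."""
    return {
        path: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for path, text in files.items()
    }


def sync_status(local: Dict[str, str], remote: Dict[str, str]) -> str:
    """Return 'green' when both sides hold identical data, else 'orange'."""
    return "green" if digest(local) == digest(remote) else "orange"


def changed_paths(local: Dict[str, str], remote: Dict[str, str]) -> List[str]:
    """List paths that were added, removed, or modified on either side."""
    local_digests, remote_digests = digest(local), digest(remote)
    all_paths = set(local_digests) | set(remote_digests)
    return sorted(
        p for p in all_paths if local_digests.get(p) != remote_digests.get(p)
    )


if __name__ == "__main__":
    # Hypothetical training data files, keyed by path.
    remote = {"data/nlu.md": "## intent:greet\n- hello\n"}
    local = dict(remote)
    print(sync_status(local, remote))

    # Annotating a new message changes the local copy.
    local["data/nlu.md"] += "- hi there\n"
    print(sync_status(local, remote), changed_paths(local, remote))
```

Comparing digests rather than raw file contents keeps the check cheap even when the training data grows large.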

Each commit creates a record of the incremental changes made to the training data, which in turn helps technical product teams evaluate the models trained on that data. As teams iterate and experiment to improve models over time, version control ensures results are replicable and allows changes to be rolled back if needed.

Branch-based Development

Integrated Version Control enables technical product teams using Rasa X to take advantage of the downstream benefits of Git-based workflows, like branch-based development and code reviews. Changes in Rasa X are committed to a target branch, allowing teams to test changes and control the way updates are deployed to production or a staging environment.

For incremental updates to the assistant, like altering the way actions work, you can change the code in your favorite text editor or IDE on your local machine, version it in Git, and push your changes to your remote Git repository. Meanwhile, teammates can annotate data on the Rasa X server and add their own changes to another branch. Integrated Version Control fits Rasa X into branch-based development workflows, allowing multiple features to be built while maintenance tasks are performed in parallel.

Integrating Rasa X with Git also allows teams to introduce a code review process for training data updates by opening pull requests and assigning reviewers. This provides an additional measure of quality control as updates are made.

Connect Rasa X to CI/CD Workflows

One of the most exciting implications of Integrated Version Control is the ability to connect Rasa X with continuous delivery workflows. Rasa X can now become the first touchpoint along an automated deployment pipeline, using tools like Jenkins, CircleCI, and Travis CI.

Let's consider an example workflow. When training data updates are pushed from Rasa X to the remote Git server, the push triggers a sequence of steps along an automated pipeline: retraining the model, running end-to-end tests, generating model evaluation reports, and deploying to a staging or production environment. Depending on your CI/CD platform and configuration, these steps can be fully automated, or certain steps can be run manually.
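The sequence above can be sketched as a small Python driver that runs each step in order and stops at the first failure, which is roughly what a CI/CD platform does for you. Treat it as a sketch under assumptions: `rasa train` and `rasa test` are Rasa CLI commands, but the deploy script path is hypothetical, and a real pipeline would normally be defined in your CI tool's own configuration format rather than in a script like this.

```python
import subprocess
from typing import Any, Callable, List, Sequence

# Ordered pipeline steps. The deploy script is a hypothetical placeholder.
PIPELINE: List[List[str]] = [
    ["rasa", "train"],                             # retrain the model
    ["rasa", "test", "--out", "results"],          # run tests, write reports
    ["python", "scripts/deploy_to_staging.py"],    # hypothetical deploy step
]


def run_pipeline(
    steps: Sequence[Sequence[str]],
    runner: Callable[[Sequence[str]], Any] = subprocess.run,
) -> List[str]:
    """Run steps in order; return the commands that succeeded.

    Execution stops at the first step whose return code is non-zero,
    so a failing test run prevents the deploy step from running.
    """
    completed: List[str] = []
    for cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            break
        completed.append(" ".join(cmd))
    return completed


if __name__ == "__main__":
    finished = run_pipeline(PIPELINE)
    print("Completed {} of {} steps".format(len(finished), len(PIPELINE)))
```

Passing the runner in as a parameter makes it possible to dry-run the pipeline logic without invoking Rasa at all.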

CI/CD doesn't just encourage rapid iteration and reduce the risk associated with deployment; it also allows teams to add value by running additional scripts and processes. The Rasa CLI includes tools to evaluate a new model, which can be used to generate artifacts like confusion matrices and confidence histograms during the pipeline build. These artifacts can be automatically added to pull requests and included as part of the review process. Utilities like these increase productivity by reducing the amount of time spent on manual deployment tasks and provide immediate insight into the performance of new models.

We've provided an example workflow to get you started, using GitHub Actions. The pipeline builds and uploads the latest model, runs cross-validation tests on the NLU model, and then uses a Python script to format the results into a report.
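To give a feel for the report-formatting step, here is a minimal Python sketch that turns per-intent evaluation metrics into a table suitable for a pull request comment. It is not the script from the example workflow. It assumes the evaluation results are available as a JSON mapping from intent name to precision, recall, f1-score, and support (the shape of scikit-learn's classification report), and the file path shown in the usage section is an assumption.

```python
import json
from typing import Any, Dict

# Aggregate rows that are skipped so the table lists individual intents only.
SUMMARY_KEYS = {"accuracy", "micro avg", "macro avg", "weighted avg"}


def format_intent_report(report: Dict[str, Any]) -> str:
    """Format per-intent metrics as a pipe-delimited table."""
    lines = [
        "| intent | precision | recall | f1-score | support |",
        "| --- | --- | --- | --- | --- |",
    ]
    for name in sorted(report):
        row = report[name]
        if name in SUMMARY_KEYS or not isinstance(row, dict):
            continue
        lines.append(
            "| {} | {:.2f} | {:.2f} | {:.2f} | {} |".format(
                name,
                row["precision"],
                row["recall"],
                row["f1-score"],
                row["support"],
            )
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumed location of the evaluation output; adjust to your pipeline.
    with open("results/intent_report.json") as f:
        print(format_intent_report(json.load(f)))
```

The resulting text can then be posted to the pull request by whatever mechanism your CI platform provides.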

A Toolset For Continuous Improvement

In our experience, the best assistants are built by technical product teams that follow repeatable and scalable practices for continually improving assistants, based on data collected from user interactions.

We built Rasa X with this philosophy in mind. Rasa X allows technical product teams to review past conversations, identify edge cases, and convert real conversations into training data to improve the assistant. Integrated Version Control takes this a step further by making it easier to integrate with automated deployment workflows and end-to-end testing, allowing developers to close the loop between Rasa X and the tools they already use. By incorporating testing into every update, product teams receive immediate feedback on how changes affect the assistant's performance, reinforcing continuous quality improvements.

Demo

Ready to see Integrated Version Control in action? Watch the video demonstrating how to get started and connect Integrated Version Control with a remote GitHub repo.

Conclusion

Integrated Version Control is an important step toward making Rasa X work with developers' existing tools and workflows. Developers can now apply the same best practices they use to manage code updates to managing updates to training data. As a result, product teams can iterate and improve assistants using scalable and repeatable processes.

Have questions? On January 29, we'll be hosting a 45-minute webinar dedicated to Integrated Version Control. Register here to save your seat, and pre-submit your questions in the forum to get them answered live on air.

Learn more about Integrated Version Control in the documentation, and try it out by updating to the latest version of Rasa X. You can also find out more about how to deploy updates and run CI/CD processes by watching our recent webinar on using Rasa X with Carbon bot, a Rasa research project.

Lastly, head over to the community forum to tell us what you think. We want to know which tools you use for code hosting and CI/CD, the use cases you have in mind, and what you think about using Integrated Version Control so far. We'll be looking to the community to help us decide which enhancements and features we should prioritize next!