Benchmarking Success in Enterprise AI Teams

The scorecard takes the habits and processes that advanced teams use to get the most out of Rasa and makes them concrete and actionable. It complements your project success plan (you can see an example plan here): the success plan specifies what to do and when, and the scorecard measures how well it's going. In short, the enterprise scorecard helps teams understand how prepared they are for production, how well they are following conversation-driven development (CDD) best practices, and where they should prioritize efforts to improve.

How it works

The scorecard looks at your AI assistant from multiple perspectives to evaluate production readiness.

First, the wizard asks you a set of questions about your project, covering your infrastructure and deployment setup.
Second, your project structure is analyzed for things like test story coverage and the implementation of fallback mechanisms.
Third, data fetched from the Rasa Enterprise API is used to assess the health of your training data and the success KPIs you have implemented for your assistant.

Your project is then scored in four categories:

Training data health: do you have a high-quality dataset as the foundation of your NLU model?
Success KPIs: are you tracking the performance of your assistant over time?
Continuous integration: are you guarded against introducing regressions when you make changes to your assistant?
Continuous deployment: do you have a safe and automated deployment mechanism for bringing changes live?

After establishing an initial baseline score in each of the categories, it's important to re-run the scorecard regularly and keep track of your improvements. Because the answers you give to the wizard are saved, getting an updated score takes just a few seconds.

Better NLU performance

The results of one pilot team, a US-based health insurer, illustrate how tracking progress can translate into measurable performance gains. The team raised the F1 score of their intent model from 0.80 to 0.93 by working with the scorecard over six weeks to improve their training data health. The scorecard also pairs well with the new intent insights feature in Rasa Enterprise.
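For readers less familiar with the metric, F1 is the harmonic mean of precision and recall. The sketch below is purely illustrative: the function name and the counts are made up for this example, and nothing here says how the pilot team's score was computed or averaged across intents.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall) from raw counts.

    Hypothetical helper for illustration only; it is not part of the scorecard.
    """
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        # No predictions and no expected labels: define the score as 0.0.
        return 0.0
    # Equivalent to 2 * precision * recall / (precision + recall).
    return 2 * true_positives / denominator


# Made-up counts: 8 messages classified correctly, 2 wrongly assigned
# to the intent, and 2 that should have been assigned to it but were missed.
print(f"{f1_score(8, 2, 2):.2f}")  # prints 0.80
```

Because F1 penalizes both false positives and false negatives, cleaning up mislabeled or overlapping training examples tends to move it more than adding volume alone.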

Shipping with confidence

The same team also took action to guard their assistant against regressions. Shipping improvements quickly and often is key to building great products, and AI assistants are no different. Reviewing the results of their scorecard, the team noticed that their test stories covered just 28% of their domain. In other words, the majority of their intents, entities, and actions never appeared in any test story! By investing a sprint in writing more test stories, they increased their coverage to 80%, and they continue to make improvements. The team is now far more confident that they can change their assistant without breaking existing functionality.
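To make the idea of test story coverage concrete, here is a minimal sketch under stated assumptions. It treats coverage as the share of domain items (intents, entities, and actions) that appear in at least one test story. That definition, the function name, and the sample data are all assumptions made for illustration; the scorecard's actual calculation may differ, and a real project would load these items from its domain and test story files rather than hard-coding them.

```python
# Hypothetical, simplified domain: names are invented for this example.
domain = {
    "intent": {"greet", "goodbye", "check_claim", "update_address"},
    "entity": {"claim_id", "zip_code"},
    "action": {"utter_greet", "utter_goodbye",
               "action_check_claim", "action_update_address"},
}

# Each test story is a list of (kind, name) steps.
test_stories = [
    [("intent", "greet"), ("action", "utter_greet")],
    [("intent", "check_claim"), ("entity", "claim_id"),
     ("action", "action_check_claim")],
]


def story_coverage(domain, stories):
    """Return (coverage fraction, sorted list of domain items no story touches)."""
    all_items = {(kind, name) for kind, names in domain.items() for name in names}
    seen = {step for story in stories for step in story}
    covered = all_items & seen
    coverage = len(covered) / len(all_items) if all_items else 0.0
    return coverage, sorted(all_items - covered)


coverage, missing = story_coverage(domain, test_stories)
print(f"Coverage: {coverage:.0%}")  # prints Coverage: 50%
for kind, name in missing:
    print(f"Not covered: {kind} {name}")
```

Listing the uncovered items alongside the percentage is the useful part: it turns a single number into a concrete backlog of test stories to write.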

How you can use it

You can start using the scorecard on day 1 of your project, well before you have an assistant in production or have Rasa Enterprise installed. Rasa customers are paired with a customer success engineer who will help to run it the first time. Once you have Rasa Enterprise deployed and hooked up to your live assistant, you can really start to measure your progress on practicing CDD and improving the quality of your assistant. We recommend working with your customer success manager to set goals on each of the four scorecard categories.

Given the impressive successes we've seen in the pilot group, I am especially excited that we're rolling this out to all of our customers. Our experience has shown that building assistants is a multidimensional effort. It requires attention to data, processes for tracking success, and infrastructure for testing. To meet these needs, Rasa is investing in conversational teams and organizations on top of our investment in technology. We expect the enterprise readiness scorecard to become an integral part of the way we support enterprise teams' success, enabling teams to build the next level of virtual assistants.