Training and evaluating NLU models from the command line gives you a decent summary, but sometimes you want to evaluate the model on something very specific. In these scenarios, you can load the trained model in a Jupyter notebook and use other open-source tools to fully explore and evaluate it. In this blog post, we'll explain how you can do this.
Note! This blog post was written with Rasa Open Source 2.x in mind. It's possible that the code no longer runs from Rasa Open Source 3.0 onward.
Loading in the NLU model
To demonstrate the use case, we've created a small project here. It contains a subset of chit-chat intents from our Rasa demo project. You can clone the repository to follow along, but you can also run the steps shown here on your own project.
We're assuming that you have Rasa Open Source 2.0.2 installed and that you're in a virtual environment that also has Jupyter installed. Once you have a notebook running, you can begin loading in a pre-trained NLU model by using the utility function found below.
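A minimal sketch of such a loading utility, assuming the Rasa Open Source 2.x internal modules `rasa.cli.utils`, `rasa.model` and `rasa.nlu.model`. These are internal APIs rather than a stable public interface, so they may differ in other versions. The function name `load_interpreter` is our own choice for illustration, and the imports sit inside the function so the cell only needs Rasa at call time.

```python
def load_interpreter(model_path):
    """Load the NLU part of a trained Rasa 2.x model as an Interpreter.

    `model_path` may point at a model archive or at a folder of models.
    """
    # Rasa 2.x internals (an assumption about your installed version);
    # imported here so a different Rasa install fails at call time
    # with a clear ImportError instead of when the cell is defined.
    from rasa.cli.utils import get_validated_path
    from rasa.model import get_model, get_model_subdirectories
    from rasa.nlu.model import Interpreter

    # Check that the given path exists.
    validated = get_validated_path(model_path, "model")
    # Unpack the model archive into a temporary directory.
    unpacked = get_model(validated)
    # A full model has a Core part and an NLU part; only NLU is needed.
    _, nlu_model_dir = get_model_subdirectories(unpacked)
    return Interpreter.load(nlu_model_dir)
```

Assuming a trained model sits in a `models/` folder, `nlu_interpreter = load_interpreter("models/")` would give the interpreter object used in the rest of this post.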
The nlu_interpreter now represents the pre-trained pipeline. It contains all the NLU components that were defined in the config.yml file. In our case we have this NLU configuration in our config.yml file:
And this is what the pipeline in the interpreter looks like:
This interpreter object contains all the trained NLU components, and it will be the main object that we'll interact with. One of the main features of this object is the ability to parse new texts. This will give us a dictionary with detected intents and entities as well as some confidence scores.
This is very useful because it allows us to make predictions on any text we like! We can also use it to make predictions on data stored in an nlu.yml file.
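To make the shape of that dictionary concrete, here is a small sketch of two helpers that pull the top intent and the entities out of a parse result. The `example` dictionary is a hand-written stand-in for what `nlu_interpreter.parse("hello there")` might return; the key names follow the layout we'd expect from Rasa 2.x, and the numbers are made up, so check both against your own output.

```python
def top_intent(parsed):
    """Return (intent name, confidence) from a parse result dictionary."""
    intent = parsed.get("intent") or {}
    return intent.get("name"), intent.get("confidence")


def entity_values(parsed):
    """Return (entity type, value) pairs found in a parse result."""
    return [(e.get("entity"), e.get("value")) for e in parsed.get("entities", [])]


# Hand-written stand-in for a parse result; the values are illustrative
# only and are NOT real model output.
example = {
    "text": "hello there",
    "intent": {"name": "greet", "confidence": 0.97},
    "entities": [],
    "intent_ranking": [
        {"name": "greet", "confidence": 0.97},
        {"name": "goodbye", "confidence": 0.03},
    ],
}

print(top_intent(example))
print(entity_values(example))
```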
NLU data and messages
To load your local NLU data, you can use another utility from Rasa.
There are a lot of properties attached to the train_data variable, but the most interesting one for our use case is train_data.intent_examples. It contains a list of all the intent examples found in our training data. These examples are represented as a Message object that Rasa uses internally as a container for any relevant information attached to an utterance.
To inspect the contents of these messages it can be helpful to retrieve them as dictionaries.
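The two steps above can be sketched as follows, assuming the Rasa 2.x loader lives at `rasa.shared.nlu.training_data.loading.load_data` and that each message exposes an `as_dict()` method. The helper names `load_intent_examples` and `messages_as_dicts` are our own, and the path you pass in (for example `data/nlu.yml`) depends on your project.

```python
def load_intent_examples(nlu_path):
    """Read an NLU data file and return its intent examples (Rasa 2.x)."""
    # Assumed Rasa 2.x module path; imported lazily so this cell can be
    # defined even when a different Rasa version is installed.
    from rasa.shared.nlu.training_data.loading import load_data

    return load_data(nlu_path).intent_examples


def messages_as_dicts(messages):
    """Turn Message-like objects into plain dictionaries for inspection."""
    return [message.as_dict() for message in messages]
```

Something like `messages_as_dicts(load_intent_examples("data/nlu.yml"))[:3]` would then show the first few examples as dictionaries.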
If you're interested in seeing what properties the pipeline adds to the message, you can iterate over each component in the interpreter and see the effect.
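One way to sketch this is a helper that runs the components one by one and records which keys each of them adds to the message. It only assumes that every component has a `process(message)` method that changes the message in place, and that the message has an `as_dict()` method. The name `trace_pipeline` and the optional `context` argument are our own additions.

```python
def trace_pipeline(pipeline, message, context=None):
    """Run each component on `message` and record the keys it adds.

    Returns a list of (component class name, sorted new keys) tuples.
    """
    context = context or {}
    trace = []
    seen = set(message.as_dict())
    for component in pipeline:
        # Components mutate the message in place and return nothing useful.
        component.process(message, **context)
        current = set(message.as_dict())
        trace.append((type(component).__name__, sorted(current - seen)))
        seen = current
    return trace
```

With Rasa 2.x we'd expect a call along the lines of `trace_pipeline(nlu_interpreter.pipeline, Message(data={"text": "hello"}))` to work, with `Message` imported from `rasa.shared.nlu.training_data.message`. Treat the attribute and module names as assumptions to verify against your installed version.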
You can now see that tokens have been added as well as predictions.
If you're really interested and want to go further, you could even retrieve the machine learning features that were generated.
Explore and evaluate with Python tools
While exploring the inner workings of Rasa NLU is fun, you're probably more interested in using the Jupyter notebook to evaluate the model. That means that you probably want to get your data into a pandas data frame so you can analyse it from there. The script below will do just that.
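A sketch of the row-building step, under the assumption that the interpreter has a `parse(text)` method returning a dictionary with an `intent` entry, and that each example supports `.get("text")` and `.get("intent")`. The column names `pred_intent` and `pred_confidence` are our own choice.

```python
def build_rows(interpreter, messages):
    """Pair each labelled example with the model's prediction."""
    rows = []
    for message in messages:
        text = message.get("text")
        parsed = interpreter.parse(text)
        rows.append({
            "text": text,
            "intent": message.get("intent"),            # label from the data
            "pred_intent": parsed["intent"]["name"],    # model prediction
            "pred_confidence": parsed["intent"]["confidence"],
        })
    return rows
```

Passing the result to `pandas.DataFrame(...)`, for example `df = pd.DataFrame(build_rows(nlu_interpreter, train_data.intent_examples))`, gives a data frame with one row per training example.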
Having data in a data frame allows you to write specific queries that calculate exactly what you're interested in. Here's a simple aggregation that calculates the confidence scores per intent.
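To show what such an aggregation computes, here is a dependency-free sketch that counts predictions and averages the confidence per predicted intent. It assumes rows with hypothetical `pred_intent` and `pred_confidence` keys; the exact aggregation used for the summary in this post may differ.

```python
from collections import defaultdict
from statistics import mean


def confidence_by_intent(rows):
    """Count predictions and average confidence per predicted intent."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["pred_intent"]].append(row["pred_confidence"])
    return {
        intent: {"n": len(scores), "mean_confidence": mean(scores)}
        for intent, scores in sorted(grouped.items())
    }
```

With a data frame `df` that has the same columns, `df.groupby("pred_intent")["pred_confidence"].agg(["count", "mean"])` should give an equivalent table.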
This summary will look something like this:
The main benefit of having this information in a data frame is that you can easily interact with other tools in the Python ecosystem. You can zoom in on a particular intent and you can make whatever charts you like. For example, you can use scikit-learn to generate classification reports.
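To make explicit what a classification report contains, the sketch below computes precision, recall and support per intent by hand from two lists of labels. It is a simplified illustration and not scikit-learn's implementation; the function name `per_intent_scores` is our own.

```python
from collections import Counter


def per_intent_scores(y_true, y_pred):
    """Compute precision, recall and support for every intent label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        scores[label] = {
            "precision": tp[label] / predicted if predicted else 0.0,
            "recall": tp[label] / actual if actual else 0.0,
            "support": actual,
        }
    return scores
```

In a notebook with scikit-learn installed, `print(classification_report(df["intent"], df["pred_intent"]))` from `sklearn.metrics` produces the full report, assuming the data frame has those two columns.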
This is what our generated report looks like.
You can also use Python's ecosystem of visualisation tools, like Altair.
You can combine your pandas analysis with visualisations to construct whatever view you're interested in. Just to give one example, the chart below creates an interactive confusion matrix. A nice property of Altair is that you can export the charts to the front end natively and give them an interactive toolbar.
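A sketch of how such a confusion matrix could be put together: first count the (true intent, predicted intent) pairs in long format, then hand the counts to Altair as a heatmap with tooltips. The column names `intent`, `pred_intent` and `n` are assumptions, and this is not necessarily the chart definition used for this post.

```python
from collections import Counter


def confusion_counts(rows):
    """Count (true intent, predicted intent) pairs in long format."""
    pairs = Counter((row["intent"], row["pred_intent"]) for row in rows)
    return [
        {"intent": truth, "pred_intent": pred, "n": n}
        for (truth, pred), n in sorted(pairs.items())
    ]


def confusion_chart(rows):
    """Draw the counts as an Altair heatmap with tooltips."""
    # Third-party imports stay local so the counting works without them.
    import altair as alt
    import pandas as pd

    data = pd.DataFrame(confusion_counts(rows))
    return (
        alt.Chart(data)
        .mark_rect()
        .encode(
            x=alt.X("pred_intent:N", title="predicted intent"),
            y=alt.Y("intent:N", title="true intent"),
            color=alt.Color("n:Q", title="examples"),
            tooltip=["intent", "pred_intent", "n"],
        )
    )
```

Cells off the diagonal show which intents get mistaken for each other, which is usually where a closer look at the training examples pays off.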
By exploring some of the internal APIs of Rasa Open Source we're reminded of a core feature: the ability to customise. It's possible to load a custom Rasa pipeline into Jupyter so you can use other open-source tools to fully explore and evaluate a trained model. You're free to explore and interact with these models as you see fit.