Custom SpaCy 3.0 models in Rasa

This guide was written for Rasa version 2.6.2 with spaCy version 3.0.6. However, it should still be compatible with Rasa 3.x and spaCy 3.x.

Let's say that you're a financial organisation interested in building a virtual assistant. The virtual assistant will likely need to detect certain entities: dates, bank accounts, addresses, as well as financial jargon related to mortgages. Odds are, though, that your virtual assistant isn't the only system that needs to detect these entities. The same capability is relevant for other use cases in your organisation, like contract parsing, automated email forwarding in customer service and fraud detection. It therefore makes sense to explore building a general model that can be re-used by many teams in your organisation.

SpaCy is a popular tool for such pipelines. In this guide, we're going to bootstrap a spaCy project and show you how to integrate it with Rasa. If you're unfamiliar with spaCy, feel free to check out the spaCy online course or the spaCy introductory YouTube series. In particular, we're going to use the new projects feature from spaCy 3.0, which is explained in detail in this YouTube video.

Setup

Let's start by setting up a new Rasa project. This will be the project that we will configure to run with spaCy.

Shell

python -m pip install rasa==2.6.2 spacy==3.0.6
python -m rasa init

We won't make any changes to the Rasa project just yet. Instead, we will start a new spaCy project that will contain our custom entity detection model. We will use the NER demo starter project from spaCy so that we can get started quickly.

Shell

python -m spacy project clone pipelines/ner_demo_replace
cd ner_demo_replace

The ner_demo_replace package contains a spaCy project that will train an entity detection model. It only has a few examples to train on because it's merely meant for demonstration purposes, but the project structure is ready to run for more serious problems. Here's what the project folder contains.

📂 /Development/spacy-3-rasa-integration/ner_demo_replace
┣━━ 📂 assets
┣━━ 📂 configs
┣━━ 📂 corpus
┣━━ 📂 packages
┣━━ 📂 scripts
┣━━ 📂 training
┣━━ 📄 project.yml (3.9 kB)
┣━━ 📄 README.md (2.1 kB)
┗━━ 📄 requirements.txt (58 bytes)

This folder represents a self-contained spaCy project. It contains all the scripts and commands that we need to train a new model. To get started with our model, we'll first need to download the required training data. The data belongs in the assets directory, and the project is configured to fetch the files we need automatically when we run:

Shell

python -m spacy project assets

The training data is minimal and only meant for illustration. It consists of JSON data with a few examples in which the term "horse" needs to be detected as an ANIMAL entity. The project can be extended with a more meaningful dataset, but this one is enough to show how to create a custom spaCy model.
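
To make the shape of such annotations concrete, here is a minimal sketch that checks character-offset entity spans using only the standard library. The layout shown, a list of `[text, {"entities": [[start, end, label]]}]` pairs, and the two sample sentences are assumptions made for illustration; inspect the files downloaded into the assets directory for the actual format.

```python
import json

# Hypothetical sample in the ASSUMED layout:
# [text, {"entities": [[start, end, label]]}]
SAMPLE = """
[
  ["I like horses", {"entities": [[7, 13, "ANIMAL"]]}],
  ["Horses are too tall", {"entities": [[0, 6, "ANIMAL"]]}]
]
"""


def annotated_spans(raw_json):
    """Return (surface_text, label) pairs, raising if an offset is out of range."""
    spans = []
    for text, annotations in json.loads(raw_json):
        for start, end, label in annotations["entities"]:
            # Offsets are character positions; `end` is exclusive.
            if not 0 <= start < end <= len(text):
                raise ValueError(f"bad span {start}-{end} in {text!r}")
            spans.append((text[start:end], label))
    return spans


spans = annotated_spans(SAMPLE)
print(spans)
```

A check like this is a cheap way to catch off-by-one offsets before the data is converted and used for training.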

With the data downloaded we can now run the required commands to train a model. The commands are configured in the project.yml file. You can preview the available commands via:

Shell

python -m spacy project run

Demo replacing an NER component in a pretrained pipeline

Available commands in project.yml
Usage: python -m spacy project run [COMMAND]

download          Download the pretrained pipeline
convert           Convert the data to spaCy's binary format
create-config     Create a config
train             Train the NER model
evaluate          Evaluate the model and export metrics
package           Package the trained model as a pip package
visualize-model   Visualize the model's output via Streamlit

Available workflows in project.yml
Usage: python -m spacy project run [WORKFLOW]

all   convert -> create-config -> train -> evaluate

In our case, we're mainly interested in creating a new model and then packaging it up. So we'll run the all workflow first. This converts the training data to a binary format, creates the config, and then trains and evaluates the model. Once that is done, we'll package the result with the package command.

Shell

python -m spacy project run all
python -m spacy project run package
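
If you retrain the model regularly, the two steps above could be wrapped in a small script. The sketch below is one possible way to do that; the helper names `build_command` and `run_steps` are hypothetical, and by default it only prints the commands (a dry run) rather than executing them. It assumes it is run from inside the ner_demo_replace folder.

```python
import subprocess
import sys

# The two steps used in this guide: the "all" workflow, then "package".
STEPS = ["all", "package"]


def build_command(step):
    """Build the `python -m spacy project run <step>` command as a list."""
    return [sys.executable, "-m", "spacy", "project", "run", step]


def run_steps(steps, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    commands = [build_command(step) for step in steps]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first failing step.
            subprocess.run(command, check=True)
    return commands


commands = run_steps(STEPS)
```

Calling `run_steps(STEPS, dry_run=False)` would actually run the training and packaging steps in order.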

This gives us a trained spaCy model. You can inspect it in the packages folder inside of the spaCy project.

📂 ner_demo_replace/packages/en_ner_demo_replace-0.0.0
┣━━ 📂 dist
┣━━ 📂 en_ner_demo_replace
┣━━ 📂 en_ner_demo_replace.egg-info
┣━━ 📄 MANIFEST.in (33 bytes)
┣━━ 📄 meta.json (2.6 kB)
┗━━ 🐍 setup.py (1.8 kB)

You'll notice that we've exported a setup.py file here as well. That means that we can install the trained spaCy model via pip to have it available in our virtual environment.

Shell

python -m pip install packages/en_ner_demo_replace-0.0.0/

Now that the model is installed, we can access it from spaCy within Python too. It's loadable via the spacy.load command.

Python

>>> import spacy
>>> nlp = spacy.load("en_ner_demo_replace")
>>> nlp("i really like horses").ents
(horses,)
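
For use in a script rather than an interactive session, the same check might look like the sketch below. The function name `extract_entities` is hypothetical. It returns `None` when either spaCy or the model package is not installed in the current environment, so it is safe to run before the installation step has succeeded.

```python
from importlib.util import find_spec


def extract_entities(text, model_name="en_ner_demo_replace"):
    """Return (text, label, start_char, end_char) tuples, or None if unavailable."""
    # Guard: both spaCy and the pip-installed model package must be importable.
    if find_spec("spacy") is None or find_spec(model_name) is None:
        return None
    import spacy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]


print(extract_entities("i really like horses"))
```

The label and character offsets returned here are the same pieces of information that Rasa reports for each extracted entity later in this guide.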

Now that our custom model is loadable by spaCy, it is also loadable from Rasa.

Back to Rasa

Let's make a change to the NLU pipeline in our `config.yml` file so that we're able to use our own spaCy model.

YAML

language: en
pipeline:
  - name: SpacyNLP
    model: en_ner_demo_replace
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
    pooling: mean
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: SpacyEntityExtractor

The important part here is that we're configuring a SpacyNLP component in our pipeline that refers to our own en_ner_demo_replace model. Once this model is loaded, it will be used by the SpacyTokenizer, the SpacyFeaturizer and the SpacyEntityExtractor. We can confirm this by training a new Rasa model and asking it to check for entities.
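
Because the downstream spaCy components rely on the model that SpacyNLP loads, the order of the pipeline matters. The sketch below illustrates that dependency with a plain Python list that mirrors the config above. The helper `misplaced_components` is hypothetical and only meant to illustrate the ordering; it is not part of Rasa, which has its own checks.

```python
# The pipeline from config.yml, written as a plain Python list.
PIPELINE = [
    {"name": "SpacyNLP", "model": "en_ner_demo_replace"},
    {"name": "SpacyTokenizer"},
    {"name": "SpacyFeaturizer", "pooling": "mean"},
    {"name": "LexicalSyntacticFeaturizer"},
    {"name": "CountVectorsFeaturizer"},
    {"name": "CountVectorsFeaturizer", "analyzer": "char_wb",
     "min_ngram": 1, "max_ngram": 4},
    {"name": "DIETClassifier", "epochs": 100},
    {"name": "SpacyEntityExtractor"},
]

# Components that use the spaCy model loaded by SpacyNLP.
SPACY_DEPENDENTS = {"SpacyTokenizer", "SpacyFeaturizer", "SpacyEntityExtractor"}


def misplaced_components(pipeline):
    """Return spaCy-dependent components listed before SpacyNLP."""
    seen_spacy_nlp = False
    problems = []
    for component in pipeline:
        name = component["name"]
        if name == "SpacyNLP":
            seen_spacy_nlp = True
        elif name in SPACY_DEPENDENTS and not seen_spacy_nlp:
            problems.append(name)
    return problems


print(misplaced_components(PIPELINE))
```

An empty list means every spaCy-dependent component appears after SpacyNLP, as it does in the config above.
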
Let's start by training the Rasa pipeline.

Shell

rasa train nlu

Once it is trained, we can run the NLU model from the shell.

Shell

> rasa shell nlu
NLU model loaded. Type a message and press enter to parse it.
Next message:
> i really like horses
{
  "text": "i really like horses",
  "entities": [
    {
      "entity": "ANIMAL",
      "value": "horses",
      "start": 14,
      "confidence": null,
      "end": 20,
      "extractor": "SpacyEntityExtractor"
    }
  ],
...
}
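
The start and end values in this output are character offsets into the message text, with the end offset exclusive. As a minimal sketch, the snippet below re-creates only the fields shown above (the elided fields are left out) and verifies that slicing the text with those offsets yields the reported value. The function name `summarise_entities` is hypothetical.

```python
import json

# Only the fields shown in the shell output above; the rest is omitted.
parsed = json.loads("""
{
  "text": "i really like horses",
  "entities": [
    {"entity": "ANIMAL", "value": "horses", "start": 14, "confidence": null,
     "end": 20, "extractor": "SpacyEntityExtractor"}
  ]
}
""")


def summarise_entities(result):
    """Return (entity, value, extractor) tuples after checking the offsets."""
    summary = []
    for ent in result["entities"]:
        surface = result["text"][ent["start"]:ent["end"]]
        if surface != ent["value"]:
            raise ValueError(f"offsets give {surface!r}, expected {ent['value']!r}")
        summary.append((ent["entity"], ent["value"], ent["extractor"]))
    return summary


results = summarise_entities(parsed)
print(results)
```

The extractor field is what confirms that the entity came from our spaCy model rather than from another component in the pipeline.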

And there you have it! We've trained our own spaCy model and loaded it in Rasa.

Final Details

The example that we've used in this guide is not representative of a real-life problem. We've chosen this project as a demonstration because it's fast to get started with. The spaCy project can easily be extended, however, by replacing the JSON data that is used to train and validate the spaCy model. If you're interested in playing around with a larger model, you might enjoy this project that detects fashion brands or this project that detects food ingredients.

If you're interested in doing a deep dive into spaCy models we recommend reading more about the config system. In our case we've kept things simple, but we could have customised it further. We could, for example, have configured our custom model to use the word vectors from the large English model. We could do that by changing the components.tok2vec property in the configs/config.cfg file.

configs/config.cfg

cfg

[components.tok2vec]
source = "en_core_web_lg"

We could even go a step further and change the settings of the internal spaCy model from this config file. The base settings use a convolutional model, which is configured as follows.

configs/config.cfg

cfg

[components.ner.model.tok2vec]
@architectures = "spacy.HashEmbedCNN.v2"
pretrained_vectors = null
width = 96
depth = 4
embed_size = 2000
window_size = 1
maxout_pieces = 3
subword_features = true
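
The values in this block are typed: `null`, `true` and the numbers are JSON-style literals rather than plain strings. The sketch below uses only the Python standard library to read the section above and show the resulting types. It is an illustration of the file format, not spaCy's own config loader, and the helper name `read_section` is hypothetical.

```python
import configparser
import json

CONFIG_TEXT = """
[components.ner.model.tok2vec]
@architectures = "spacy.HashEmbedCNN.v2"
pretrained_vectors = null
width = 96
depth = 4
embed_size = 2000
window_size = 1
maxout_pieces = 3
subword_features = true
"""


def read_section(text, section):
    """Parse one section and decode each value as a JSON literal."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as written
    parser.read_string(text)
    return {key: json.loads(value) for key, value in parser[section].items()}


settings = read_section(CONFIG_TEXT, "components.ner.model.tok2vec")
print(settings)
```

Reading the file this way can be handy for quickly comparing settings between experiments, but for anything that feeds back into training you would want to rely on spaCy's own tooling.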

SpaCy allows you to try out all sorts of architectures here. It's good to be aware that you can customise your spaCy model to fit your use case. But! One key thing to remember is that while spaCy is highly configurable, your main concern should be to collect high-quality training data to start with. As we often emphasize, you want to make sure that your training data reflects real user language and behavior, so that your trained model will too.

The fact that you're able to re-use your spaCy model for many use cases in your organization is great news though. It might enable lessons learned from your customer service desk to be re-used in your virtual assistant and vice versa.
Happy hacking!