In February this year, spaCy 3.0 was released. It was a major release with many new features, including new pre-trained models. Rasa Open Source 2.5 now supports this version of spaCy, which makes those features available to the Rasa community. There are, however, some minor breaking changes to be aware of. In this blog post, we highlight the new features and explain what the upgrade means for Rasa users.
New Languages, New Models
Over the past year, the spaCy team has upgraded its models. The parsers, taggers, and entity extractors have all been upgraded. That means that if you already use a SpacyTokenizer, SpacyFeaturizer, or SpacyEntityExtractor in your pipeline, you'll benefit from these recent improvements.
The spaCy team has also been working with collaborators from around the world to support more languages. As of today, spaCy includes pre-trained models for 16 languages. Some of these languages have been introduced recently with the spaCy 3.0 upgrade, including Chinese, Japanese, Russian, and Polish. By making sure that Rasa supports the latest version of spaCy, we also ensure that any newly added languages are supported in Rasa.
Beyond new language models, spaCy also provides an entire ecosystem of tools. Some of these tools come in the form of specialized pre-trained models.
The standard spaCy models can detect typical entities like organizations, locations, and numeric references. But the ecosystem also provides specialized models that focus on a single domain. The Blackstone project delivers spaCy models that have been pre-trained on legal texts, and the scispaCy project supports pipelines for scientific and biomedical texts. By keeping Rasa up to date with spaCy, we also allow our users to make use of these specialized models in their assistants.
Another key benefit is that spaCy models can also be pre-trained by Rasa users themselves! For example, let's say you've got an internal spaCy model for banking that has been pre-trained for the many use-cases within your organization. You can re-use such a spaCy model in your virtual assistant as well.
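Before pointing an assistant at a custom model, it can help to confirm that the model is installed in the same Python environment as Rasa. The sketch below assumes the model was packaged and installed with pip as a regular Python package, which is how spaCy distributes its own pipelines. The helper `is_model_installed` and the name `my_banking_model` are hypothetical and only serve as an illustration.

```python
import importlib.util


def is_model_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    # find_spec returns None for a missing top-level package instead of raising.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "my_banking_model" is a hypothetical, internally packaged spaCy model.
    for name in ("en_core_web_md", "my_banking_model"):
        status = "installed" if is_model_installed(name) else "missing"
        print(f"{name}: {status}")
```

This only checks that the package can be found. It does not verify that the model is compatible with the installed spaCy version.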
There is a minor, albeit breaking, change to be aware of. In spaCy 2.x, you could link a model to a language name via the `spacy link` command. This command has been deprecated in spaCy 3.x, which means that you will need to refer to spaCy models explicitly.
Previously, you could run `python -m spacy link en_core_web_md en`, and Rasa would then pick up the correct model from the `language` parameter:

    language: en
    pipeline:
      - name: SpacyNLP
As of spaCy 3.0, you'll need to be explicit in `config.yml` by adding the model name as a parameter in the `SpacyNLP` component:

    language: en
    pipeline:
      - name: SpacyNLP
        model: en_core_web_md
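If you maintain several assistants, a small check can flag configurations that still rely on the old style. This is a minimal sketch that assumes `config.yml` has already been parsed into a Python dictionary (for example with a YAML library); the function name `spacy_components_missing_model` is our own and not part of Rasa.

```python
from typing import Any, Dict, List


def spacy_components_missing_model(config: Dict[str, Any]) -> List[int]:
    """Return pipeline positions of SpacyNLP entries that lack a `model` key."""
    missing = []
    for index, component in enumerate(config.get("pipeline") or []):
        if component.get("name") == "SpacyNLP" and not component.get("model"):
            missing.append(index)
    return missing


# The two configurations from this post, written as parsed dictionaries.
old_style = {"language": "en", "pipeline": [{"name": "SpacyNLP"}]}
new_style = {
    "language": "en",
    "pipeline": [{"name": "SpacyNLP", "model": "en_core_web_md"}],
}

print(spacy_components_missing_model(old_style))  # [0]
print(spacy_components_missing_model(new_style))  # []
```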
To make the transition easier, Rasa will try to help you if you still use the old configuration. Whenever a compatible language (like `en`) is configured for the entire pipeline, Rasa will assume a medium model on your behalf. This fallback behavior is temporary and will be removed in Rasa 3.0.0, so it's best to start specifying models explicitly.
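To make the fallback concrete, here is a rough sketch of the idea. It is not Rasa's actual implementation: the function `resolve_spacy_model`, the `MEDIUM_MODEL_FALLBACKS` table, and the choice of warning type are assumptions for illustration, and the table lists only a few languages.

```python
import warnings
from typing import Optional

# Illustrative subset only; the languages Rasa actually covers may differ.
MEDIUM_MODEL_FALLBACKS = {
    "en": "en_core_web_md",
    "de": "de_core_news_md",
    "fr": "fr_core_news_md",
}


def resolve_spacy_model(language: str, model: Optional[str] = None) -> str:
    """Prefer an explicit model name; otherwise fall back to a medium model."""
    if model:
        return model
    if language not in MEDIUM_MODEL_FALLBACKS:
        raise ValueError(
            f"No spaCy model configured and no fallback known for '{language}'."
        )
    fallback = MEDIUM_MODEL_FALLBACKS[language]
    # Warn so that users notice they are relying on temporary behavior.
    warnings.warn(
        f"No spaCy model specified; assuming '{fallback}'. "
        "Set `model` explicitly, since this fallback is temporary.",
        FutureWarning,
    )
    return fallback
```

An explicit `model` always wins, so adding it to your configuration today means nothing changes for you once the fallback is removed.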
This blog post discussed the new features from the spaCy 3.0 ecosystem available in Rasa 2.5. There is a small breaking change to keep in mind, but we're excited to see what these new features bring to the Rasa community. We're especially keen to keep an eye out for any new additions from spaCy.