The research team at Rasa builds and studies tools that cover many use cases, but we do not have access to all of the data that our users have. This limits the experiments we can run.
This made us wonder: would it help our community if, instead of asking for data to be shared, we shared more tools? That way no data needs to change hands, yet we can still empower our users by letting them customise their machine learning configuration.
This is why we're happy to announce a new project on GitHub: Rasa NLU Examples. The goal of this library is to host more experimental Rasa NLU components that are supported by the community. This gives us the opportunity to share experimental ideas, and it also means that users can contribute and share their own components.
The library is still small but already comes with useful components. The printer component from a previous blog post is supported, and we also offer two new sources of word embeddings: fastText embeddings (available in 157 languages) and the lightweight byte-pair embeddings (available in 275 languages, including some multi-language embeddings).
Using the NLU example components is easy. You can install the library with pip, directly from GitHub.
pip install git+https://github.com/RasaHQ/rasa-nlu-examples
From here you can add components to your pipeline. The configuration below adds French byte-pair embeddings.
language: fr
pipeline:
- name: WhitespaceTokenizer
- name: CountVectorsFeaturizer
  OOV_token: oov.txt
  token_pattern: (?u)\b\w+\b
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: rasa_nlu_examples.featurizers.dense.BytePairFeaturizer
  lang: fr
  vs: 200000
  dim: 300
- name: DIETClassifier
  epochs: 200
You can find more details in the benchmarking guide.
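One detail in the pipeline above is worth spelling out: the built-in components are referenced by short names, while the byte-pair featurizer is referenced by its full dotted path, `rasa_nlu_examples.featurizers.dense.BytePairFeaturizer`. The sketch below shows the general Python technique of turning such a dotted path into a class. It is only an illustration; `resolve_component` is a hypothetical helper, Rasa's own loader may work differently, and a standard-library class stands in for the featurizer so that the snippet runs without Rasa installed.

```python
import importlib


def resolve_component(name: str):
    """Resolve a dotted path such as 'package.module.ClassName' to the object it names.

    Hypothetical helper for illustration; not part of Rasa.
    """
    module_path, _, class_name = name.rpartition(".")
    if not module_path:
        # No dot in the name: a framework would look this up in its own
        # registry of built-in components instead of importing it.
        raise ValueError(f"{name!r} is not a dotted import path")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Standard-library stand-in, so the sketch runs without Rasa installed.
ordered_dict_cls = resolve_component("collections.OrderedDict")
print(ordered_dict_cls.__name__)  # prints: OrderedDict
```

With the library installed, the same idea would apply to the featurizer's path, which is why the package has to be importable from the environment in which you train your model.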
The goal of the library is to be a `contrib`-like library. We'll be able to allow for more experimental features because the example components won't need to go through the same vetting process as our Rasa Open Source library. There will still be a small review process to make sure that the tools that get added are useful to the Rasa community, and we'll also make sure that the tools receive unit tests.
Another goal of the library is to offer examples of implemented components so that it is easier for you to write your own. We hope this library will inspire folks to contribute their own ideas to the growing Rasa ecosystem, and we'd love to hear what components you can come up with.
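To give a feel for how small a component can be, here is a rough, printer-style sketch. It deliberately does not import Rasa: `FakeMessage`, `Printer`, `render`, and the `alias` parameter are all hypothetical names chosen for this illustration, and the real base class and method signatures depend on the Rasa version you use, so treat the repository's components as the reference.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class FakeMessage:
    """Hypothetical stand-in for the message object a pipeline passes along."""

    text: str
    data: Dict[str, Any] = field(default_factory=dict)


class Printer:
    """Hypothetical debugging component that reports what a message contains."""

    def __init__(self, alias: str = "printer") -> None:
        self.alias = alias

    def render(self, message: FakeMessage) -> str:
        # Build one line for the text and one line per attached data key.
        lines = [f"[{self.alias}] text: {message.text}"]
        for key in sorted(message.data):
            lines.append(f"[{self.alias}] {key}: {message.data[key]!r}")
        return "\n".join(lines)

    def process(self, message: FakeMessage) -> None:
        # Read-only step: print the message and leave it unchanged.
        print(self.render(message))


msg = FakeMessage(text="bonjour", data={"intent": "greet"})
Printer(alias="debug").process(msg)
```

Keeping the formatting in `render` and the side effect in `process` is a design choice for this sketch: it makes the component easy to cover with the kind of unit tests mentioned above.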
You can find the documentation here. Happy hacking!