Warning: This document is for the development version of Rasa. The latest version is 1.10.7.

Custom NLU Components

You can create a custom component to perform a specific task which NLU doesn’t currently offer (for example, sentiment analysis). Below is the specification of the rasa.nlu.components.Component class with the methods you’ll need to implement.

Note

There is a detailed tutorial on building custom components here.

You can add a custom component to your pipeline by specifying its module path. For example, if you have a module called sentiment containing a SentimentAnalyzer class, reference it like this:

pipeline:
- name: "sentiment.SentimentAnalyzer"
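To illustrate what a module path means here, the following minimal sketch resolves a dotted name of the form module.ClassName to a class object with the standard library. The helper class_from_dotted_path is a hypothetical name invented for this example and is not Rasa's own loader; a standard-library class stands in for sentiment.SentimentAnalyzer so that the snippet runs on its own.

```python
import importlib


def class_from_dotted_path(path: str) -> type:
    """Resolve 'package.module.ClassName' to the class object.

    Hypothetical helper for illustration only, not part of Rasa.
    """
    module_name, _, class_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.ClassName', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Stand-in for "sentiment.SentimentAnalyzer", using a stdlib class
# so the example runs without a `sentiment` module being importable.
resolved = class_from_dotted_path("collections.OrderedDict")
print(resolved.__name__)
```

For a lookup like this to succeed, the module has to be importable from the environment in which the pipeline is loaded.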

Also be sure to read the section on the Component Lifecycle.

To get started, you can use this skeleton that contains the most important methods that you should implement:

import typing
from typing import Any, Optional, Text, Dict, List, Type

from rasa.nlu.components import Component
from rasa.nlu.config import RasaNLUModelConfig
from rasa.nlu.training_data import Message, TrainingData

if typing.TYPE_CHECKING:
    from rasa.nlu.model import Metadata


class MyComponent(Component):
    """A new component"""

    # Which components are required by this component.
    # Listed components should appear before the component itself in the pipeline.
    @classmethod
    def required_components(cls) -> List[Type[Component]]:
        """Specify which components need to be present in the pipeline."""

        return []

    # Defines the default configuration parameters of a component
    # these values can be overwritten in the pipeline configuration
    # of the model. The component should choose sensible defaults
    # and should be able to create reasonable results with the defaults.
    defaults = {}

    # Defines what language(s) this component can handle.
    # This attribute is used by the class method `can_handle_language`.
    # Default value is None which means it can handle all languages.
    # This is an important feature for backwards compatibility of components.
    language_list = None

    def __init__(self, component_config: Optional[Dict[Text, Any]] = None) -> None:
        super().__init__(component_config)

    def train(
        self,
        training_data: TrainingData,
        config: Optional[RasaNLUModelConfig] = None,
        **kwargs: Any,
    ) -> None:
        """Train this component.

        This is the component's chance to train itself provided
        with the training data. The component can rely on
        any context attribute to be present, that gets created
        by a call to :meth:`components.Component.create`
        of ANY component and
        on any context attributes created by a call to
        :meth:`components.Component.train`
        of components previous to this one."""
        pass

    def process(self, message: Message, **kwargs: Any) -> None:
        """Process an incoming message.

        This is the component's chance to process an incoming
        message. The component can rely on
        any context attribute to be present, that gets created
        by a call to :meth:`components.Component.create`
        of ANY component and
        on any context attributes created by a call to
        :meth:`components.Component.process`
        of components previous to this one."""
        pass

    def persist(self, file_name: Text, model_dir: Text) -> Optional[Dict[Text, Any]]:
        """Persist this component to disk for future loading."""

        pass

    @classmethod
    def load(
        cls,
        meta: Dict[Text, Any],
        model_dir: Optional[Text] = None,
        model_metadata: Optional["Metadata"] = None,
        cached_component: Optional["Component"] = None,
        **kwargs: Any,
    ) -> "Component":
        """Load this component from file."""

        if cached_component:
            return cached_component
        else:
            return cls(meta)

Note

If you create a custom tokenizer, you should implement the methods of rasa.nlu.tokenizers.tokenizer.Tokenizer. The train and process methods are already implemented, so you only need to override the tokenize method.
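The division of labour described in this note can be sketched as follows. BaseTokenizer and Token are simplified, hypothetical stand-ins so that the example runs without Rasa installed; the real tokenize method works on Rasa's own message and token types rather than on plain strings and dictionaries.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical stand-in for a token: text plus start offset."""
    text: str
    start: int


class BaseTokenizer:
    """Hypothetical stand-in for the tokenizer base class."""

    def tokenize(self, text: str) -> List[Token]:
        raise NotImplementedError

    def process(self, message: dict) -> None:
        # The base class drives processing; subclasses only supply tokenize().
        message["tokens"] = self.tokenize(message["text"])


class HyphenAwareTokenizer(BaseTokenizer):
    """Keeps runs of word characters and hyphens together as one token."""

    def tokenize(self, text: str) -> List[Token]:
        return [Token(m.group(), m.start()) for m in re.finditer(r"[\w-]+", text)]


message = {"text": "book a round-trip flight"}
HyphenAwareTokenizer().process(message)
print([t.text for t in message["tokens"]])
```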

Note

If you create a custom featurizer, you can return two different kinds of features: sequence features and sentence features. The sequence features are a matrix of size (number-of-tokens x feature-dimension), i.e. the matrix contains a feature vector for every token in the sequence. The sentence features are represented by a matrix of size (1 x feature-dimension).
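The two shapes can be made concrete with a small sketch. Plain nested lists stand in for the matrices, the embedding table is made up for this example, and mean pooling is only one possible way to derive sentence features; none of these choices come from Rasa itself.

```python
from typing import Dict, List

Matrix = List[List[float]]

# Hypothetical 3-dimensional embeddings, made up for this example.
EMBEDDINGS: Dict[str, List[float]] = {
    "good": [1.0, 0.0, 0.5],
    "morning": [0.0, 1.0, 0.5],
}
DIM = 3


def sequence_features(tokens: List[str]) -> Matrix:
    """One row per token: shape (number-of-tokens x DIM)."""
    return [EMBEDDINGS.get(token, [0.0] * DIM) for token in tokens]


def sentence_features(sequence: Matrix) -> Matrix:
    """Mean-pool the token rows into a single row: shape (1 x DIM)."""
    n = max(len(sequence), 1)  # avoid dividing by zero for empty input
    return [[sum(row[d] for row in sequence) / n for d in range(DIM)]]


seq = sequence_features(["good", "morning"])
sent = sentence_features(seq)
print(len(seq), len(seq[0]))    # rows = tokens, columns = DIM
print(len(sent), len(sent[0]))  # a single row of DIM columns
```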

Component

class rasa.nlu.components.Component(component_config=None)

A component is a message processing unit in a pipeline.

Components are collected sequentially in a pipeline. Each component is called one after another. This holds for initialization, training, persisting and loading the components. If a component comes first in a pipeline, its methods will be called first.

For example, to process an incoming message, the process method of each component is called in turn. During processing (as well as training, persisting and initialization), components can pass information to other components by adding attributes to the so-called pipeline context. The pipeline context contains all the information from previous components that a component can use to do its own processing. For example, a featurizer component can provide features that are used by another component further down the pipeline to do intent classification.
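The featurizer and classifier example can be sketched with a plain dictionary acting as the pipeline context. All class and function names below are hypothetical stand-ins rather than Rasa classes, and the word-count "feature" is deliberately trivial.

```python
from typing import Any, Dict, List


class StepBase:
    """Hypothetical stand-in for a pipeline component."""

    def process(self, text: str, context: Dict[str, Any]) -> None:
        raise NotImplementedError


class LengthFeaturizer(StepBase):
    def process(self, text: str, context: Dict[str, Any]) -> None:
        # Provide a feature for components further down the pipeline.
        context["features"] = [len(text.split())]


class LengthIntentClassifier(StepBase):
    def process(self, text: str, context: Dict[str, Any]) -> None:
        # Relies on the attribute added by the featurizer before it.
        context["intent"] = "greet" if context["features"][0] <= 2 else "other"


def run_pipeline(steps: List[StepBase], text: str) -> Dict[str, Any]:
    context: Dict[str, Any] = {}
    for step in steps:  # called one after another, in pipeline order
        step.process(text, context)
    return context


result = run_pipeline([LengthFeaturizer(), LengthIntentClassifier()], "hello there")
print(result["intent"])
```

Swapping the two steps in this sketch raises a KeyError, because the classifier would then run before the features exist. That is why the order of components in the pipeline matters.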

classmethod required_components()

Specify which components need to be present in the pipeline.

Returns

The list of classes of required components.

Return type

List[Type[Component]]
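As a rough sketch of how such a declaration could be checked, the snippet below verifies that every required component class appears earlier in the pipeline. The classes and the missing_requirements helper are hypothetical stand-ins written for this example; Rasa's actual validation may work differently.

```python
from typing import List, Type


class Base:
    """Hypothetical stand-in for the component base class."""

    @classmethod
    def required_components(cls) -> List[Type["Base"]]:
        return []


class Tokenizer(Base):
    pass


class Featurizer(Base):
    @classmethod
    def required_components(cls) -> List[Type[Base]]:
        # A class object, not a string containing the class name.
        return [Tokenizer]


def missing_requirements(pipeline: List[Base]) -> List[str]:
    """Report requirements not satisfied by an earlier pipeline entry."""
    problems = []
    for index, component in enumerate(pipeline):
        earlier = pipeline[:index]
        for required in component.required_components():
            if not any(isinstance(c, required) for c in earlier):
                problems.append(
                    f"{type(component).__name__} needs {required.__name__} before it"
                )
    return problems


print(missing_requirements([Tokenizer(), Featurizer()]))  # nothing missing
print(missing_requirements([Featurizer(), Tokenizer()]))  # one problem reported
```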

classmethod required_packages()

Specify which python packages need to be installed.

E.g. ["spacy"]. More specifically, these should be importable Python package names (e.g. sklearn) and not package names in the dependencies sense (e.g. scikit-learn).

This list of requirements allows us to fail early during training if a required package is not installed.

Returns

The list of required package names.

Return type

List[str]
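The difference between importable names and distribution names matters because a fail-early check has to test whether a name can be imported. A minimal sketch of such a check using only the standard library follows; find_missing_packages is a hypothetical helper, not a Rasa function.

```python
import importlib.util
from typing import List


def find_missing_packages(package_names: List[str]) -> List[str]:
    """Return the top-level names that cannot be imported here."""
    return [
        name for name in package_names
        if importlib.util.find_spec(name) is None
    ]


# "json" ships with Python; the second name is deliberately bogus.
missing = find_missing_packages(["json", "surely_not_installed_pkg_123"])
if missing:
    print(f"Missing packages: {', '.join(missing)}")
```

A check like this would pass for sklearn when scikit-learn is installed, but would report scikit-learn itself as missing, since that is not an importable name.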

classmethod create(component_config, config)

Creates this component (e.g. before a training is started).

Method can access all configuration parameters.

Parameters
  • component_config – The component's configuration parameters.

  • config – The model configuration parameters.

Returns

The created component.

Return type

Component

provide_context()

Initialize this component for a new pipeline.

This function will be called before the training is started and before the first message is processed using the interpreter. The component gets the opportunity to add information to the context that is passed through the pipeline during training and message parsing. Most components do not need to implement this method. It’s mostly used to initialize framework environments like MITIE and spacy (e.g. loading word vectors for the pipeline).

Returns

The updated component configuration.

Return type

Optional[Dict[str, Any]]
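The idea of contributing to a shared context before any message is handled can be sketched as below. Step, VectorLoader and build_context are hypothetical names, and the way the returned dictionaries are merged is an assumption made for this example rather than a description of Rasa's internals.

```python
from typing import Any, Dict, List, Optional


class Step:
    """Hypothetical stand-in for a pipeline component."""

    def provide_context(self) -> Optional[Dict[str, Any]]:
        return None  # most components have nothing to add


class VectorLoader(Step):
    def provide_context(self) -> Optional[Dict[str, Any]]:
        # Stands in for an expensive one-time load such as word vectors.
        return {"word_vectors": {"hello": [0.1, 0.2]}}


def build_context(steps: List[Step]) -> Dict[str, Any]:
    """Collect the context entries of all steps, in pipeline order."""
    context: Dict[str, Any] = {}
    for step in steps:
        updates = step.provide_context()
        if updates:
            context.update(updates)
    return context


context = build_context([VectorLoader(), Step()])
print(sorted(context))
```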

train(training_data, config=None, **kwargs)

Train this component.

This is the component's chance to train itself, provided with the training data. The component can rely on any context attribute being present that was created by a call to rasa.nlu.components.Component.create() of ANY component, and on any context attributes created by a call to rasa.nlu.components.Component.train() of components previous to this one.

Parameters
  • training_data – The rasa.nlu.training_data.training_data.TrainingData.

  • config – The model configuration parameters.

Return type

None

process(message, **kwargs)

Process an incoming message.

This is the component's chance to process an incoming message. The component can rely on any context attribute being present that was created by a call to rasa.nlu.components.Component.create() of ANY component, and on any context attributes created by a call to rasa.nlu.components.Component.process() of components previous to this one.

Parameters

message – The rasa.nlu.training_data.message.Message to process.

Return type

None

persist(file_name, model_dir)

Persist this component to disk for future loading.

Parameters
  • file_name – The file name of the model.

  • model_dir – The directory to store the model to.

Returns

An optional dictionary with any information about the stored model.

Return type

Optional[Dict[str, Any]]
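A persist and load round trip might look like the following sketch. KeywordModel is a hypothetical, Rasa-free class; the JSON layout, the "file" key, and the assumption that load receives the dictionary returned by persist are illustrative choices, not documented behaviour.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


class KeywordModel:
    """Hypothetical component state: a list of keywords."""

    def __init__(self, keywords=None):
        self.keywords = list(keywords or [])

    def persist(self, file_name: str, model_dir: str) -> Optional[Dict[str, Any]]:
        # Write the state into the model directory and report where it went.
        path = os.path.join(model_dir, file_name + ".json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"keywords": self.keywords}, f)
        return {"file": file_name + ".json"}

    @classmethod
    def load(cls, meta: Dict[str, Any], model_dir: str) -> "KeywordModel":
        # `meta` is assumed to carry what persist() returned.
        with open(os.path.join(model_dir, meta["file"]), encoding="utf-8") as f:
            return cls(json.load(f)["keywords"])


with tempfile.TemporaryDirectory() as model_dir:
    meta = KeywordModel(["great", "awful"]).persist("component_0_keywords", model_dir)
    restored = KeywordModel.load(meta, model_dir)

print(restored.keywords)
```

Returning a relative file name rather than an absolute path keeps the stored model relocatable in this sketch, since load joins it with whatever model_dir it is given.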

prepare_partial_processing(pipeline, context)

Sets the pipeline and context used for partial processing.

The pipeline should be a list of components that are previous to this one in the pipeline and have already finished their training (and can therefore be safely used to process messages).

Parameters
  • pipeline – The list of components.

  • context – The context of processing.

Return type

None

partially_process(message)

Allows the component to process messages during training (e.g. external training data).

The passed message will be processed by all components previous to this one in the pipeline.

Parameters

message – The rasa.nlu.training_data.message.Message to process.

Returns

The processed rasa.nlu.training_data.message.Message.

Return type

Message
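The interplay of prepare_partial_processing and partially_process can be sketched as follows. The classes are hypothetical stand-ins, a dictionary replaces the Message type, and the context argument is left out for brevity.

```python
from typing import Any, Dict, List


class Lowercaser:
    def process(self, message: Dict[str, Any]) -> None:
        message["text"] = message["text"].lower()


class Splitter:
    def process(self, message: Dict[str, Any]) -> None:
        message["tokens"] = message["text"].split()


class LookupTrainer:
    """Hypothetical component that preprocesses extra data while training."""

    def __init__(self) -> None:
        self.partial_pipeline: List[Any] = []

    def prepare_partial_processing(self, pipeline: List[Any]) -> None:
        # Only components that come before this one and are already trained.
        self.partial_pipeline = pipeline

    def partially_process(self, message: Dict[str, Any]) -> Dict[str, Any]:
        for component in self.partial_pipeline:
            component.process(message)
        return message


trainer = LookupTrainer()
trainer.prepare_partial_processing([Lowercaser(), Splitter()])
processed = trainer.partially_process({"text": "New York"})
print(processed["tokens"])
```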

classmethod can_handle_language(language)

Check if component supports a specific language.

This method can be overridden when needed (e.g. to dynamically determine which languages are supported).

Parameters

language – The language to check.

Returns

True if component can handle specific language, False otherwise.

Return type

bool
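Based on the language_list attribute described in the skeleton above, where None means that all languages are supported, the default behaviour could be sketched roughly like this. LanguageAware and GermanOnly are hypothetical classes, and the actual Rasa implementation may differ in detail.

```python
from typing import List, Optional


class LanguageAware:
    """Hypothetical stand-in for a component class."""

    # None means every language is accepted.
    language_list: Optional[List[str]] = None

    @classmethod
    def can_handle_language(cls, language: str) -> bool:
        return cls.language_list is None or language in cls.language_list


class GermanOnly(LanguageAware):
    language_list = ["de"]


print(LanguageAware.can_handle_language("fr"))
print(GermanOnly.can_handle_language("fr"))
```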