rasa.nlu.featurizers.dense_featurizer.convert_featurizer
ConveRTFeaturizer Objects
Featurizer using ConveRT model.
Loads the ConveRT model (https://github.com/PolyAI-LDN/polyai-models#convert) from TFHub and computes sentence-level and sequence-level feature representations for the dense featurizable attributes of each message object.
required_components
Components that should be included in the pipeline before this component.
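To illustrate what this ordering requirement implies, here is a minimal sketch; it is not Rasa's actual validation code. The classes `ToyTokenizer`, `ToyWhitespaceTokenizer`, and `ToyFeaturizer` and the helper `missing_requirements` are hypothetical names introduced for this example, and the sketch assumes that `required_components` returns a list of component types.

```python
from typing import List, Type


class ToyTokenizer:
    """Hypothetical base class standing in for a tokenizer component."""


class ToyWhitespaceTokenizer(ToyTokenizer):
    """Hypothetical concrete tokenizer."""


class ToyFeaturizer:
    """Hypothetical featurizer that needs a tokenizer earlier in the pipeline."""

    @classmethod
    def required_components(cls) -> List[Type]:
        # Assumption for this sketch: the requirement is a list of types.
        return [ToyTokenizer]


def missing_requirements(pipeline: List[Type], index: int) -> List[Type]:
    """Return the required types that no component before `index` satisfies."""
    earlier = pipeline[:index]
    required = pipeline[index].required_components()
    return [
        requirement
        for requirement in required
        if not any(issubclass(component, requirement) for component in earlier)
    ]
```

With the tokenizer listed first, `missing_requirements([ToyWhitespaceTokenizer, ToyFeaturizer], 1)` returns an empty list; with the order reversed, the unmet `ToyTokenizer` requirement is reported.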
get_default_config
The component's default config (see parent class for full docstring).
required_packages
Packages needed to be installed.
supported_languages
Determines which languages this component can work with.
Returns: A list of supported languages, or None to signify that all are supported.
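A caller might interpret this return value along the following lines. This is a minimal sketch; `is_language_supported` is a hypothetical helper written for illustration and is not part of Rasa.

```python
from typing import List, Optional


def is_language_supported(supported: Optional[List[str]], language: str) -> bool:
    """Interpret a supported-languages value.

    None means every language is accepted; otherwise the language
    must appear in the list.
    """
    return supported is None or language in supported
```

Under this reading, an empty list would mean that no language is supported, which is why None rather than an empty list signals "all languages".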
create
Creates a new component (see parent class for full docstring).
__init__
Initializes a ConveRTFeaturizer.
Arguments:
name
- An identifier for this featurizer.
config
- The configuration.
validate_config
Validates that the component is configured properly.
process_training_data
Featurize all message attributes in the training data with the ConveRT model.
Arguments:
training_data
- Training data to be featurized
Returns:
The featurized training data.
process
Featurize an incoming message with the ConveRT model.
Arguments:
messages
- Messages to be featurized
tokenize
Tokenize the text using the ConveRT model.
ConveRT adds a special character in front of (some) words and splits words into sub-words. To ensure that the entity start and end values match the token values, the tokens already assigned to the message are reused. If an individual token is split into multiple sub-tokens, this information is added to the respective token.
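The alignment idea can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the featurizer's implementation: `ToyToken`, `whitespace_tokens`, `toy_sub_tokenize`, and `annotate_sub_tokens` are hypothetical names, and the toy splitter (a "▁" marker plus fixed four-character chunks) merely stands in for ConveRT's real sub-word tokenizer.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyToken:
    """Hypothetical token holding character offsets into the original text."""

    text: str
    start: int
    end: int
    number_of_sub_tokens: int = 1


def whitespace_tokens(text: str) -> List[ToyToken]:
    """Stand-in for the tokens an earlier tokenizer already assigned."""
    return [
        ToyToken(match.group(), match.start(), match.end())
        for match in re.finditer(r"\S+", text)
    ]


def toy_sub_tokenize(word: str) -> List[str]:
    """Toy splitter (assumption): marker on the first piece, 4-character chunks."""
    if not word:
        return []
    pieces = [word[i:i + 4] for i in range(0, len(word), 4)]
    pieces[0] = "▁" + pieces[0]
    return pieces


def annotate_sub_tokens(
    tokens: List[ToyToken], sub_tokenize: Callable[[str], List[str]]
) -> List[ToyToken]:
    """Keep the original tokens and offsets; only record the sub-token count."""
    for token in tokens:
        pieces = sub_tokenize(token.text)
        token.number_of_sub_tokens = max(len(pieces), 1)
    return tokens


tokens = annotate_sub_tokens(whitespace_tokens("fly to Berlin"), toy_sub_tokenize)
```

Because `start` and `end` are never modified, entity annotations that refer to character offsets still line up with the tokens; the sub-token count is carried alongside so that sub-word features can later be mapped back to whole tokens.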