Version: Master/Unreleased

rasa.nlu.featurizers.dense_featurizer.convert_featurizer

ConveRTFeaturizerGraphComponent Objects

class ConveRTFeaturizerGraphComponent(DenseFeaturizer2, GraphComponent)

Featurizer using ConveRT model.

Loads the ConveRT model (https://github.com/PolyAI-LDN/polyai-models#convert) from TFHub and computes sentence-level and sequence-level feature representations for the dense-featurizable attributes of each message object.
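
To make the two feature levels concrete, the following is a minimal illustration of their shapes only. It does not call ConveRT or TFHub: the vectors are made-up numbers, and `mean_pool` is a hypothetical stand-in used here to obtain a single sentence-level vector. The actual model computes its own sentence encoding, so treat this as a sketch of the data layout, not of the featurizer.

```python
from typing import List

Vector = List[float]


def mean_pool(sequence_features: List[Vector]) -> Vector:
    """Average per-token vectors into one vector (stand-in, not ConveRT)."""
    if not sequence_features:
        raise ValueError("need at least one token vector")
    dim = len(sequence_features[0])
    return [
        sum(vec[i] for vec in sequence_features) / len(sequence_features)
        for i in range(dim)
    ]


# Made-up 2-dimensional vectors for a 3-token message.
sequence_features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # one vector per token
sentence_features = mean_pool(sequence_features)  # one vector per message

print(len(sequence_features), "token vectors,", len(sentence_features), "dimensions each")
```

Sequence-level features therefore grow with the number of tokens, while the sentence-level representation stays a single vector regardless of message length.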

get_default_config

@staticmethod
def get_default_config() -> Dict[Text, Any]

The component's default config (see parent class for full docstring).

required_packages

@staticmethod
def required_packages() -> List[Text]

Lists the packages that must be installed for this component to work.

supported_languages

@staticmethod
def supported_languages() -> Optional[List[Text]]

Determines which languages this component can work with.

Returns: A list of supported languages, or None to signify all are supported.
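
The return convention above (a list, or None for "all languages") can be interpreted as in the following minimal sketch. `is_language_supported` is a hypothetical helper written for this illustration and is not part of Rasa; the language codes are example values only.

```python
from typing import List, Optional


def is_language_supported(supported: Optional[List[str]], language: str) -> bool:
    """Interpret a supported-languages value; None means every language is allowed."""
    if supported is None:
        return True
    return language in supported


# Example values only; they are not taken from the component itself.
print(is_language_supported(["en"], "en"))
print(is_language_supported(["en"], "de"))
print(is_language_supported(None, "de"))
```

Note that an empty list and None mean opposite things under this convention: an empty list supports no language, while None supports all of them.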

create

@classmethod
def create(cls, config: Dict[Text, Any], model_storage: ModelStorage, resource: Resource, execution_context: ExecutionContext) -> ConveRTFeaturizerGraphComponent

Creates a new component (see parent class for full docstring).

__init__

def __init__(name: Text, config: Dict[Text, Any]) -> None

Initializes a ConveRTFeaturizerGraphComponent.

Arguments:

  • name - An identifier for this featurizer.
  • config - The configuration.

validate_config

@classmethod
def validate_config(cls, config: Dict[Text, Any]) -> None

Validates that the component is configured properly.

validate_compatibility_with_tokenizer

@classmethod
def validate_compatibility_with_tokenizer(cls, config: Dict[Text, Any], tokenizer_type: Type[Tokenizer]) -> None

Validates that the featurizer is compatible with the given tokenizer.

process_training_data

def process_training_data(training_data: TrainingData) -> TrainingData

Featurizes all message attributes in the training data with the ConveRT model.

Arguments:

  • training_data - The training data to be featurized.

Returns:

The featurized training data.

process

def process(messages: List[Message]) -> List[Message]

Featurizes the incoming messages with the ConveRT model.

Arguments:

  • messages - The messages to be featurized.

tokenize

def tokenize(message: Message, attribute: Text) -> List[Token]

Tokenizes the text using the ConveRT model.

ConveRT adds a special character in front of (some) words and splits words into sub-words. To ensure that the entity start and end values match the token values, the tokens already assigned to the message are reused. If an individual token is split into multiple sub-tokens, this information is added to the respective token.
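
The idea of reusing the existing tokens and only recording the sub-word split can be sketched as follows. Everything here is an assumption made for illustration: `SimpleToken` is a hypothetical stand-in for a token object, `toy_subword_split` is a toy splitter rather than the ConveRT tokenizer, and the marker character and the `number_of_sub_tokens` key are placeholder choices.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class SimpleToken:
    """Hypothetical stand-in for a token: text, character offset, extra data."""

    text: str
    start: int
    data: Dict[str, Any] = field(default_factory=dict)

    @property
    def end(self) -> int:
        return self.start + len(self.text)


def toy_subword_split(word: str) -> List[str]:
    """Toy splitter (NOT ConveRT): marker in front, pieces of up to 4 characters."""
    if not word:
        return []
    pieces = [word[i:i + 4] for i in range(0, len(word), 4)]
    return ["\u2581" + pieces[0]] + pieces[1:]


def annotate_sub_tokens(
    tokens: List[SimpleToken], split: Callable[[str], List[str]]
) -> List[str]:
    """Keep the existing tokens, record each token's sub-token count,
    and return the flat list of sub-tokens."""
    flat: List[str] = []
    for token in tokens:
        pieces = split(token.text)
        token.data["number_of_sub_tokens"] = len(pieces)
        flat.extend(pieces)
    return flat


# Tokens as an earlier tokenizer might have produced them for "book restaurant".
tokens = [SimpleToken("book", 0), SimpleToken("restaurant", 5)]
sub_tokens = annotate_sub_tokens(tokens, toy_subword_split)

print(sub_tokens)
print([(t.text, t.start, t.end, t.data) for t in tokens])
```

Because the original tokens are never replaced, their start and end offsets stay aligned with any entity annotations, and the recorded count is enough to map sub-token features back to each token later.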