Featurizer using the ConveRT model.
Loads the ConveRT (https://github.com/PolyAI-LDN/polyai-models#convert) model from TFHub and computes sentence-level and sequence-level feature representations for the dense-featurizable attributes of each message object.
Components that should be included in the pipeline before this component.
The component's default config (see parent class for full docstring).
Packages that need to be installed.
Determines which languages this component can work with.
Returns: A list of supported languages, or None to signify that all are supported.
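The return convention above (a list of languages, or None meaning "no restriction") can be sketched as a small check. This is only an illustration: `is_language_supported` is a hypothetical helper, not part of the component, and the `["en"]` value is an example input rather than a statement about this featurizer's configuration.

```python
from typing import List, Optional


def is_language_supported(language: str, supported: Optional[List[str]]) -> bool:
    """Hypothetical helper: interpret a 'supported languages' value.

    `supported` follows the convention described above:
    a list restricts the component to those languages,
    while None signals that every language is accepted.
    """
    if supported is None:
        # None means no restriction at all.
        return True
    return language in supported


# Example inputs (illustrative values only).
english_only = is_language_supported("de", ["en"])
unrestricted = is_language_supported("de", None)
```

Note that an empty list and None mean different things under this convention: an empty list supports nothing, None supports everything.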
Creates a new component (see parent class for full docstring).
name- An identifier for this featurizer.
config- The configuration.
Validates that the component is configured properly.
Featurize all message attributes in the training data with the ConveRT model.
training_data- Training data to be featurized.
Returns: Featurized training data.
Featurize incoming messages with the ConveRT model.
messages- Messages to be featurized.
Tokenize the text using the ConveRT model.
ConveRT adds a special character in front of (some) words and splits words into sub-words. To ensure that the entity start and end values match the token values, reuse the tokens that are already assigned to the message. If an individual token is split into multiple sub-tokens, add this information to the respective token.
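The idea of reusing existing tokens while recording how many sub-words each one produces can be sketched as follows. This is a minimal illustration under stated assumptions: `Token`, `toy_subword_split`, and `annotate_sub_tokens` are hypothetical names, and the toy splitter is NOT the ConveRT tokenizer; it only imitates the "prefix character plus sub-word pieces" behaviour described above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Token:
    """Hypothetical stand-in for a message token with character offsets."""
    text: str
    start: int
    end: int
    number_of_sub_tokens: int = 1


def toy_subword_split(word: str) -> List[str]:
    """Toy splitter (assumption, not ConveRT): 4-character pieces,
    with a marker character in front of the first piece."""
    if not word:
        return []
    pieces = [word[i:i + 4] for i in range(0, len(word), 4)]
    pieces[0] = "\u2581" + pieces[0]
    return pieces


def annotate_sub_tokens(
    tokens: List[Token], split: Callable[[str], List[str]]
) -> List[Token]:
    """Keep the original tokens (and their start/end offsets) and only
    record how many sub-words each token was split into."""
    for token in tokens:
        token.number_of_sub_tokens = len(split(token.text))
    return tokens


# "book restaurant": offsets come from the existing tokens and are not changed.
tokens = [Token("book", 0, 4), Token("restaurant", 5, 15)]
annotated = annotate_sub_tokens(tokens, toy_subword_split)
```

Because the original tokens are kept, entity annotations that refer to character offsets stay aligned; the sub-token count is the only extra information a later step needs to map sub-word features back to tokens.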