A featurizer that uses transformer-based language models.
This component loads a pre-trained language model from the Transformers library (https://github.com/huggingface/transformers), such as BERT, GPT, GPT-2, XLNet, DistilBERT, or RoBERTa. It also tokenizes and featurizes the dense featurizable attributes of each message.
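As an illustration, a pipeline using this featurizer might be configured as below. This is a hedged sketch: the surrounding components (`WhitespaceTokenizer`, `DIETClassifier`) and the specific weights (`bert-base-uncased`) are example choices, not requirements of the component.

```yaml
pipeline:
  - name: WhitespaceTokenizer
  - name: LanguageModelFeaturizer
    # which pre-trained architecture and weights to load
    model_name: "bert"
    model_weights: "bert-base-uncased"
  - name: DIETClassifier
```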
Components that should be included in the pipeline before this component.
Initializes the featurizer with the model in the config.
Returns LanguageModelFeaturizer's default config.
Validates the configuration.
Creates a LanguageModelFeaturizer.
Loads the model specified in the config.
Returns the extra Python dependencies required.
Computes tokens and dense features for each message in training data.
training_data: NLU training data to be tokenized and featurized.
config: NLU pipeline config consisting of all components.
Processes messages by computing tokens and dense features.
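Computing one dense feature vector per token typically requires pooling, because the language model's sub-word tokenizer may split a single token into several sub-tokens. The sketch below illustrates one common pooling strategy, averaging sub-token embeddings; the function name `align_subtoken_features` and the use of plain NumPy arrays are illustrative assumptions, not the component's actual API.

```python
import numpy as np

def align_subtoken_features(subtoken_embeddings, subtokens_per_token):
    """Pool sub-token embeddings back into one dense vector per whole token.

    subtoken_embeddings: array of shape (num_subtokens, embedding_dim)
    subtokens_per_token: number of sub-tokens each original token was split into
    """
    features = []
    start = 0
    for count in subtokens_per_token:
        # average the sub-token vectors that belong to this token
        features.append(subtoken_embeddings[start:start + count].mean(axis=0))
        start += count
    return np.stack(features)

# Suppose two tokens, where the second was split into two sub-tokens
# (e.g. "playing" -> ["play", "##ing"]):
emb = np.array([[1.0, 2.0],
                [3.0, 4.0],
                [5.0, 6.0]])
aligned = align_subtoken_features(emb, [1, 2])
# token 0 keeps [1, 2]; token 1 averages rows 1 and 2 -> [4, 5]
```

Averaging keeps the output shape tied to the original tokenization, so downstream components see exactly one vector per token regardless of how aggressively the model's tokenizer splits words.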