This is unreleased documentation for Rasa Documentation Main/Unreleased version.
For the latest released documentation, see the latest version (3.x).
rasa.nlu.featurizers.sparse_featurizer.count_vectors_featurizer
CountVectorsFeaturizer Objects
Creates a sequence of token count features based on sklearn's CountVectorizer.
All tokens which consist only of digits (e.g. 123 and 99 but not ab12d) will be represented by a single feature.
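The digit rule above can be illustrated with a minimal, standard-library sketch. The placeholder name `__NUMBER__` and the helper `collapse_digit_tokens` are assumptions made for this example, not names confirmed by this page.

```python
import re
from collections import Counter

# Hypothetical placeholder shared by all digit-only tokens (an assumption).
NUMBER_PLACEHOLDER = "__NUMBER__"


def collapse_digit_tokens(tokens):
    """Replace every token made up only of digits with one shared placeholder."""
    return [
        NUMBER_PLACEHOLDER if re.fullmatch(r"[0-9]+", token) else token
        for token in tokens
    ]


tokens = ["order", "123", "and", "99", "from", "ab12d"]
collapsed = collapse_digit_tokens(tokens)

# Both "123" and "99" now count towards the same feature; "ab12d" is untouched.
counts = Counter(collapsed)
```

Because both numbers map to the same token, a count-based featurizer sees them as one vocabulary entry rather than one entry per distinct number.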
Set analyzer to 'char_wb' to use the idea of Subword Semantic Hashing from https://arxiv.org/abs/1810.07150.
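To make the 'char_wb' setting more concrete, here is a simplified pure-Python sketch of character n-grams taken only inside word boundaries, with each word padded by a space on both sides. The function `char_wb_ngrams` is a hypothetical helper written for this example; it approximates the behavior of sklearn's analyzer and may differ from it for words shorter than the n-gram size.

```python
def char_wb_ngrams(text, min_n=1, max_n=3):
    """Return character n-grams that never cross a word boundary.

    Simplified illustration only: each whitespace-separated word is padded
    with one space on each side, then sliced into n-grams of every size
    from min_n to max_n.
    """
    ngrams = []
    for word in text.split():
        padded = f" {word} "
        for n in range(min_n, max_n + 1):
            if n > len(padded):
                break  # the padded word is too short for this n-gram size
            for start in range(len(padded) - n + 1):
                ngrams.append(padded[start:start + n])
    return ngrams


trigrams = char_wb_ngrams("hi you", min_n=3, max_n=3)
```

Counting such subword units instead of whole tokens lets misspelled or unseen words still share features with known words.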
required_components
Components that should be included in the pipeline before this component.
get_default_config
Returns the component's default config.
required_packages
Any extra python dependencies required for this component to run.
__init__
Constructs a new count vectorizer using the sklearn framework.
create
Creates a new untrained component (see parent class for full docstring).
train
Trains the featurizer.
Takes parameters from the config and constructs a new count vectorizer using the sklearn framework.
process_training_data
Processes the training examples in the given training data in-place.
Arguments:
training_data - The training data.
Returns:
The same training data after processing.
process
Processes an incoming message by computing and setting its features.
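The general count-vector idea behind training and processing can be sketched in plain Python: a vocabulary is fitted on training tokens, and a new message is then mapped onto it. This is a stand-in for what a count vectorizer does in principle, not Rasa's or sklearn's actual code; `fit_vocabulary` and `count_vector` are hypothetical helpers.

```python
from collections import Counter


def fit_vocabulary(token_lists):
    """Map every distinct training token to a column index (alphabetical order)."""
    distinct = sorted({token for tokens in token_lists for token in tokens})
    return {token: index for index, token in enumerate(distinct)}


def count_vector(tokens, vocabulary):
    """Count known tokens of one message; tokens outside the vocabulary are ignored."""
    counts = Counter(token for token in tokens if token in vocabulary)
    vector = [0] * len(vocabulary)
    for token, count in counts.items():
        vector[vocabulary[token]] = count
    return vector


# "Training": learn the vocabulary from tokenized examples.
training_tokens = [["book", "a", "table"], ["book", "a", "flight"]]
vocabulary = fit_vocabulary(training_tokens)

# "Processing": featurize a new message with the fitted vocabulary.
vector = count_vector(["book", "book", "hotel"], vocabulary)
```

Here "hotel" was never seen during fitting, so it contributes nothing to the vector, while "book" is counted twice.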
persist
Persists this model into the passed directory.
Returns the metadata necessary to load the model again.
load
Loads trained component (see parent class for full docstring).
validate_config
Validates that the component is configured properly.