Creates a sequence of token count features based on sklearn's count vectorizer.
All tokens which consist only of digits (e.g. 123 and 99 but not ab12d) will be represented by a single feature.
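A minimal sketch of the digit rule above, using only the standard library. The placeholder name `__NUMBER__` and the helper names are assumptions made for illustration; the docstring does not say which token stands in for numbers.

```python
import re
from collections import Counter

# Hypothetical placeholder; the source does not name the shared number token.
NUMBER_TOKEN = "__NUMBER__"


def normalize_tokens(tokens):
    """Replace every token made up only of digits with one shared placeholder."""
    return [NUMBER_TOKEN if re.fullmatch(r"[0-9]+", tok) else tok for tok in tokens]


def count_tokens(tokens):
    """Count the normalized tokens, bag-of-words style."""
    return Counter(normalize_tokens(tokens))


counts = count_tokens(["order", "123", "and", "99", "for", "ab12d"])
print(counts[NUMBER_TOKEN])  # 123 and 99 share one feature
print(counts["ab12d"])       # mixed tokens keep their own feature
```

Collapsing all numbers into one feature keeps the vocabulary small and lets a model treat unseen numbers the same way as numbers seen in training.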
Set the analyzer to 'char_wb' to use the idea of Subword Semantic Hashing.
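To make the 'char_wb' idea concrete, here is a simplified pure-Python sketch of character n-grams that stay inside word boundaries. It is not sklearn's implementation: the function name, the n-gram range, and the handling of very short words are assumptions chosen for illustration.

```python
def char_wb_ngrams(text, min_n=2, max_n=3):
    """Return character n-grams that never span two words.

    Each word is padded with one space on either side, so n-grams at the
    edges of a word carry a boundary marker.
    """
    ngrams = []
    for word in text.split():
        padded = f" {word} "
        for n in range(min_n, max_n + 1):
            if n > len(padded):
                break  # the padded word is too short for this n
            for start in range(len(padded) - n + 1):
                ngrams.append(padded[start:start + n])
    return ngrams


grams = char_wb_ngrams("hi you")
print(grams)
```

Because subwords such as "yo" and "ou" become features of their own, a misspelled or unseen word still shares many features with its known neighbours.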
Components that should be included in the pipeline before this component.
Returns the component's default config.
Any extra Python dependencies required for this component to run.
Constructs a new count vectorizer using the sklearn framework.
Creates a new untrained component (see parent class for full docstring).
Trains the featurizer.
Takes parameters from the config and constructs a new count vectorizer using the sklearn framework.
Processes the training examples in the given training data in-place.
training_data: the training data.
Returns the same training data after processing.
Processes an incoming message, then computes and sets its features.
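The train-then-process flow described above can be sketched without sklearn. The functions `fit_vocabulary` and `featurize` are hypothetical names, and the vocabulary here is built in first-seen order, which may differ from how the real component orders its features.

```python
from collections import Counter


def fit_vocabulary(token_lists):
    """Training step: assign each distinct token a column index."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(tokens, vocab):
    """Processing step: count known tokens; unknown tokens are ignored."""
    counts = Counter(tok for tok in tokens if tok in vocab)
    vector = [0] * len(vocab)
    for tok, count in counts.items():
        vector[vocab[tok]] = count
    return vector


vocab = fit_vocabulary([["book", "a", "flight"], ["cancel", "a", "flight"]])
features = featurize(["book", "a", "a", "train"], vocab)
print(features)
```

The vocabulary is fixed once training ends, so a token first seen at inference time ("train" in this sketch) contributes nothing to the feature vector.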
Persist this model into the passed directory.
Returns the metadata necessary to load the model again.
Loads trained component (see parent class for full docstring).
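The persist, metadata, and load steps above form a round trip. The sketch below assumes the only state worth saving is a token-to-index vocabulary and that it is stored as JSON; the file name, function names, and metadata layout are all hypothetical and do not reflect the component's actual storage format.

```python
import json
import tempfile
from pathlib import Path

VOCAB_FILE = "vocabulary.json"  # hypothetical file name


def persist(vocabulary, directory):
    """Write the vocabulary into the directory and return loading metadata."""
    path = Path(directory) / VOCAB_FILE
    path.write_text(json.dumps(vocabulary, sort_keys=True), encoding="utf-8")
    return {"file": VOCAB_FILE}


def load(metadata, directory):
    """Rebuild the vocabulary from the file named in the metadata."""
    path = Path(directory) / metadata["file"]
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as model_dir:
    vocab = {"hello": 0, "world": 1, "__NUMBER__": 2}
    meta = persist(vocab, model_dir)
    restored = load(meta, model_dir)

print(restored == vocab)
```

Returning the file name as metadata lets the loading side find the stored state without hard-coding a path.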
Validates that the component is configured properly.