Version: 2.0.x

rasa.nlu.featurizers.sparse_featurizer.count_vectors_featurizer

CountVectorsFeaturizer Objects

class CountVectorsFeaturizer(SparseFeaturizer)

Creates a sequence of token-count features based on sklearn's CountVectorizer.

All tokens which consist only of digits (e.g. 123 and 99 but not ab12d) will be represented by a single feature.
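The digit rule above can be illustrated with a minimal standard-library sketch. It is not the featurizer's own code: the helper `collapse_digit_tokens` and the placeholder name `NUMBER_PLACEHOLDER` are hypothetical, chosen only to show how digit-only tokens end up sharing one feature while mixed tokens such as ab12d stay distinct.

```python
import re
from collections import Counter

# Hypothetical placeholder, assumed for this illustration only.
NUMBER_PLACEHOLDER = "__NUMBER__"


def collapse_digit_tokens(tokens):
    """Replace every token made up only of digits with one shared placeholder."""
    return [
        NUMBER_PLACEHOLDER if re.fullmatch(r"[0-9]+", token) else token
        for token in tokens
    ]


tokens = ["order", "123", "and", "99", "ab12d"]
collapsed = collapse_digit_tokens(tokens)

# Counting after the replacement gives "123" and "99" the same feature.
counts = Counter(collapsed)
print(collapsed)
print(counts[NUMBER_PLACEHOLDER])
```

After this step, a count-based vectorizer sees a single vocabulary entry for all purely numeric tokens instead of one entry per distinct number.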

Set analyzer to 'char_wb' to use the idea of Subword Semantic Hashing from https://arxiv.org/abs/1810.07150.
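To give a feel for what a 'char_wb'-style analyzer produces, here is a simplified standard-library sketch. It approximates the idea (character n-grams taken inside word boundaries, with each word padded by one space) and does not reproduce sklearn's implementation exactly; the function name `char_wb_ngrams` and its parameters are assumptions made for this example.

```python
def char_wb_ngrams(text, min_n=2, max_n=3):
    """Return character n-grams built per word, each word padded with one space.

    Simplified sketch: words shorter than n contribute no n-gram of size n.
    """
    ngrams = []
    for word in text.lower().split():
        padded = f" {word} "
        for n in range(min_n, max_n + 1):
            # Slide a window of width n across the padded word.
            for start in range(len(padded) - n + 1):
                ngrams.append(padded[start:start + n])
    return ngrams


print(char_wb_ngrams("hi"))

# Related word forms share subword features, which is the point of the approach.
shared = set(char_wb_ngrams("booking", 4, 4)) & set(char_wb_ngrams("booked", 4, 4))
print(sorted(shared))
```

Because n-grams never cross word boundaries, inflected or misspelled variants of a word still overlap in many features, even when the full tokens differ.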

__init__

| __init__(component_config: Optional[Dict[Text, Any]] = None, vectorizers: Optional[Dict[Text, "CountVectorizer"]] = None) -> None

Construct a new count vectorizer using the sklearn framework.

train

| train(training_data: TrainingData, cfg: Optional[RasaNLUModelConfig] = None, **kwargs: Any) -> None

Train the featurizer.

Take parameters from config and construct a new count vectorizer using the sklearn framework.

process

| process(message: Message, **kwargs: Any) -> None

Process an incoming message, then compute and set its features.

persist

| persist(file_name: Text, model_dir: Text) -> Optional[Dict[Text, Any]]

Persist this model into the passed directory.

Returns the metadata necessary to load the model again.