Version: 3.x

rasa.nlu.tokenizers.whitespace_tokenizer

WhitespaceTokenizer Objects

@DefaultV1Recipe.register(
DefaultV1Recipe.ComponentType.MESSAGE_TOKENIZER, is_trainable=False
)
class WhitespaceTokenizer(Tokenizer)

Tokenizes message text by splitting it on whitespace.
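As a rough illustration of whitespace tokenization in general, the following standalone sketch splits text on runs of whitespace and records character offsets. It is not Rasa's implementation, which may apply additional rules. `SimpleToken` and `whitespace_tokenize` are hypothetical names introduced only for this example.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class SimpleToken:
    """Hypothetical token type: the token text plus its start offset."""

    text: str
    start: int

    @property
    def end(self) -> int:
        # End offset is exclusive, like Python slice indices.
        return self.start + len(self.text)


def whitespace_tokenize(text: str) -> List[SimpleToken]:
    """Split text on runs of whitespace, keeping character offsets."""
    # \S+ matches each maximal run of non-whitespace characters.
    return [SimpleToken(m.group(), m.start()) for m in re.finditer(r"\S+", text)]


tokens = whitespace_tokenize("book a  table")
print([(t.text, t.start, t.end) for t in tokens])
```

Keeping the offsets alongside the text lets downstream steps map each token back to its position in the original message.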

not_supported_languages

| @staticmethod
| not_supported_languages() -> Optional[List[Text]]

Returns the languages that this tokenizer does not support.

get_default_config

| @staticmethod
| get_default_config() -> Dict[Text, Any]

Returns the component's default config.

__init__

| __init__(config: Dict[Text, Any]) -> None

Initializes the tokenizer with the given config.

create

| @classmethod
| create(cls, config: Dict[Text, Any], model_storage: ModelStorage, resource: Resource, execution_context: ExecutionContext) -> WhitespaceTokenizer

Creates a new component (see parent class for full docstring).

remove_emoji

| remove_emoji(text: Text) -> Text

Removes the emoji if the full text (that is, the token) matches the emoji regex.
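The full-match behaviour can be sketched as follows. This is a minimal standalone example, not Rasa's code: `EMOJI_PATTERN` is a simplified, hypothetical pattern covering only a few Unicode blocks, and the sketch assumes that a fully matching token is replaced by an empty string while any other text is returned unchanged.

```python
import re

# Simplified, hypothetical pattern covering a few common emoji blocks.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols and pictographs
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "]+"
)


def remove_emoji_sketch(token: str) -> str:
    """Return "" if the whole token is emoji, else the token unchanged."""
    # fullmatch succeeds only when the pattern covers the entire string,
    # so a token that merely contains an emoji is left as it is.
    if EMOJI_PATTERN.fullmatch(token) is not None:
        return ""
    return token
```

Using `fullmatch` rather than `search` means that mixed tokens such as a word followed by an emoji are not altered; only tokens made up entirely of emoji are dropped.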