Version: 3.x

rasa.nlu.tokenizers.whitespace_tokenizer

WhitespaceTokenizer Objects

@DefaultV1Recipe.register(
    DefaultV1Recipe.ComponentType.MESSAGE_TOKENIZER, is_trainable=False
)
class WhitespaceTokenizer(Tokenizer)

Tokenizer that splits text into tokens using whitespace as the separator.
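As a rough illustration of whitespace tokenization, the sketch below splits a string on runs of whitespace and records each token's start and end character offsets. `SimpleToken` and `whitespace_tokenize` are hypothetical names, not Rasa APIs, and the sketch omits the punctuation and emoji handling that the actual component performs.

```python
import re
from typing import List, NamedTuple


class SimpleToken(NamedTuple):
    """Hypothetical stand-in for a token: its text and character offsets."""

    text: str
    start: int
    end: int


def whitespace_tokenize(text: str) -> List[SimpleToken]:
    """Split `text` on runs of whitespace, keeping each token's offsets.

    Simplified sketch: it does not strip punctuation or emoji.
    """
    return [
        SimpleToken(match.group(), match.start(), match.end())
        for match in re.finditer(r"\S+", text)  # each maximal non-space run
    ]


tokens = whitespace_tokenize("book a table  for two")
print([(t.text, t.start, t.end) for t in tokens])
```

Keeping the offsets alongside the token text is what lets downstream components map extracted entities back to positions in the original message.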

not_supported_languages

@staticmethod
def not_supported_languages() -> Optional[List[Text]]

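Returns the codes of the languages this tokenizer does not support, i.e. languages that are not written with whitespace between words (for example Chinese, Japanese and Thai).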
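A minimal sketch of how a caller might use such an exclusion list, assuming a `None` return means that no language is excluded. `is_language_supported` is a hypothetical helper, and the `excluded` list is an illustrative assumption rather than a value read from Rasa.

```python
from typing import List, Optional


def is_language_supported(
    language: Optional[str], not_supported: Optional[List[str]]
) -> bool:
    """Hypothetical helper: True unless `language` is explicitly excluded.

    A `None` (or empty) exclusion list means no language is excluded;
    a `None` language is treated as "not specified" and accepted.
    """
    if language is None or not not_supported:
        return True
    return language not in not_supported


# Assumed exclusion list, for illustration only.
excluded = ["zh", "ja", "th"]
print(is_language_supported("en", excluded))
print(is_language_supported("ja", excluded))
```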

get_default_config

@staticmethod
def get_default_config() -> Dict[Text, Any]

Returns the component's default config.
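Component defaults are typically combined with the options a user sets for the component in the pipeline, with user values taking precedence. The sketch below shows that overlay as a plain dictionary merge; `merge_config` is hypothetical, and the default keys are illustrative assumptions, so check the output of `get_default_config()` for the real ones.

```python
from typing import Any, Dict


def merge_config(
    defaults: Dict[str, Any], overrides: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a new dict in which user-supplied keys override the defaults."""
    return {**defaults, **overrides}


# Illustrative default keys (assumed); neither input dict is mutated.
defaults = {
    "intent_tokenization_flag": False,
    "intent_split_symbol": "_",
    "token_pattern": None,
}
config = merge_config(defaults, {"intent_tokenization_flag": True})
print(config)
```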

__init__

def __init__(self, config: Dict[Text, Any]) -> None

Initializes the tokenizer with the given config.

create

@classmethod
def create(cls, config: Dict[Text, Any], model_storage: ModelStorage,
           resource: Resource,
           execution_context: ExecutionContext) -> WhitespaceTokenizer

Creates a new component (see parent class for full docstring).
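Because the class is registered with `is_trainable=False`, there is presumably nothing to load from model storage when the component is created. Under that assumption, a factory classmethod can simply forward the config to the constructor. `HypotheticalTokenizer` is a stand-in, and its `Any`-typed, `None`-defaulted parameters are placeholders for `ModelStorage`, `Resource` and `ExecutionContext`.

```python
from typing import Any, Dict


class HypotheticalTokenizer:
    """Stand-in class; not Rasa's WhitespaceTokenizer."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    @classmethod
    def create(
        cls,
        config: Dict[str, Any],
        model_storage: Any = None,
        resource: Any = None,
        execution_context: Any = None,
    ) -> "HypotheticalTokenizer":
        # A non-trainable component persists nothing, so the storage,
        # resource and execution context go unused in this sketch.
        return cls(config)


tokenizer = HypotheticalTokenizer.create({"intent_split_symbol": "_"})
print(type(tokenizer).__name__, tokenizer.config)
```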

remove_emoji

def remove_emoji(self, text: Text) -> Text

Removes emoji from a token: if the entire text (i.e. the token) matches the emoji regex, an empty string is returned; otherwise the text is returned unchanged.
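The full-match behaviour can be sketched with Python's `re` module: a token made up entirely of emoji becomes an empty string, while a token that merely contains an emoji is returned unchanged. `EMOJI_PATTERN` is a hypothetical, deliberately narrow pattern covering three Unicode blocks, not the regex the component actually uses.

```python
import re
from typing import List, Pattern

# Hypothetical, deliberately narrow emoji pattern covering a few common
# Unicode blocks; a production regex covers many more ranges.
EMOJI_PATTERN: Pattern[str] = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols and pictographs
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "]+"
)


def remove_emoji(text: str, pattern: Pattern[str] = EMOJI_PATTERN) -> str:
    """Return "" if the whole token is emoji, otherwise the token unchanged."""
    if pattern.fullmatch(text) is not None:
        return ""
    return text


print(remove_emoji("\U0001F600"))    # token is only an emoji
print(remove_emoji("hi\U0001F600"))  # mixed token is kept as-is

# Assumed caller behaviour: drop tokens that became empty.
tokens: List[str] = ["great", "\U0001F44D", "thanks"]
kept = [t for t in (remove_emoji(t) for t in tokens) if t]
print(kept)
```

Using `fullmatch` rather than `search` is what keeps mixed tokens such as `hi😀` intact.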