Version: Master/Unreleased

rasa.nlu.tokenizers.whitespace_tokenizer

WhitespaceTokenizerGraphComponent Objects

class WhitespaceTokenizerGraphComponent(TokenizerGraphComponent)

Tokenizer component that splits text into tokens on whitespace.
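As the class name suggests, this component tokenizes text on whitespace. The sketch below illustrates the general idea only and is not Rasa's implementation: the function name `whitespace_tokenize` and the `(token, start_offset)` return shape are assumptions made for this example, and the real component may apply further preprocessing that is not shown here.

```python
from typing import List, Tuple


def whitespace_tokenize(text: str) -> List[Tuple[str, int]]:
    """Split text on whitespace and return (token, start_offset) pairs.

    Hypothetical helper for illustration; not part of the Rasa API.
    """
    tokens: List[Tuple[str, int]] = []
    offset = 0
    for word in text.split():
        # Search from the end of the previous token so that repeated
        # words map to their own positions in the original text.
        start = text.index(word, offset)
        tokens.append((word, start))
        offset = start + len(word)
    return tokens
```

Keeping the start offset alongside each token lets later steps map tokens back to character positions in the original message.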

not_supported_languages

@staticmethod
def not_supported_languages() -> Optional[List[Text]]

Returns the languages that this tokenizer does not support.

get_default_config

@staticmethod
def get_default_config() -> Dict[Text, Any]

Returns the component's default config.

__init__

def __init__(config: Dict[Text, Any]) -> None

Initializes the tokenizer.

create

@classmethod
def create(cls, config: Dict[Text, Any], model_storage: ModelStorage, resource: Resource, execution_context: ExecutionContext) -> "WhitespaceTokenizerGraphComponent"

Creates a new component (see parent class for full docstring).

remove_emoji

def remove_emoji(text: Text) -> Text

Removes the emoji if the full text (that is, the token) matches the emoji regex.
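The full-match behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: `EMOJI_PATTERN` is a deliberately narrow, hypothetical pattern covering only a few Unicode emoji blocks, and returning an empty string on a match is an assumption about how "remove" is realised.

```python
import re

# Hypothetical pattern covering a few emoji blocks only; the regex used
# by the actual component may differ.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols and pictographs
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "]+"
)


def remove_emoji(text: str) -> str:
    """Return an empty string if the whole token consists of emoji.

    Tokens that only partly consist of emoji are returned unchanged,
    because fullmatch requires the pattern to cover the entire string.
    """
    if EMOJI_PATTERN.fullmatch(text) is not None:
        return ""
    return text
```

Using `fullmatch` rather than `search` means a token such as a word with a trailing emoji is left untouched; only tokens made up entirely of emoji are dropped.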