Version: Master/Unreleased
rasa.nlu.tokenizers.whitespace_tokenizer
WhitespaceTokenizer Objects
class WhitespaceTokenizer(Tokenizer)
__init__
| __init__(component_config: Dict[Text, Any] = None) -> None
Construct a new tokenizer using the WhitespaceTokenizer framework.
get_emoji_regex
| @staticmethod
| get_emoji_regex() -> Pattern
Gets the compiled regex used to detect emojis in the training data.
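As a rough illustration of what such a pattern might look like, here is a minimal sketch using Python's standard `re` module. The function name `sketch_emoji_regex` and the specific Unicode ranges are assumptions chosen for illustration. They are not taken from the Rasa source, and the pattern returned by `get_emoji_regex` may cover different ranges.

```python
import re


def sketch_emoji_regex() -> re.Pattern:
    """Hypothetical stand-in for get_emoji_regex; the ranges are illustrative only."""
    return re.compile(
        "["
        "\U0001F600-\U0001F64F"  # emoticons
        "\U0001F300-\U0001F5FF"  # symbols and pictographs
        "\U0001F680-\U0001F6FF"  # transport and map symbols
        "\u200d"  # zero-width joiner, used in composite emoji
        "]+"
    )


pattern = sketch_emoji_regex()
print(pattern.fullmatch("\U0001F600") is not None)  # token made only of an emoji
print(pattern.fullmatch("hello") is not None)       # plain word
```

A character class followed by `+` keeps the pattern simple: a token consisting of one or more characters from the listed ranges matches as a whole.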
remove_emoji
| remove_emoji(text: Text) -> Text
Removes the emoji if the full text (that is, the token) matches the emoji regex. Text that only partly consists of emoji does not match in full.
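The full-match behavior described above can be sketched as follows. This is a self-contained approximation, not the Rasa implementation: `EMOJI_PATTERN` and `sketch_remove_emoji` are hypothetical names, the pattern is deliberately simplified, and returning an empty string for an emoji-only token is an assumption about what "remove" means here.

```python
import re

# Hypothetical, simplified emoji pattern (an assumption, not the Rasa pattern).
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001F6FF\u200d]+")


def sketch_remove_emoji(text: str) -> str:
    """Return "" when the whole token is emoji; otherwise return it unchanged."""
    # fullmatch succeeds only if the entire string matches the pattern,
    # so a token that mixes letters and emoji is left as-is.
    if EMOJI_PATTERN.fullmatch(text) is not None:
        return ""
    return text


tokens = ["hello", "\U0001F600", "hi\U0001F600"]
cleaned = [sketch_remove_emoji(t) for t in tokens]
print(cleaned)
```

Using `fullmatch` rather than `search` or `sub` is what restricts removal to emoji-only tokens. A mixed token such as `"hi\U0001F600"` passes through untouched in this sketch.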