# rasa.nlu.tokenizers.tokenizer
## Token Objects

```python
class Token()
```

Used by `Tokenizer`s which split a single message into multiple `Token`s.
#### `__init__`

```python
 | __init__(text: Text, start: int, end: Optional[int] = None, data: Optional[Dict[Text, Any]] = None, lemma: Optional[Text] = None) -> None
```

Create a `Token`.

**Arguments**:

- `text` - The token text.
- `start` - The start index of the token within the entire message.
- `end` - The end index of the token within the entire message.
- `data` - Additional token data.
- `lemma` - An optional lemmatized version of the token text.
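As a point of reference, here is a minimal, hypothetical sketch of constructing tokens by hand; the example text is made up, and `start`/`end` are character offsets into the full message string:

```python
from rasa.nlu.tokenizers.tokenizer import Token

# Two tokens for the message "hello world"; start/end index into the message text.
tokens = [
    Token(text="hello", start=0, end=5),
    Token(text="world", start=6, end=11, lemma="world"),
]
```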
#### `set`

```python
 | set(prop: Text, info: Any) -> None
```

Set property value.
#### `get`

```python
 | get(prop: Text, default: Optional[Any] = None) -> Any
```

Returns the value stored for `prop`, falling back to `default` if the property is not set.
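`set` and `get` attach and look up arbitrary per-token properties; a small illustrative sketch (the `"pos"` property name is made up):

```python
token = Token(text="running", start=0, end=7, lemma="run")

# Attach an arbitrary property to the token and read it back.
token.set("pos", "VERB")
assert token.get("pos") == "VERB"

# Unknown properties fall back to the provided default.
assert token.get("unknown_prop", default="n/a") == "n/a"
```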
#### `fingerprint`

```python
 | fingerprint() -> Text
```

Returns a stable hash for this `Token`.
## Tokenizer Objects

```python
class Tokenizer(GraphComponent, abc.ABC)
```

Base class for tokenizers.
#### `__init__`

```python
 | __init__(config: Dict[Text, Any]) -> None
```

Construct a new tokenizer.
#### `create`

```python
 | @classmethod
 | create(cls, config: Dict[Text, Any], model_storage: ModelStorage, resource: Resource, execution_context: ExecutionContext) -> GraphComponent
```

Creates a new component (see parent class for full docstring).
#### `tokenize`

```python
 | @abc.abstractmethod
 | tokenize(message: Message, attribute: Text) -> List[Token]
```

Tokenizes the text of the provided attribute of the incoming message.
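Concrete tokenizers subclass `Tokenizer` and implement this method. Below is a rough sketch of such a subclass that splits on whitespace and records character offsets; the class name is invented, and the registration decorator and default config a real custom component would also need are omitted:

```python
import re
from typing import List, Text

from rasa.nlu.tokenizers.tokenizer import Token, Tokenizer
from rasa.shared.nlu.training_data.message import Message


class SimpleWhitespaceTokenizer(Tokenizer):
    """Hypothetical tokenizer that splits a message attribute on whitespace."""

    def tokenize(self, message: Message, attribute: Text) -> List[Token]:
        text = message.get(attribute)

        # One Token per whitespace-separated chunk, keeping character offsets.
        return [
            Token(text=m.group(), start=m.start(), end=m.end())
            for m in re.finditer(r"\S+", text)
        ]
```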
#### `process_training_data`

```python
 | process_training_data(training_data: TrainingData) -> TrainingData
```

Tokenize all training data.
#### `process`

```python
 | process(messages: List[Message]) -> List[Message]
```

Tokenize the incoming messages.
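`process_training_data` runs the tokenizer over every training example, while `process` handles messages at inference time; both store the resulting tokens back on the message. A usage sketch with the hypothetical subclass above, assuming the intent-splitting config keys the base class reads in Rasa 3.x:

```python
from rasa.nlu.constants import TOKENS_NAMES
from rasa.shared.nlu.constants import TEXT
from rasa.shared.nlu.training_data.message import Message

# Assumption: the base class expects these intent-splitting settings in its config.
tokenizer = SimpleWhitespaceTokenizer(
    {"intent_tokenization_flag": False, "intent_split_symbol": "_"}
)

messages = [Message(data={TEXT: "hello world"})]
tokenizer.process(messages)

# Tokens are stored on each message under the attribute's token key.
print(messages[0].get(TOKENS_NAMES[TEXT]))
```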