rasa.utils.train_utils
rank_and_mask
Computes a ranking of the given confidences.
First, it computes a list containing the indices that would sort all the given confidences in decreasing order. If a ranking_length is specified, then only the indices of the ranking_length largest confidences are returned, and all other confidences (i.e. those whose indices are not returned) are masked by setting them to 0. Moreover, if renormalize is set to True, the confidences are additionally renormalized by dividing them by their sum.
We assume that the given confidences sum up to 1. If the ranking_length is 0 or larger than the given number of confidences, it is set to the number of confidences. Hence, in this case the confidences won't be modified.
Arguments:
- confidences - a 1-d array of confidences that are non-negative and sum up to 1
- ranking_length - the size of the ranking to be computed. If set to 0 or to something larger than the number of given confidences, then this is set to the exact number of given confidences.
- renormalize - determines whether the masked confidences should be renormalized.
- return_indices:
Returns:
Indices of the top ranking_length confidences and an array of the same shape as the given confidences that contains the possibly masked and renormalized confidence values.
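The ranking and masking steps can be sketched in NumPy as follows. This is an illustrative sketch, not Rasa's implementation: the name rank_and_mask_sketch is hypothetical, and skipping renormalization when every remaining confidence is 0 is an assumption added to avoid dividing by zero.

```python
from typing import Tuple

import numpy as np


def rank_and_mask_sketch(
    confidences: np.ndarray, ranking_length: int = 0, renormalize: bool = False
) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical sketch: rank confidences and mask all but the top ranking_length."""
    # Indices that sort the confidences in decreasing order.
    indices = np.argsort(confidences)[::-1]
    # Work on a copy so the caller's array is left untouched.
    masked = confidences.copy()
    # Only truncate for 0 < ranking_length < n; 0 or "too large" keeps everything.
    if 0 < ranking_length < len(confidences):
        masked[indices[ranking_length:]] = 0.0
        indices = indices[:ranking_length]
    # Optionally rescale the surviving confidences so they sum to 1 again
    # (assumption: skip this if everything was masked to 0).
    if renormalize and masked.sum() > 0:
        masked = masked / masked.sum()
    return indices, masked
```

For confidences [0.1, 0.6, 0.3] with ranking_length=2 and renormalize=True, the sketch returns indices [1, 2] and confidences [0, 2/3, 1/3].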
update_similarity_type
If SIMILARITY_TYPE is set to 'auto', update the SIMILARITY_TYPE depending on the LOSS_TYPE.
Arguments:
- config - model configuration
Returns:
Updated model configuration.
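A minimal sketch of the 'auto' resolution, assuming plain string keys and values. The concrete mapping (cross_entropy to inner, margin to cosine) is an assumption for illustration and is not stated in this section; check it against the component documentation.

```python
from typing import Any, Dict

# Hypothetical config keys and values, used only for illustration.
SIMILARITY_TYPE = "similarity_type"
LOSS_TYPE = "loss_type"
AUTO = "auto"


def update_similarity_type_sketch(config: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve an 'auto' similarity type from the configured loss type."""
    if config.get(SIMILARITY_TYPE) == AUTO:
        # Assumed mapping: cross-entropy loss uses inner-product similarity,
        # margin loss uses cosine similarity.
        if config.get(LOSS_TYPE) == "cross_entropy":
            config[SIMILARITY_TYPE] = "inner"
        elif config.get(LOSS_TYPE) == "margin":
            config[SIMILARITY_TYPE] = "cosine"
    return config
```

An explicitly chosen similarity type (anything other than 'auto') is left as it is.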
align_token_features
Align token features to match tokens.
ConveRTFeaturizer and LanguageModelFeaturizer might split tokens into sub-tokens. In that case, the mean of the sub-token vectors is taken as the token vector.
Arguments:
- list_of_tokens - tokens for each example
- in_token_features - token features from ConveRT
- shape - shape of the feature matrix
Returns:
Token features.
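The averaging step can be sketched with NumPy. In this hypothetical version, align_token_features_sketch receives a plain list of sub-token counts per token instead of token objects, and every count is assumed to be at least 1.

```python
from typing import List, Tuple

import numpy as np


def align_token_features_sketch(
    sub_token_counts: List[List[int]],
    in_token_features: np.ndarray,
    shape: Tuple[int, int, int],
) -> np.ndarray:
    """Average consecutive sub-token vectors back into one vector per token.

    sub_token_counts[i][j] is the (assumed) number of sub-tokens that token j
    of example i was split into by the featurizer.
    """
    out_token_features = np.zeros(shape)
    for example_idx, counts in enumerate(sub_token_counts):
        offset = 0  # position of the current token's first sub-token
        for token_idx, n_sub in enumerate(counts):
            sub_vectors = in_token_features[example_idx, offset : offset + n_sub]
            # The token vector is the mean over its sub-token vectors.
            out_token_features[example_idx, token_idx] = np.mean(sub_vectors, axis=0)
            offset += n_sub
    return out_token_features
```

A token that was not split (count 1) simply keeps its own vector, since the mean of one vector is that vector.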
update_evaluation_parameters
If EVAL_NUM_EPOCHS is set to -1, evaluate only at the end of training.
Arguments:
- config - model configuration
Returns:
Updated model configuration.
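A minimal sketch of the -1 sentinel handling. The key names are assumptions; the idea shown is that evaluating every epochs epochs leaves a single evaluation after the last epoch.

```python
from typing import Any, Dict

# Hypothetical config keys standing in for EVAL_NUM_EPOCHS and EPOCHS.
EVAL_NUM_EPOCHS = "evaluate_every_number_of_epochs"
EPOCHS = "epochs"


def update_evaluation_parameters_sketch(config: Dict[str, Any]) -> Dict[str, Any]:
    """Turn the -1 sentinel into 'evaluate once, after the final epoch'."""
    if config.get(EVAL_NUM_EPOCHS) == -1:
        # Evaluating every `epochs` epochs means the only evaluation is at the end.
        config[EVAL_NUM_EPOCHS] = config[EPOCHS]
    return config
```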
load_tf_hub_model
Load model from cache if possible, otherwise from TFHub.
check_deprecated_options
Update the config according to changed config params.
If old model configuration parameters are present in the provided config, replace them with the new parameters and log a warning.
Arguments:
- config - model configuration
Returns:
Updated model configuration.
check_core_deprecated_options
Update the core config according to changed config params.
If old model configuration parameters are present in the provided config, replace them with the new parameters and log a warning.
Arguments:
- config - model configuration
Returns:
Updated model configuration.
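Both check_deprecated_options and check_core_deprecated_options follow a rename-and-warn pattern, sketched below. The mapping DEPRECATED_TO_NEW and its option names are hypothetical placeholders, not real Rasa options, and the function name is illustrative.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)

# Hypothetical mapping from deprecated option names to their replacements.
DEPRECATED_TO_NEW = {"old_option": "new_option"}


def replace_deprecated_options_sketch(config: Dict[str, Any]) -> Dict[str, Any]:
    """Move values from deprecated keys to their new keys, warning about each one."""
    for old_key, new_key in DEPRECATED_TO_NEW.items():
        if old_key in config:
            logger.warning(
                "Option '%s' is deprecated, use '%s' instead.", old_key, new_key
            )
            # pop() removes the old key so only the new name remains.
            config[new_key] = config.pop(old_key)
    return config
```

If both the old and the new key are present, this sketch lets the old value win; a real implementation may resolve that conflict differently.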
entity_label_to_tags
Convert the output predictions for entities to the actual entity tags.
Arguments:
- model_predictions - the output predictions using the entity tag indices
- entity_tag_specs - the entity tag specifications
- bilou_flag - if True, the BILOU tagging schema was used
- prediction_index - the index in the batch of predictions to use for entity extraction
Returns:
A map of entity tag type, e.g. entity, role, group, to actual entity tags and confidences.
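The core index-to-tag lookup can be sketched as follows. The input layout (one dict of predicted index arrays and one of scores, keyed by tag type such as entity, role or group) and the ids_to_tags mapping are assumptions standing in for model_predictions and entity_tag_specs; the BILOU post-processing controlled by bilou_flag is omitted.

```python
from typing import Dict, List, Tuple

import numpy as np


def entity_label_to_tags_sketch(
    predicted_ids: Dict[str, np.ndarray],
    predicted_scores: Dict[str, np.ndarray],
    ids_to_tags: Dict[str, Dict[int, str]],
    prediction_index: int = 0,
) -> Tuple[Dict[str, List[str]], Dict[str, List[float]]]:
    """Hypothetical sketch: map tag indices of one batch element to tag names and confidences."""
    predicted_tags: Dict[str, List[str]] = {}
    confidence_values: Dict[str, List[float]] = {}
    for tag_type, ids in predicted_ids.items():
        # Select the row of the batch we want to decode.
        row_ids = ids[prediction_index]
        row_scores = predicted_scores[tag_type][prediction_index]
        predicted_tags[tag_type] = [ids_to_tags[tag_type][int(i)] for i in row_ids]
        confidence_values[tag_type] = [float(s) for s in row_scores]
    return predicted_tags, confidence_values
```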
create_data_generators
Create data generators for train and optional validation data.
Arguments:
- model_data - The model data to use.
- batch_sizes - The batch size(s).
- epochs - The number of epochs to train.
- batch_strategy - The batch strategy to use.
- eval_num_examples - Number of examples to use for validation data.
- random_seed - The random seed.
- shuffle - Whether to shuffle data inside the data generator.
- drop_small_last_batch - Whether to drop the last batch if it has fewer than half a batch size of examples.
Returns:
The training data generator and optional validation data generator.
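The validation split and the drop_small_last_batch rule can be sketched in plain Python. This is a simplified, hypothetical version: it uses a single integer batch size, ignores batch_strategy, and performs a plain random split seeded with random_seed.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple


def split_train_validation(
    examples: Sequence, eval_num_examples: int, random_seed: int
) -> Tuple[List, Optional[List]]:
    """Hold out eval_num_examples examples for validation (None if there are none)."""
    shuffled = list(examples)
    random.Random(random_seed).shuffle(shuffled)
    if eval_num_examples <= 0:
        return shuffled, None
    return shuffled[eval_num_examples:], shuffled[:eval_num_examples]


def batches(
    examples: Sequence,
    batch_size: int,
    shuffle: bool,
    drop_small_last_batch: bool,
    rng: random.Random,
) -> Iterator[List]:
    """Yield the batches of one epoch."""
    order = list(examples)
    if shuffle:
        rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = order[start : start + batch_size]
        is_last = start + batch_size >= len(order)
        # Drop a trailing batch with fewer than half a batch size of examples.
        if drop_small_last_batch and is_last and len(batch) < batch_size / 2:
            break
        yield batch
```

With 9 examples and a batch size of 4, the trailing batch of 1 example is dropped; with 10 examples, the trailing batch of 2 is exactly half a batch and is kept.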
create_common_callbacks
Create common callbacks.
The following callbacks are created:
- RasaTrainingLogger callback
- Optional TensorBoard callback
- Optional RasaModelCheckpoint callback
Arguments:
- epochs - the number of epochs to train
- tensorboard_log_dir - optional directory that should be used for TensorBoard
- tensorboard_log_level - defines when training metrics for TensorBoard should be logged. Valid values: 'epoch' and 'batch'.
- checkpoint_dir - optional directory that should be used for model checkpointing
Returns:
A list of callbacks.
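The conditional assembly of callbacks can be sketched with hypothetical stub classes in place of the real RasaTrainingLogger, TensorBoard and RasaModelCheckpoint callbacks, so the example runs without TensorFlow. Rejecting log levels other than 'epoch' and 'batch' with a ValueError is this sketch's own choice.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins for the real callback classes; only the
# constructor arguments matter for this sketch.
@dataclass
class TrainingLoggerStub:
    epochs: int


@dataclass
class TensorBoardStub:
    log_dir: str
    update_freq: str


@dataclass
class ModelCheckpointStub:
    checkpoint_dir: str


def create_common_callbacks_sketch(
    epochs: int,
    tensorboard_log_dir: Optional[str] = None,
    tensorboard_log_level: str = "epoch",
    checkpoint_dir: Optional[str] = None,
) -> List[object]:
    """Always add a training logger; add TensorBoard and checkpointing only if a directory is given."""
    if tensorboard_log_level not in ("epoch", "batch"):
        raise ValueError("tensorboard_log_level must be 'epoch' or 'batch'")
    callbacks: List[object] = [TrainingLoggerStub(epochs)]
    if tensorboard_log_dir is not None:
        callbacks.append(TensorBoardStub(tensorboard_log_dir, tensorboard_log_level))
    if checkpoint_dir is not None:
        callbacks.append(ModelCheckpointStub(checkpoint_dir))
    return callbacks
```

In a Keras setup, TensorBoardStub would correspond to tf.keras.callbacks.TensorBoard, whose update_freq parameter accepts 'epoch' or 'batch', so the log level can be passed through unchanged.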
update_confidence_type
Set model confidence to auto if margin loss is used.
The option auto is reserved for the margin loss type. It will be removed once margin loss is deprecated.
Arguments:
- component_config - model configuration
Returns:
Updated model configuration.
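A minimal sketch of this rule, assuming plain string keys and values for the loss type and model confidence options.

```python
from typing import Any, Dict

# Hypothetical config keys and values, used only for illustration.
LOSS_TYPE = "loss_type"
MODEL_CONFIDENCE = "model_confidence"
MARGIN = "margin"
AUTO = "auto"


def update_confidence_type_sketch(component_config: Dict[str, Any]) -> Dict[str, Any]:
    """Force model confidence to 'auto' whenever the margin loss is configured."""
    if component_config.get(LOSS_TYPE) == MARGIN:
        component_config[MODEL_CONFIDENCE] = AUTO
    return component_config
```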
validate_configuration_settings
Validates that the combination of parameters in the configuration is correctly set.
Arguments:
- component_config - Configuration to validate.
init_split_entities
Initialise the behaviour for splitting entities by comma (or not).
Returns:
The desired behaviour for splitting specific entity types, and the default behaviour for splitting any entity types for which no behaviour is defined.
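One way to represent this behaviour is a single dict with per-entity entries plus a default entry, sketched below. The key name DEFAULT_KEY, the parameter default_split_entity and the helper should_split are hypothetical.

```python
from typing import Dict, Union

# Hypothetical key under which the default behaviour is stored.
DEFAULT_KEY = "split_entities_by_comma"


def init_split_entities_sketch(
    split_entities_config: Union[bool, Dict[str, bool]], default_split_entity: bool
) -> Dict[str, bool]:
    """Normalise the split-by-comma setting into one dict.

    A bare bool sets the default for all entity types; a dict sets per-entity
    behaviour, and default_split_entity fills in the default if it is missing.
    """
    if isinstance(split_entities_config, bool):
        return {DEFAULT_KEY: split_entities_config}
    behaviour = dict(split_entities_config)
    behaviour.setdefault(DEFAULT_KEY, default_split_entity)
    return behaviour


def should_split(behaviour: Dict[str, bool], entity_type: str) -> bool:
    """Use the entity-specific setting if present, otherwise the default."""
    return behaviour.get(entity_type, behaviour[DEFAULT_KEY])
```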