notice
This is documentation for Rasa Documentation v2.x, which is no longer actively maintained.
For up-to-date documentation, see the latest version (3.x).
rasa.utils.tensorflow.model_data
FeatureArray Objects
Stores any kind of features ready to be used by a RasaModel.
In addition to the input numpy array of features, it also receives the number of dimensions of the features. Because our features can have 1 to 4 dimensions, the number of stacked numpy arrays can differ. The number of dimensions tells us how to handle a particular feature array. Whether the feature array is sparse and its number of units are determined automatically.
Subclassing np.array: https://numpy.org/doc/stable/user/basics.subclassing.html
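The subclassing pattern referenced above can be sketched as follows. This is a minimal illustration that follows the NumPy subclassing guide, not Rasa's actual implementation: the class name TaggedArray is hypothetical, and it carries only the number_of_dimensions attribute, without the sparse and units detection described above.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Hypothetical stand-in for FeatureArray: an ndarray with one extra attribute."""

    def __new__(cls, input_array, number_of_dimensions):
        # View the input as our subclass, then attach the extra attribute.
        obj = np.asarray(input_array).view(cls)
        obj.number_of_dimensions = number_of_dimensions
        return obj

    def __array_finalize__(self, obj):
        # Called after view casting and slicing as well as explicit construction.
        # obj is None only when the array is created via ndarray.__new__ directly.
        if obj is None:
            return
        # Copy the attribute from the source array, if it has one.
        self.number_of_dimensions = getattr(obj, "number_of_dimensions", None)


features = TaggedArray(np.zeros((5, 3)), number_of_dimensions=2)
sliced = features[:2]  # slicing goes through __array_finalize__
```

Because __array_finalize__ copies the attribute, sliced is still a TaggedArray and keeps number_of_dimensions.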
__new__
Create and return a new object. See help(type) for accurate signature.
__init__
Initialize FeatureArray.
Needed in order to avoid 'Invalid keyword argument number_of_dimensions to function FeatureArray.__init__'.
Arguments:
input_array
- the array that contains features
number_of_dimensions
- number of dimensions in input_array
__array_finalize__
This method is called when the system allocates a new array from obj.
Arguments:
obj
- A subclass (subtype) of ndarray.
__array_ufunc__
Overwrite this method as we are subclassing numpy array.
Arguments:
ufunc
- The ufunc object that was called.
method
- A string indicating which ufunc method was called (one of "__call__", "reduce", "reduceat", "accumulate", "outer", "at").
*inputs
- A tuple of the input arguments to the ufunc.
**kwargs
- Any additional arguments.
Returns:
The result of the operation.
__reduce__
Needed in order to pickle this object.
Returns:
A tuple.
__setstate__
Sets the state.
Arguments:
state
- The state argument must be a sequence that contains the following elements: version, shape, dtype, isFortran, rawdata.
**kwargs
- Any additional parameter.
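To show why __reduce__ and __setstate__ matter for pickling, here is a minimal sketch of a common pattern for ndarray subclasses that carry an extra attribute. The class name PicklableTaggedArray is hypothetical, and the choice to append the attribute to the end of the state tuple is an assumption made for this example, not a description of Rasa's code.

```python
import pickle

import numpy as np


class PicklableTaggedArray(np.ndarray):
    """Hypothetical ndarray subclass whose extra attribute survives pickling."""

    def __new__(cls, input_array, number_of_dimensions):
        obj = np.asarray(input_array).view(cls)
        obj.number_of_dimensions = number_of_dimensions
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.number_of_dimensions = getattr(obj, "number_of_dimensions", None)

    def __reduce__(self):
        # ndarray.__reduce__ returns (reconstruct_function, args, state).
        reconstruct, args, state = super().__reduce__()
        # Assumption of this sketch: store the extra attribute as the last state item.
        return reconstruct, args, state + (self.number_of_dimensions,)

    def __setstate__(self, state):
        # Restore the extra attribute, then pass the original state to ndarray.
        self.number_of_dimensions = state[-1]
        super().__setstate__(state[:-1])


original = PicklableTaggedArray(np.arange(6).reshape(2, 3), number_of_dimensions=2)
restored = pickle.loads(pickle.dumps(original))
```

Without the two overrides, the array contents would round-trip but number_of_dimensions would not be part of the pickled state.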
FeatureSignature Objects
Signature of feature arrays.
Stores the number of units, the type (sparse vs dense), and the number of dimensions of features.
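A record holding the three values named above could look like the following sketch. The class name SignatureSketch and its field names are assumptions made for illustration; check the Rasa source for the actual definition.

```python
from typing import NamedTuple, Optional


class SignatureSketch(NamedTuple):
    """Hypothetical record mirroring the three values a feature signature stores."""

    is_sparse: bool  # type of the features: sparse vs dense
    units: Optional[int]  # number of units (size of the last dimension)
    number_of_dimensions: int  # number of dimensions of the features


# Example values are invented for illustration.
sentence_signature = SignatureSketch(is_sparse=False, units=128, number_of_dimensions=2)
```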
RasaModelData Objects
Data object used for all RasaModels.
It contains all features needed to train the models. 'data' is a mapping of attribute name, e.g. TEXT, INTENT, etc., and feature name, e.g. SENTENCE, SEQUENCE, etc., to a list of feature arrays representing the actual features. 'label_key' and 'label_sub_key' point to the labels inside 'data'. For example, if your intent labels are stored under INTENT -> IDS, 'label_key' would be "INTENT" and 'label_sub_key' would be "IDS".
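The nested layout described above can be pictured with plain Python containers. In this sketch, nested lists stand in for feature arrays and all values are invented; only the key names (TEXT, INTENT, SENTENCE, IDS) come from the description.

```python
from typing import Dict, List

# attribute name -> feature name -> list of feature arrays
# Each "feature array" here is a plain list with one entry per example.
data: Dict[str, Dict[str, List[list]]] = {
    "TEXT": {"SENTENCE": [[[0.1, 0.2], [0.3, 0.4]]]},
    "INTENT": {"IDS": [[[0], [1]]]},
}

# The label keys point at the entry inside data that holds the labels.
label_key = "INTENT"
label_sub_key = "IDS"
labels = data[label_key][label_sub_key]
```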
__init__
Initializes the RasaModelData object.
Arguments:
label_key
- the key of a label used for balancing, etc.
label_sub_key
- the sub key of a label used for balancing, etc.
data
- the data holding the features
get
Get the data under the given keys.
Arguments:
key
- The key.
sub_key
- The optional sub key.
Returns:
The requested data.
items
Return the items of the data attribute.
Returns:
The items of data.
values
Return the values of the data attribute.
Returns:
The values of data.
keys
Return the keys of the data attribute.
Arguments:
key
- The optional key.
Returns:
The keys of the data.
sort
Sorts data according to its keys.
first_data_example
Return the data with just one feature example per key, sub-key.
Returns:
The simplified data.
does_feature_exist
Check if feature key (and sub-key) is present and features are available.
Arguments:
key
- The key.
sub_key
- The optional sub-key.
Returns:
False if no features exist for the given keys, True otherwise.
does_feature_not_exist
Check if feature key (and sub-key) is missing or no features are available.
Arguments:
key
- The key.
sub_key
- The optional sub-key.
Returns:
True if no features exist for the given keys, False otherwise.
is_empty
Checks whether the data is empty, i.e. whether no data is set.
number_of_examples
Obtain number of examples in data.
Arguments:
data
- The data.
Raises:
A ValueError if the number of examples differs between features.
Returns:
The number of examples in data.
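The consistency check described above can be sketched on the plain nested-dict layout. The function name count_examples is hypothetical, lists stand in for feature arrays, and returning 0 for empty data is an assumption of this sketch.

```python
from typing import Dict, List


def count_examples(data: Dict[str, Dict[str, List[list]]]) -> int:
    """Return the shared number of examples, or raise if features disagree."""
    # Collect the distinct example counts over all feature arrays.
    counts = {
        len(feature_array)
        for sub_data in data.values()
        for feature_arrays in sub_data.values()
        for feature_array in feature_arrays
    }
    if not counts:
        return 0  # assumption: empty data has zero examples
    if len(counts) > 1:
        raise ValueError(f"Number of examples differs between features: {sorted(counts)}")
    return counts.pop()


consistent = {"TEXT": {"SENTENCE": [["a", "b"]]}, "INTENT": {"IDS": [[0, 1]]}}
inconsistent = {"TEXT": {"SENTENCE": [["a", "b"]]}, "INTENT": {"IDS": [[0]]}}
```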
number_of_units
Get the number of units of the given key.
Arguments:
key
- The key.
sub_key
- The optional sub-key.
Returns:
The number of units.
add_data
Add incoming data to data.
Arguments:
data
- The data to add.
key_prefix
- Optional key prefix to use in front of the key value.
update_key
Copies the features under the given keys to the new keys and deletes the old.
Arguments:
from_key
- current feature key
from_sub_key
- current feature sub-key
to_key
- new key for feature
to_sub_key
- new sub-key for feature
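The copy-then-delete behaviour can be sketched on a plain nested dict. The function name move_features is hypothetical, and two details are assumptions of this sketch rather than documented behaviour: missing source keys are silently ignored, and an attribute left without sub-keys is removed.

```python
from typing import Dict, List


def move_features(
    data: Dict[str, Dict[str, List[list]]],
    from_key: str,
    from_sub_key: str,
    to_key: str,
    to_sub_key: str,
) -> None:
    """Move the features stored under one key pair to another key pair, in place."""
    if from_key not in data or from_sub_key not in data[from_key]:
        return  # assumption: nothing to move, so do nothing
    if (from_key, from_sub_key) == (to_key, to_sub_key):
        return  # source and target are identical
    # Copy to the new location, then delete the old entry.
    data.setdefault(to_key, {})[to_sub_key] = data[from_key][from_sub_key]
    del data[from_key][from_sub_key]
    if not data[from_key]:
        del data[from_key]  # assumption: drop attributes left without features


toy_data = {"TEXT": {"SENTENCE": [["a", "b"]]}}
move_features(toy_data, "TEXT", "SENTENCE", "QUERY", "SENTENCE")
```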
add_features
Add list of features to data under specified key.
Should update number of examples.
Arguments:
key
- The key.
sub_key
- The sub-key.
features
- The features to add.
add_lengths
Adds a feature array of lengths of sequences to data under given key.
Arguments:
key
- The key to add the lengths to
sub_key
- The sub-key to add the lengths to
from_key
- The key to take the lengths from
from_sub_key
- The sub-key to take the lengths from
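As an illustration of what a lengths feature contains, the following sketch derives one length per example from a sequence feature and stores it under a second key pair. The function name add_sequence_lengths and the key names in the toy data are hypothetical, lists stand in for feature arrays, and taking the lengths from the first feature array is an assumption of this sketch.

```python
from typing import Dict, List


def add_sequence_lengths(
    data: Dict[str, Dict[str, List[list]]],
    key: str,
    sub_key: str,
    from_key: str,
    from_sub_key: str,
) -> None:
    """Append a list of per-example sequence lengths under (key, sub_key)."""
    source = data.get(from_key, {}).get(from_sub_key)
    if not source:
        return  # no source features, nothing to add
    # Assumption: all arrays under one key pair share the same sequence
    # lengths, so the first feature array is enough.
    lengths = [len(sequence) for sequence in source[0]]
    data.setdefault(key, {}).setdefault(sub_key, []).append(lengths)


toy_data = {"TEXT": {"SEQUENCE": [[["tok1", "tok2", "tok3"], ["tok1"]]]}}
add_sequence_lengths(toy_data, "TEXT", "SEQUENCE_LENGTH", "TEXT", "SEQUENCE")
```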
add_sparse_feature_sizes
Adds a dictionary of feature sizes for different attributes.
Arguments:
sparse_feature_sizes
- a dictionary that maps each attribute with sparse features to a dictionary that maps a feature type to a list of sparse feature sizes.
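The shape of such a dictionary might look like the sketch below. The attribute name, feature type names, and sizes are all invented for illustration.

```python
from typing import Dict, List

# attribute -> feature type -> list of sparse feature sizes
# (all names and numbers below are made up for this example)
sparse_feature_sizes: Dict[str, Dict[str, List[int]]] = {
    "text": {
        "sequence": [1000, 250],
        "sentence": [1000, 250],
    },
}

# Total number of sparse units for one attribute and feature type.
total_sequence_units = sum(sparse_feature_sizes["text"]["sequence"])
```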
get_sparse_feature_sizes
Get feature sizes of the model.
sparse_feature_sizes is a dictionary that maps each attribute with sparse features to a dictionary that maps a feature type to a list of sparse feature sizes.
Returns:
A dictionary of key and sub-key to a list of sparse feature sizes.
split
Create random hold out test set using stratified split.
Arguments:
number_of_test_examples
- Number of test examples.
random_seed
- Random seed.
Returns:
A tuple of train and test RasaModelData.
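The idea of a stratified hold-out split can be sketched with the standard library alone. This is a simplified illustration, not Rasa's implementation: the function name stratified_split is hypothetical, it works on a flat list of labels and returns index lists, and it rounds each class share down, so it may return slightly fewer test examples than requested.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable], number_of_test_examples: int, random_seed: int
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) with class proportions kept in the test set."""
    rng = random.Random(random_seed)
    indices_by_label: Dict[Hashable, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_label[label].append(index)

    total = len(labels)
    test_indices: List[int] = []
    for indices in indices_by_label.values():
        rng.shuffle(indices)
        # Each class contributes its proportional share, rounded down.
        share = number_of_test_examples * len(indices) // total
        test_indices.extend(indices[:share])

    held_out = set(test_indices)
    train_indices = [i for i in range(total) if i not in held_out]
    return train_indices, sorted(test_indices)


toy_labels = ["a"] * 8 + ["b"] * 4
train, test = stratified_split(toy_labels, number_of_test_examples=3, random_seed=1)
```

With 8 examples of class "a" and 4 of class "b", a test set of 3 keeps the 2:1 ratio between the classes.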
get_signature
Get signature of RasaModelData.
Signature stores the shape and whether features are sparse or not for every key.
Returns:
A dictionary of key and sub-key to a list of feature signatures (same structure as the data attribute).
shuffled_data
Shuffle model data.
Arguments:
data
- The data to shuffle
Returns:
The shuffled data.
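Shuffling model data only makes sense if every feature is reordered with the same permutation, so that examples stay aligned across keys. A minimal sketch on the nested-dict layout, assuming non-empty data, with lists standing in for feature arrays and a hypothetical function name shuffle_together:

```python
import random
from typing import Dict, List


def shuffle_together(
    data: Dict[str, Dict[str, List[list]]], random_seed: int
) -> Dict[str, Dict[str, List[list]]]:
    """Return a copy of data with all feature arrays reordered by one shared permutation."""
    # Take the number of examples from the first feature array (data must be non-empty).
    first_sub_data = next(iter(data.values()))
    first_array = next(iter(first_sub_data.values()))[0]
    permutation = list(range(len(first_array)))
    random.Random(random_seed).shuffle(permutation)

    return {
        key: {
            sub_key: [
                [feature_array[i] for i in permutation]
                for feature_array in feature_arrays
            ]
            for sub_key, feature_arrays in sub_data.items()
        }
        for key, sub_data in data.items()
    }


toy_data = {
    "TEXT": {"SENTENCE": [["t0", "t1", "t2", "t3"]]},
    "INTENT": {"IDS": [[0, 1, 2, 3]]},
}
shuffled = shuffle_together(toy_data, random_seed=42)
```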
balanced_data
Mix model data to account for class imbalance.
This batching strategy puts rare classes approximately in every other batch, by repeating them. Mimics stratified batching, but also takes into account that more populated classes should appear more often.
Arguments:
data
- The data.
batch_size
- The batch size.
shuffle
- Boolean indicating whether to shuffle the data or not.
Returns:
The balanced data.
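The balancing strategy described above can be approximated by the following simplified sketch. It is not Rasa's implementation: the function name balanced_order is hypothetical, it works on a flat list of labels, ignores the shuffle option, and returns an ordering of example indices. Each round, every class contributes a chunk proportional to its frequency (at least one example), and classes that run out start again from the beginning until every class has been seen in full.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def balanced_order(labels: Sequence[Hashable], batch_size: int) -> List[int]:
    """Return example indices in which rare classes are repeated across rounds."""
    indices_by_label: Dict[Hashable, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_label[label].append(index)

    total = len(labels)
    position = {label: 0 for label in indices_by_label}
    exhausted = set()
    order: List[int] = []

    while len(exhausted) < len(indices_by_label):
        for label, indices in indices_by_label.items():
            # Share per round grows with class frequency and is at least 1.
            share = int(len(indices) / total * batch_size) + 1
            start = position[label]
            order.extend(indices[start:start + share])
            position[label] = start + share
            if position[label] >= len(indices):
                # Class fully seen: mark it and restart so it repeats later.
                exhausted.add(label)
                position[label] = 0
    return order


toy_labels = ["a"] * 6 + ["b"] * 2
order = balanced_order(toy_labels, batch_size=4)
```

In this toy run the two "b" examples appear twice, while every "a" example appears once.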