This is documentation for Rasa Documentation v2.x, which is no longer actively maintained.
For up-to-date documentation, see the latest version (3.x).

Version: 2.x

rasa.utils.tensorflow.model_data

FeatureArray Objects

class FeatureArray(np.ndarray)

Stores any kind of features ready to be used by a RasaModel.

In addition to the input numpy array of features, it also receives the number of dimensions of the features. Because features can have 1 to 4 dimensions, a different number of numpy arrays may be stacked. The number of dimensions tells us how to handle a particular feature array. Whether the feature array is sparse, and its number of units, are determined automatically.

Subclassing np.ndarray: https://numpy.org/doc/stable/user/basics.subclassing.html
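The subclassing approach referenced above can be sketched as follows. This is a minimal illustration of NumPy's documented subclassing recipe, not Rasa's actual implementation: the class name `TaggedArray` is hypothetical, and the automatic detection of sparsity and units is left out.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Hypothetical ndarray subclass that carries one extra attribute."""

    def __new__(cls, input_array, number_of_dimensions):
        # View the input as our subclass, then attach the extra attribute.
        obj = np.asarray(input_array).view(cls)
        obj.number_of_dimensions = number_of_dimensions
        return obj

    def __array_finalize__(self, obj):
        # Called on explicit construction, view casting, and slicing.
        # Copy the attribute from the source array when there is one.
        if obj is None:
            return
        self.number_of_dimensions = getattr(obj, "number_of_dimensions", None)


features = TaggedArray(np.zeros((3, 5)), number_of_dimensions=2)
sliced = features[:2]  # the slice is created through __array_finalize__
```

Because `__array_finalize__` copies the attribute, `sliced` is still a `TaggedArray` and keeps `number_of_dimensions`.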

__new__

| __new__(cls, input_array: np.ndarray, number_of_dimensions: int) -> "FeatureArray"

Creates and returns a new FeatureArray object.

__init__

| __init__(input_array: Any, number_of_dimensions: int, **kwargs: Any) -> None

Initializes the FeatureArray.

Needed in order to avoid the error 'Invalid keyword argument number_of_dimensions to function FeatureArray.__init__'.

Arguments:

  • input_array - the array that contains features
  • number_of_dimensions - number of dimensions in input_array

__array_finalize__

| __array_finalize__(obj: Any) -> None

This method is called when the system allocates a new array from obj.

Arguments:

  • obj - A subclass (subtype) of ndarray.

__array_ufunc__

| __array_ufunc__(ufunc: Any, method: Text, *inputs: Any, **kwargs: Any) -> Any

Overrides this method because FeatureArray subclasses the numpy array.

Arguments:

  • ufunc - The ufunc object that was called.
  • method - A string indicating which ufunc method was called (one of "__call__", "reduce", "reduceat", "accumulate", "outer", "inner").
  • *inputs - A tuple of the input arguments to the ufunc.
  • **kwargs - Any additional arguments

Returns:

The result of the operation.

__reduce__

| __reduce__() -> Tuple[Any, Any, Any]

Needed in order to pickle this object.

Returns:

A tuple.

__setstate__

| __setstate__(state: Any, **kwargs: Any) -> None

Sets the state.

Arguments:

  • state - The state argument must be a sequence that contains the following elements: version, shape, dtype, isFortran, rawdata.
  • **kwargs - Any additional parameter
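To illustrate why `__reduce__` and `__setstate__` are both needed, here is a hedged sketch of a commonly used pattern for ndarray subclasses with extra attributes. The class `TaggedArray` is hypothetical and only assumes that one extra value is appended to the standard ndarray state; Rasa's own implementation may store more.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Hypothetical ndarray subclass whose extra attribute survives pickling."""

    def __new__(cls, input_array, number_of_dimensions):
        obj = np.asarray(input_array).view(cls)
        obj.number_of_dimensions = number_of_dimensions
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.number_of_dimensions = getattr(obj, "number_of_dimensions", None)

    def __reduce__(self):
        # ndarray returns (reconstruct_function, args, state); extend the state.
        reconstruct, args, state = super().__reduce__()
        return reconstruct, args, state + (self.number_of_dimensions,)

    def __setstate__(self, state):
        # Pop our extra value, then hand the standard state back to ndarray.
        self.number_of_dimensions = state[-1]
        super().__setstate__(state[:-1])


features = TaggedArray(np.arange(6).reshape(2, 3), number_of_dimensions=2)

# pickle performs these same steps when it loads an object.
reconstruct, args, state = features.__reduce__()
restored = reconstruct(*args)
restored.__setstate__(state)
```

Without the two overrides, the array values would be restored but `number_of_dimensions` would be lost.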

FeatureSignature Objects

class FeatureSignature(NamedTuple)

Signature of feature arrays.

Stores the number of units, the type (sparse vs dense), and the number of dimensions of features.
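As a rough sketch of what such a signature could look like, the snippet below defines a NamedTuple with the three pieces of information listed above. The class name `Signature` and the field names are assumptions for illustration and may differ from the real FeatureSignature.

```python
from typing import NamedTuple


class Signature(NamedTuple):
    """Hypothetical stand-in for a feature signature."""

    is_sparse: bool            # sparse vs. dense features
    units: int                 # size of the last feature dimension
    number_of_dimensions: int  # dimensions of the feature array


sig = Signature(is_sparse=False, units=128, number_of_dimensions=3)

# Named tuples are immutable and unpack like plain tuples.
is_sparse, units, number_of_dimensions = sig
```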

RasaModelData Objects

class RasaModelData()

Data object used for all RasaModels.

It contains all features needed to train the models. 'data' is a mapping of attribute name, e.g. TEXT, INTENT, etc., and feature name, e.g. SENTENCE, SEQUENCE, etc., to a list of feature arrays representing the actual features. 'label_key' and 'label_sub_key' point to the labels inside 'data'. For example, if your intent labels are stored under INTENT -> IDS, 'label_key' would be "INTENT" and 'label_sub_key' would be "IDS".
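The nested layout described above can be pictured with plain Python containers. In this sketch, nested lists stand in for FeatureArray objects, and the string keys simply mirror the example names from the text; the actual constant values used by Rasa may differ.

```python
# data: attribute name -> feature name -> list of feature arrays.
# Each "array" here is a plain list with one entry per training example.
data = {
    "TEXT": {
        "SENTENCE": [[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]],
    },
    "INTENT": {
        "IDS": [[0, 1, 0]],
    },
}

# label_key and label_sub_key point at the labels inside data.
label_key, label_sub_key = "INTENT", "IDS"
label_arrays = data[label_key][label_sub_key]
label_ids = label_arrays[0]
```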

__init__

| __init__(label_key: Optional[Text] = None, label_sub_key: Optional[Text] = None, data: Optional[Data] = None) -> None

Initializes the RasaModelData object.

Arguments:

  • label_key - the key of a label used for balancing, etc.
  • label_sub_key - the sub key of a label used for balancing, etc.
  • data - the data holding the features

get

| get(key: Text, sub_key: Optional[Text] = None) -> Union[Dict[Text, List[FeatureArray]], List[FeatureArray]]

Get the data under the given keys.

Arguments:

  • key - The key.
  • sub_key - The optional sub key.

Returns:

The requested data.

items

| items() -> ItemsView

Return the items of the data attribute.

Returns:

The items of data.

values

| values() -> Any

Return the values of the data attribute.

Returns:

The values of data.

keys

| keys(key: Optional[Text] = None) -> List[Text]

Return the keys of the data attribute.

Arguments:

  • key - The optional key.

Returns:

The keys of the data.

sort

| sort() -> None

Sorts data according to its keys.

first_data_example

| first_data_example() -> Data

Return the data with just one feature example per key, sub-key.

Returns:

The simplified data.

does_feature_exist

| does_feature_exist(key: Text, sub_key: Optional[Text] = None) -> bool

Check if feature key (and sub-key) is present and features are available.

Arguments:

  • key - The key.
  • sub_key - The optional sub-key.

Returns:

False if no features exist for the given keys, True otherwise.
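The check can be sketched on a plain nested dictionary. The function name `feature_exists` is hypothetical, and the exact rules (for example, how an empty list is treated) are assumptions rather than Rasa's confirmed behavior.

```python
from typing import Dict, List, Optional

Data = Dict[str, Dict[str, List[list]]]


def feature_exists(data: Data, key: str, sub_key: Optional[str] = None) -> bool:
    """Return True only if the key (and sub-key) is present and holds features."""
    if key not in data:
        return False
    if sub_key is None:
        # Without a sub-key, any non-empty entry under the key counts.
        return len(data[key]) > 0
    return len(data[key].get(sub_key, [])) > 0


data = {"text": {"sentence": [[[0.1], [0.2]]], "sequence": []}}
```

With this data, `feature_exists(data, "text", "sentence")` holds, while an empty list under "sequence" or a missing key does not count as an existing feature.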

does_feature_not_exist

| does_feature_not_exist(key: Text, sub_key: Optional[Text] = None) -> bool

Check if feature key (and sub-key) is missing or no features are available.

Arguments:

  • key - The key.
  • sub_key - The optional sub-key.

Returns:

True if no features exist for the given keys, False otherwise.

is_empty

| is_empty() -> bool

Checks whether the data is empty, that is, whether no data is set.

number_of_examples

| number_of_examples(data: Optional[Data] = None) -> int

Obtain number of examples in data.

Arguments:

  • data - The data.

Raises:

A ValueError if the number of examples differs across features.

Returns:

The number of examples in data.
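A minimal sketch of this consistency check, using nested lists in place of feature arrays. The function name `count_examples` is hypothetical, and returning 0 for empty data is an assumption.

```python
def count_examples(data):
    """Return the shared example count, or raise if features disagree."""
    counts = {
        len(array)
        for sub_data in data.values()
        for arrays in sub_data.values()
        for array in arrays
    }
    if not counts:
        return 0  # assumption: empty data has zero examples
    if len(counts) > 1:
        raise ValueError(
            f"Number of examples differs for different features: {sorted(counts)}"
        )
    return counts.pop()


consistent = {"text": {"sentence": [[[0.1], [0.2]]]}, "intent": {"ids": [[0, 1]]}}
inconsistent = {"text": {"sentence": [[[0.1], [0.2]]]}, "intent": {"ids": [[0]]}}
```

Calling `count_examples(consistent)` yields the common count, while `count_examples(inconsistent)` raises a ValueError.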

number_of_units

| number_of_units(key: Text, sub_key: Text) -> int

Get the number of units of the given key.

Arguments:

  • key - The key.
  • sub_key - The sub-key.

Returns:

The number of units.

add_data

| add_data(data: Data, key_prefix: Optional[Text] = None) -> None

Add incoming data to data.

Arguments:

  • data - The data to add.
  • key_prefix - Optional key prefix to use in front of the key value.
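The effect of `key_prefix` can be sketched on plain dictionaries. This assumes the prefix is simply concatenated in front of each key; the helper name `merge_data` is hypothetical.

```python
def merge_data(target, incoming, key_prefix=None):
    """Add all features from incoming to target, optionally prefixing keys."""
    for key, sub_data in incoming.items():
        # Assumption: the prefix is plain string concatenation.
        new_key = f"{key_prefix}{key}" if key_prefix is not None else key
        for sub_key, features in sub_data.items():
            target.setdefault(new_key, {}).setdefault(sub_key, []).extend(features)


model_data = {"text": {"sentence": [[[0.1], [0.2]]]}}
label_data = {"intent": {"ids": [[0, 1]]}}
merge_data(model_data, label_data, key_prefix="label_")
```

After the call, `model_data` holds the original "text" entry plus a new "label_intent" entry.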

update_key

| update_key(from_key: Text, from_sub_key: Text, to_key: Text, to_sub_key: Text) -> None

Copies the features under the given keys to the new keys and deletes the old.

Arguments:

  • from_key - current feature key
  • from_sub_key - current feature sub-key
  • to_key - new key for feature
  • to_sub_key - new sub-key for feature
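The move described above can be sketched on a nested dictionary. The helper name `move_key` is hypothetical; silently ignoring a missing source and dropping an emptied top-level key are assumptions of this sketch.

```python
def move_key(data, from_key, from_sub_key, to_key, to_sub_key):
    """Move the features under (from_key, from_sub_key) to (to_key, to_sub_key)."""
    if from_key not in data or from_sub_key not in data[from_key]:
        return  # assumption: nothing to do if the source is missing
    features = data[from_key].pop(from_sub_key)
    data.setdefault(to_key, {})[to_sub_key] = features
    # Assumption: drop the old top-level key once it holds nothing.
    if not data[from_key]:
        del data[from_key]


data = {"label_intent": {"ids": [[0, 1]]}, "text": {"sentence": [[[0.1], [0.2]]]}}
move_key(data, "label_intent", "ids", "label", "ids")
```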

add_features

| add_features(key: Text, sub_key: Text, features: Optional[List[FeatureArray]]) -> None

Add list of features to data under specified key.

Should update number of examples.

Arguments:

  • key - The key
  • sub_key - The sub-key
  • features - The features to add.

add_lengths

| add_lengths(key: Text, sub_key: Text, from_key: Text, from_sub_key: Text) -> None

Adds a feature array of lengths of sequences to data under given key.

Arguments:

  • key - The key to add the lengths to
  • sub_key - The sub-key to add the lengths to
  • from_key - The key to take the lengths from
  • from_sub_key - The sub-key to take the lengths from
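As an illustration, the sketch below derives sequence lengths from one feature array and stores them under a new key. Nested lists stand in for feature arrays, the helper name `store_lengths` and the key names are hypothetical, and reading the lengths from the first array only is an assumption.

```python
def store_lengths(data, key, sub_key, from_key, from_sub_key):
    """Store the per-example sequence lengths of one feature under a new key."""
    source = data.get(from_key, {}).get(from_sub_key)
    if not source:
        return
    # Assumption: all arrays under one sub-key share the same sequence lengths,
    # so the first array is enough.
    lengths = [len(example) for example in source[0]]
    data.setdefault(key, {}).setdefault(sub_key, []).append(lengths)


# Two examples: the first has two tokens, the second has one.
data = {"text": {"sequence": [[[[1, 2], [3, 4]], [[5, 6]]]]}}
store_lengths(data, "text", "sequence_lengths", "text", "sequence")
```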

add_sparse_feature_sizes

| add_sparse_feature_sizes(sparse_feature_sizes: Dict[Text, Dict[Text, List[int]]]) -> None

Adds a dictionary of feature sizes for different attributes.

Arguments:

  • sparse_feature_sizes - a dictionary mapping each attribute that has sparse features to a dictionary that maps each feature type to a list of sparse feature sizes.

get_sparse_feature_sizes

| get_sparse_feature_sizes() -> Dict[Text, Dict[Text, List[int]]]

Get feature sizes of the model.

sparse_feature_sizes is a dictionary mapping each attribute that has sparse features to a dictionary that maps each feature type to a list of sparse feature sizes.

Returns:

A dictionary of key and sub-key to a list of sparse feature sizes.
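For orientation, a dictionary of this shape might look like the sketch below. All attribute names, feature-type names, and sizes are made up for illustration.

```python
# attribute -> feature type -> list of sparse feature sizes (illustrative values)
sparse_feature_sizes = {
    "text": {"sequence": [1200, 350], "sentence": [1200, 350]},
    "action_text": {"sequence": [800]},
}

# Example use: total sparse width per attribute and feature type.
totals = {
    attribute: {feature_type: sum(sizes) for feature_type, sizes in per_type.items()}
    for attribute, per_type in sparse_feature_sizes.items()
}
```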

split

| split(number_of_test_examples: int, random_seed: int) -> Tuple["RasaModelData", "RasaModelData"]

Create random hold out test set using stratified split.

Arguments:

  • number_of_test_examples - Number of test examples.
  • random_seed - Random seed.

Returns:

A tuple of train and test RasaModelData.
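The idea behind a stratified hold-out split can be sketched in pure Python. This is a simplified stand-in, not Rasa's implementation: the function name `stratified_split` is hypothetical, it works on a flat list of labels, and it returns index lists rather than RasaModelData objects.

```python
import random
from collections import defaultdict


def stratified_split(labels, number_of_test_examples, random_seed):
    """Return (train_indices, test_indices), keeping label proportions."""
    rng = random.Random(random_seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    # Each label gets a share of the test set proportional to its frequency.
    fraction = number_of_test_examples / len(labels)
    quotas = {label: len(indices) * fraction for label, indices in by_label.items()}
    take = {label: int(quota) for label, quota in quotas.items()}

    # Hand out the remaining slots by largest fractional remainder.
    remaining = number_of_test_examples - sum(take.values())
    by_remainder = sorted(quotas, key=lambda l: quotas[l] - take[l], reverse=True)
    for label in by_remainder[:remaining]:
        take[label] += 1

    train, test = [], []
    for label, indices in by_label.items():
        rng.shuffle(indices)
        test.extend(indices[: take[label]])
        train.extend(indices[take[label]:])
    return train, test


labels = ["greet"] * 6 + ["bye"] * 4
train, test = stratified_split(labels, number_of_test_examples=5, random_seed=42)
```

With 60% "greet" and 40% "bye" labels, the five test examples keep roughly the same ratio, and the same seed always yields the same split.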

get_signature

| get_signature(data: Optional[Data] = None) -> Dict[Text, Dict[Text, List[FeatureSignature]]]

Get signature of RasaModelData.

Signature stores the shape and whether features are sparse or not for every key.

Returns:

A dictionary of key and sub-key to a list of feature signatures (same structure as the data attribute).

shuffled_data

| shuffled_data(data: Data) -> Data

Shuffle model data.

Arguments:

  • data - The data to shuffle

Returns:

The shuffled data.
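The key point when shuffling nested model data is that every feature array must be reordered with the same permutation, so features and labels stay aligned. A minimal sketch with nested lists in place of feature arrays follows; the function name `shuffle_data` and its `seed` parameter are hypothetical.

```python
import random


def shuffle_data(data, seed=None):
    """Return a copy of data with all examples reordered by one permutation."""
    rng = random.Random(seed)
    first_sub_data = next(iter(data.values()))
    first_arrays = next(iter(first_sub_data.values()))
    order = list(range(len(first_arrays[0])))
    rng.shuffle(order)
    # Apply the same order to every array under every key and sub-key.
    return {
        key: {
            sub_key: [[array[i] for i in order] for array in arrays]
            for sub_key, arrays in sub_data.items()
        }
        for key, sub_data in data.items()
    }


data = {
    "text": {"sentence": [[[0.0], [1.0], [2.0], [3.0]]]},
    "intent": {"ids": [[0, 1, 2, 3]]},
}
shuffled = shuffle_data(data, seed=7)
```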

balanced_data

| balanced_data(data: Data, batch_size: int, shuffle: bool) -> Data

Mix model data to account for class imbalance.

This batching strategy puts rare classes approximately in every other batch, by repeating them. Mimics stratified batching, but also takes into account that more populated classes should appear more often.

Arguments:

  • data - The data.
  • batch_size - The batch size.
  • shuffle - Boolean indicating whether to shuffle the data or not.

Returns:

The balanced data.
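The strategy described above can be approximated with the following sketch. It is a simplified illustration under stated assumptions, not Rasa's exact algorithm: the function name `balanced_order` is hypothetical, it works on a flat list of labels, it returns an ordering of example indices, and shuffling is omitted.

```python
def balanced_order(labels, batch_size):
    """Return example indices reordered so every label shows up regularly."""
    by_label = {}
    for index, label in enumerate(labels):
        by_label.setdefault(label, []).append(index)

    total = len(labels)
    position = {label: 0 for label in by_label}
    cycles = {label: 0 for label in by_label}
    skipped = {label: False for label in by_label}
    order = []

    # Keep going until every label has been fully visited at least once.
    while min(cycles.values()) == 0:
        for label, indices in by_label.items():
            # A label that was already exhausted only contributes every other round.
            if cycles[label] > 0 and not skipped[label]:
                skipped[label] = True
                continue
            skipped[label] = False

            # Frequent labels contribute larger chunks; every label gets at least one.
            chunk = int(len(indices) / total * batch_size) + 1
            start = position[label]
            order.extend(indices[start:start + chunk])
            position[label] = start + chunk
            if position[label] >= len(indices):
                cycles[label] += 1
                position[label] = 0
            if min(cycles.values()) > 0:
                break
    return order


labels = ["frequent"] * 20 + ["rare"]
order = balanced_order(labels, batch_size=4)
```

In this example the single "rare" example is repeated in the resulting order, while each "frequent" example appears once.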