
rasa.utils.tensorflow.layers

SparseDropout Objects

class SparseDropout(tf.keras.layers.Dropout)

Applies Dropout to the input.

Dropout consists in randomly setting a fraction rate of input units to 0 at each update during training time, which helps prevent overfitting.

Arguments:

  • rate - Fraction of the input units to drop (between 0 and 1).

call

def call(inputs: tf.SparseTensor,
training: Optional[Union[tf.Tensor, bool]] = None) -> tf.SparseTensor

Apply dropout to sparse inputs.

Arguments:

  • inputs - Input sparse tensor (of any rank).
  • training - Indicates whether the layer should behave in training mode (adding dropout) or in inference mode (doing nothing).

Returns:

Output of dropout layer.

Raises:

A ValueError if inputs is not a sparse tensor.
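
A minimal usage sketch (assuming the class is importable from this module, rasa.utils.tensorflow.layers): dropout is only applied to the non-zero values of the sparse tensor, and only in training mode.

```python
import tensorflow as tf

from rasa.utils.tensorflow.layers import SparseDropout

# A 2x3 sparse tensor with three non-zero entries.
x = tf.sparse.SparseTensor(
    indices=[[0, 0], [0, 2], [1, 1]], values=[1.0, 2.0, 3.0], dense_shape=[2, 3]
)

dropout = SparseDropout(rate=0.5)

# Training mode: roughly a fraction `rate` of the non-zero values is dropped.
y_train = dropout(x, training=True)

# Inference mode: the input is returned unchanged.
y_infer = dropout(x, training=False)
```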

DenseForSparse Objects

class DenseForSparse(tf.keras.layers.Dense)

Dense layer for sparse input tensor.

Just your regular densely-connected NN layer but for sparse tensors.

Dense implements the operation: output = activation(dot(input, kernel) + bias) where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).

Note: If the input to the layer has a rank greater than 2, then it is flattened prior to the initial dot product with kernel.

Arguments:

  • units - Positive integer, dimensionality of the output space.

  • activation - Activation function to use. If you don't specify anything, no activation is applied (i.e. "linear" activation: a(x) = x).

  • use_bias - Indicates whether the layer uses a bias vector.

  • kernel_initializer - Initializer for the kernel weights matrix.

  • bias_initializer - Initializer for the bias vector.

  • reg_lambda - Regularization factor.

  • bias_regularizer - Regularizer function applied to the bias vector.

  • activity_regularizer - Regularizer function applied to the output of the layer (its "activation").

  • kernel_constraint - Constraint function applied to the kernel weights matrix.

  • bias_constraint - Constraint function applied to the bias vector.

    Input shape: N-D tensor with shape: (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).

    Output shape: N-D tensor with shape: (batch_size, ..., units). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, units).
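
A minimal sketch of how this layer might be used (import path assumed from this module): it accepts a tf.SparseTensor and produces a regular dense tf.Tensor.

```python
import tensorflow as tf

from rasa.utils.tensorflow.layers import DenseForSparse

# Sparse batch of two examples with three input features each.
x = tf.sparse.SparseTensor(
    indices=[[0, 0], [1, 2]], values=[1.0, 4.0], dense_shape=[2, 3]
)

# Same interface as tf.keras.layers.Dense, but the input is sparse.
layer = DenseForSparse(units=8, activation="relu")
y = layer(x)  # dense tf.Tensor of shape (2, 8)
```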

get_units

def get_units() -> int

Returns number of output units.

get_kernel

def get_kernel() -> tf.Tensor

Returns kernel tensor.

get_bias

def get_bias() -> Union[tf.Tensor, None]

Returns bias tensor.

get_feature_type

def get_feature_type() -> Union[Text, None]

Returns a feature type of the data that's fed to the layer.

In order to correctly return a feature type, the function heavily relies on the name of the DenseForSparse layer containing the feature type. Acceptable values of feature types are FEATURE_TYPE_SENTENCE and FEATURE_TYPE_SEQUENCE.

Returns:

feature type of dense layer.

get_attribute

def get_attribute() -> Union[Text, None]

Returns the attribute for which this layer was constructed.

For example: TEXT, LABEL, etc.

In order to correctly return an attribute, the function heavily relies on the name of the DenseForSparse layer being in the following format: f"sparse_to_dense.{attribute}_{feature_type}".

Returns:

attribute of the layer.
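
For illustration, assuming a hypothetical layer named after the TEXT attribute and the sentence feature type, the two getters above would recover those parts from the layer name:

```python
from rasa.utils.tensorflow.layers import DenseForSparse

# Hypothetical name following f"sparse_to_dense.{attribute}_{feature_type}".
layer = DenseForSparse(units=8, name="sparse_to_dense.text_sentence")

layer.get_attribute()     # expected: "text"
layer.get_feature_type()  # expected: "sentence"
```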

call

def call(inputs: tf.SparseTensor) -> tf.Tensor

Apply dense layer to sparse inputs.

Arguments:

  • inputs - Input sparse tensor (of any rank).

Returns:

Output of dense layer.

Raises:

A ValueError if inputs is not a sparse tensor.

RandomlyConnectedDense Objects

class RandomlyConnectedDense(tf.keras.layers.Dense)

Layer with dense outputs that are connected to a random subset of inputs.

RandomlyConnectedDense implements the operation: output = activation(dot(input, kernel) + bias) where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True). It creates kernel_mask to set a fraction of the kernel weights to zero.

Note: If the input to the layer has a rank greater than 2, then it is flattened prior to the initial dot product with kernel.

The output is guaranteed to be dense (each output is connected to at least one input), and no input is disconnected (each input is connected to at least one output).

At density = 0.0 the number of trainable weights is max(input_size, units). At density = 1.0 this layer is equivalent to tf.keras.layers.Dense.

Input shape: N-D tensor with shape: (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).

Output shape: N-D tensor with shape: (batch_size, ..., units). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, units).
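
A brief sketch of the density trade-off (import path and values are illustrative):

```python
import tensorflow as tf

from rasa.utils.tensorflow.layers import RandomlyConnectedDense

x = tf.random.uniform((32, 100))  # (batch_size, input_dim)

# Roughly 25% of the kernel weights stay connected; the rest are masked to zero.
sparse_layer = RandomlyConnectedDense(units=64, density=0.25)
y = sparse_layer(x)  # shape (32, 64)

# With density=1.0 the layer behaves like a plain tf.keras.layers.Dense.
full_layer = RandomlyConnectedDense(units=64, density=1.0)
```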

__init__

def __init__(density: float = 0.2, **kwargs: Any) -> None

Declares instance variables with default values.

Arguments:

  • density - Approximate fraction of trainable weights (between 0 and 1).
  • units - Positive integer, dimensionality of the output space.
  • activation - Activation function to use. If you don't specify anything, no activation is applied (i.e. "linear" activation: a(x) = x).
  • use_bias - Indicates whether the layer uses a bias vector.
  • kernel_initializer - Initializer for the kernel weights matrix.
  • bias_initializer - Initializer for the bias vector.
  • kernel_regularizer - Regularizer function applied to the kernel weights matrix.
  • bias_regularizer - Regularizer function applied to the bias vector.
  • activity_regularizer - Regularizer function applied to the output of the layer (its "activation").
  • kernel_constraint - Constraint function applied to the kernel weights matrix.
  • bias_constraint - Constraint function applied to the bias vector.

build

def build(input_shape: tf.TensorShape) -> None

Prepares the kernel mask.

Arguments:

  • input_shape - Shape of the inputs to this layer

call

def call(inputs: tf.Tensor) -> tf.Tensor

Processes the given inputs.

Arguments:

  • inputs - What goes into this layer

Returns:

The processed inputs.

Ffnn Objects

class Ffnn(tf.keras.layers.Layer)

Feed-forward network layer.

Arguments:

  • layer_sizes - List of integers with dimensionality of the layers.

  • dropout_rate - Fraction of the input units to drop (between 0 and 1).

  • reg_lambda - regularization factor.

  • density - Approximate fraction of trainable weights (between 0 and 1).

  • layer_name_suffix - Text added to the name of the layers.

    Input shape: N-D tensor with shape: (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).

    Output shape: N-D tensor with shape: (batch_size, ..., layer_sizes[-1]). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, layer_sizes[-1]).
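
A construction sketch using the arguments listed above (values are illustrative only):

```python
import tensorflow as tf

from rasa.utils.tensorflow.layers import Ffnn

x = tf.random.uniform((32, 128))  # (batch_size, input_dim)

# Two hidden layers of sizes 256 and 128, with dropout applied in training mode.
ffnn = Ffnn(
    layer_sizes=[256, 128],
    dropout_rate=0.2,
    reg_lambda=0.002,
    density=0.2,
    layer_name_suffix="text",
)

y = ffnn(x, training=True)  # shape (32, 128), i.e. (batch_size, layer_sizes[-1])
```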

call

def call(x: tf.Tensor,
training: Optional[Union[tf.Tensor, bool]] = None) -> tf.Tensor

Apply feed-forward network layer.

Embed Objects

class Embed(tf.keras.layers.Layer)

Dense embedding layer.

Input shape: N-D tensor with shape: (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).

Output shape: N-D tensor with shape: (batch_size, ..., embed_dim). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, embed_dim).

__init__

def __init__(embed_dim: int, reg_lambda: float,
layer_name_suffix: Text) -> None

Initialize layer.

Arguments:

  • embed_dim - Dimensionality of the output space.
  • reg_lambda - Regularization factor.
  • layer_name_suffix - Text added to the name of the layers.

call

def call(x: tf.Tensor) -> tf.Tensor

Apply dense layer.
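
A short sketch tying the constructor and call together (illustrative values):

```python
import tensorflow as tf

from rasa.utils.tensorflow.layers import Embed

x = tf.random.uniform((32, 128))  # (batch_size, input_dim)

embed = Embed(embed_dim=20, reg_lambda=0.002, layer_name_suffix="text")
y = embed(x)  # shape (32, 20), i.e. (batch_size, embed_dim)
```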

InputMask Objects

class InputMask(tf.keras.layers.Layer)

The layer that masks 15% of the input.

Input shape: N-D tensor with shape: (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).

Output shape: N-D tensor with shape: (batch_size, ..., input_dim). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, input_dim).

call

def call(
x: tf.Tensor,
mask: tf.Tensor,
training: Optional[Union[tf.Tensor, bool]] = None
) -> Tuple[tf.Tensor, tf.Tensor]

Randomly mask input sequences.

Arguments:

  • x - Input sequence tensor of rank 3.
  • mask - A tensor representing sequence mask, contains 1 for inputs and 0 for padding.
  • training - Indicates whether the layer should run in training mode (mask inputs) or in inference mode (doing nothing).

Returns:

A tuple of masked inputs and boolean mask.
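
A sketch of masking a padded batch of token sequences. The mask shape is not spelled out above; here it is assumed to be (batch_size, sequence_length, 1).

```python
import tensorflow as tf

from rasa.utils.tensorflow.layers import InputMask

# Batch of 2 sequences, 5 tokens each, 16 features per token.
x = tf.random.uniform((2, 5, 16))

# 1 marks real tokens, 0 marks padding (the second sequence has 3 real tokens).
mask = tf.constant([[[1.0]] * 5, [[1.0]] * 3 + [[0.0]] * 2])

input_mask = InputMask()
# In training mode roughly 15% of the real tokens are masked out;
# lm_mask_bool marks which positions were masked.
masked_x, lm_mask_bool = input_mask(x, mask, training=True)
```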

CRF Objects

class CRF(tf.keras.layers.Layer)

CRF layer.

Arguments:

  • num_tags - Positive integer, number of tags.
  • reg_lambda - regularization factor.
  • name - Optional name of the layer.

call

def call(logits: tf.Tensor,
sequence_lengths: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]

Decodes the highest scoring sequence of tags.

Arguments:

  • logits - A [batch_size, max_seq_len, num_tags] tensor of unary potentials.
  • sequence_lengths - A [batch_size] vector of true sequence lengths.

Returns:

  • A [batch_size, max_seq_len] matrix with dtype tf.int32, containing the highest scoring tag indices.
  • A [batch_size, max_seq_len] matrix with dtype tf.float32, containing the confidence values of the highest scoring tag indices.

loss

def loss(logits: tf.Tensor, tag_indices: tf.Tensor,
sequence_lengths: tf.Tensor) -> tf.Tensor

Computes the log-likelihood of tag sequences in a CRF.

Arguments:

  • logits - A [batch_size, max_seq_len, num_tags] tensor of unary potentials to use as input to the CRF layer.
  • tag_indices - A [batch_size, max_seq_len] matrix of tag indices for which we compute the log-likelihood.
  • sequence_lengths - A [batch_size] vector of true sequence lengths.

Returns:

Negative mean log-likelihood of all examples, given the sequence of tag indices.
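
A small end-to-end sketch of decoding and computing the loss (shapes and values are illustrative; the import path follows this module):

```python
import tensorflow as tf

from rasa.utils.tensorflow.layers import CRF

batch_size, max_seq_len, num_tags = 2, 6, 5

crf = CRF(num_tags=num_tags, reg_lambda=0.002)

# Unary potentials, e.g. the output of a dense layer over token features.
logits = tf.random.uniform((batch_size, max_seq_len, num_tags))
sequence_lengths = tf.constant([6, 4])
tag_indices = tf.random.uniform(
    (batch_size, max_seq_len), maxval=num_tags, dtype=tf.int32
)

# Most likely tag sequence per token, plus per-token confidences.
pred_ids, confidences = crf(logits, sequence_lengths)

# Negative mean log-likelihood of the gold tag sequences.
loss = crf.loss(logits, tag_indices, sequence_lengths)
```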

f1_score

def f1_score(tag_ids: tf.Tensor, pred_ids: tf.Tensor,
mask: tf.Tensor) -> tf.Tensor

Calculates f1 score for train predictions.

DotProductLoss Objects

class DotProductLoss(tf.keras.layers.Layer)

Abstract dot-product loss layer class.

Idea based on StarSpace paper: http://arxiv.org/abs/1709.03856

Implements similarity methods

  • sim (computes a similarity between vectors)
  • get_similarities_and_confidences_from_embeddings (calls sim and also computes confidence values)

Specific loss functions (single- or multi-label) must be implemented in child classes.

__init__

def __init__(num_candidates: int,
scale_loss: bool = False,
constrain_similarities: bool = True,
model_confidence: Text = SOFTMAX,
similarity_type: Text = INNER,
name: Optional[Text] = None,
**kwargs: Any)

Declares instance variables with default values.

Arguments:

  • num_candidates - Number of labels besides the positive one. Depending on whether single- or multi-label loss is implemented (done in sub-classes), these can be all negative example labels, or a mixture of negative and further positive labels, respectively.
  • scale_loss - Boolean, if True scale loss inverse proportionally to the confidence of the correct prediction.
  • constrain_similarities - Boolean, if True applies sigmoid on all similarity terms and adds to the loss function to ensure that similarity values are approximately bounded. Used inside _loss_cross_entropy() only.
  • model_confidence - Normalization of confidence values during inference. Currently, the only possible value is SOFTMAX.
  • similarity_type - Similarity measure to use, either cosine or inner.
  • name - Optional name of the layer.

Raises:

  • TFLayerConfigException - When similarity_type is not one of COSINE or INNER.

sim

def sim(a: tf.Tensor,
b: tf.Tensor,
mask: Optional[tf.Tensor] = None) -> tf.Tensor

Calculates similarity between a and b.

Operates on the last dimension. When a and b are vectors, sim computes either the dot-product or the cosine of the angle between a and b, depending on self.similarity_type. Specifically, when the similarity type is INNER, we compute the scalar product a . b. When the similarity type is COSINE, we compute a . b / (|a| |b|), i.e. the cosine of the angle between a and b.

Arguments:

  • a - Any float tensor
  • b - Any tensor of the same shape and type as a
  • mask - Mask (should contain 1s for inputs and 0s for padding). Note that len(mask.shape) == len(a.shape) - 1 should hold.

Returns:

Similarities between vectors in a and b.
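
The two similarity measures boil down to the following (this is only an illustration of the formulas above, not the layer's internal code):

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0, 0.0]])
b = tf.constant([[0.5, 1.0, 1.0]])

# INNER: the plain scalar product a . b over the last dimension.
inner = tf.reduce_sum(a * b, axis=-1)

# COSINE: a . b / (|a| |b|), i.e. the dot-product of L2-normalized vectors.
cosine = tf.reduce_sum(
    tf.nn.l2_normalize(a, axis=-1) * tf.nn.l2_normalize(b, axis=-1), axis=-1
)
```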

get_similarities_and_confidences_from_embeddings

def get_similarities_and_confidences_from_embeddings(
input_embeddings: tf.Tensor,
label_embeddings: tf.Tensor,
mask: Optional[tf.Tensor] = None) -> Tuple[tf.Tensor, tf.Tensor]

Computes similarity between input and label embeddings and the model's confidence.

First compute the similarity from embeddings and then apply an activation function if needed to get the confidence.

Arguments:

  • input_embeddings - Embeddings of input.
  • label_embeddings - Embeddings of labels.
  • mask - Mask (should contain 1s for inputs and 0s for padding). Note that len(mask.shape) == len(a.shape) - 1 should hold.

Returns:

similarity between input and label embeddings and model's prediction confidence for each label.

call

def call(*args: Any, **kwargs: Any) -> Tuple[tf.Tensor, tf.Tensor]

Layer's logic - to be implemented in child class.

apply_mask_and_scaling

def apply_mask_and_scaling(loss: tf.Tensor,
mask: Optional[tf.Tensor]) -> tf.Tensor

Scales the loss and applies the mask if necessary.

Arguments:

  • loss - The loss tensor
  • mask - (Optional) A mask to multiply with the loss

Returns:

The scaled loss, potentially averaged over the sequence dimension.

SingleLabelDotProductLoss Objects

class SingleLabelDotProductLoss(DotProductLoss)

Single-label dot-product loss layer.

This loss layer assumes that only one output (label) is correct for any given input.

__init__

def __init__(num_candidates: int,
scale_loss: bool = False,
constrain_similarities: bool = True,
model_confidence: Text = SOFTMAX,
similarity_type: Text = INNER,
name: Optional[Text] = None,
loss_type: Text = CROSS_ENTROPY,
mu_pos: float = 0.8,
mu_neg: float = -0.2,
use_max_sim_neg: bool = True,
neg_lambda: float = 0.5,
same_sampling: bool = False,
**kwargs: Any) -> None

Declares instance variables with default values.

Arguments:

  • num_candidates - Positive integer, the number of incorrect labels; the algorithm will minimize their similarity to the input.
  • loss_type - The type of the loss function, either cross_entropy or margin.
  • mu_pos - Indicates how similar the algorithm should try to make embedding vectors for correct labels; should be 0.0 < ... < 1.0 for cosine similarity type.
  • mu_neg - Maximum negative similarity for incorrect labels, should be -1.0 < ... < 1.0 for cosine similarity type.
  • use_max_sim_neg - If True the algorithm only minimizes maximum similarity over incorrect intent labels, used only if loss_type is set to margin.
  • neg_lambda - The scale of how important it is to minimize the maximum similarity between embeddings of different labels, used only if loss_type is set to margin.
  • scale_loss - If True scale loss inverse proportionally to the confidence of the correct prediction.
  • similarity_type - Similarity measure to use, either cosine or inner.
  • name - Optional name of the layer.
  • same_sampling - If True sample same negative labels for the whole batch.
  • constrain_similarities - If True and loss_type is cross_entropy, a sigmoid loss term is added to the total loss to ensure that similarity values are approximately bounded.
  • model_confidence - Normalization of confidence values during inference. Currently, the only possible value is SOFTMAX.

call

def call(inputs_embed: tf.Tensor,
labels_embed: tf.Tensor,
labels: tf.Tensor,
all_labels_embed: tf.Tensor,
all_labels: tf.Tensor,
mask: Optional[tf.Tensor] = None) -> Tuple[tf.Tensor, tf.Tensor]

Calculate loss and accuracy.

Arguments:

  • inputs_embed - Embedding tensor for the batch inputs; shape (batch_size, ..., num_features)
  • labels_embed - Embedding tensor for the batch labels; shape (batch_size, ..., num_features)
  • labels - Tensor representing batch labels; shape (batch_size, ..., 1)
  • all_labels_embed - Embedding tensor for all labels; shape (num_labels, num_features)
  • all_labels - Tensor representing all labels; shape (num_labels, 1)
  • mask - Optional mask, contains 1 for inputs and 0 for padding; shape (batch_size, ..., 1)

Returns:

  • loss - Total loss.
  • accuracy - Training accuracy.
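
A construction sketch with the defaults shown in the signature above (only num_candidates is required; values are illustrative):

```python
from rasa.utils.tensorflow.layers import SingleLabelDotProductLoss

# Sample 20 candidate labels per input and use the default
# cross-entropy loss with constrained (approximately bounded) similarities.
loss_layer = SingleLabelDotProductLoss(
    num_candidates=20,
    scale_loss=False,
    constrain_similarities=True,
)
# Calling the layer with input/label embeddings (see `call` above)
# returns a (loss, accuracy) tuple.
```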

MultiLabelDotProductLoss Objects

class MultiLabelDotProductLoss(DotProductLoss)

Multi-label dot-product loss layer.

This loss layer assumes that multiple outputs (labels) can be correct for any given input. To accommodate this, we use a sigmoid cross-entropy loss here.

__init__

def __init__(num_candidates: int,
scale_loss: bool = False,
constrain_similarities: bool = True,
model_confidence: Text = SOFTMAX,
similarity_type: Text = INNER,
name: Optional[Text] = None,
**kwargs: Any) -> None

Declares instance variables with default values.

Arguments:

  • num_candidates - Positive integer, the number of candidate labels.
  • scale_loss - If True scale loss inverse proportionally to the confidence of the correct prediction.
  • similarity_type - Similarity measure to use, either cosine or inner.
  • name - Optional name of the layer.
  • constrain_similarities - Boolean, if True applies sigmoid on all similarity terms and adds to the loss function to ensure that similarity values are approximately bounded. Used inside _loss_cross_entropy() only.
  • model_confidence - Normalization of confidence values during inference. Currently, the only possible value is SOFTMAX.

call

def call(batch_inputs_embed: tf.Tensor,
batch_labels_embed: tf.Tensor,
batch_labels_ids: tf.Tensor,
all_labels_embed: tf.Tensor,
all_labels_ids: tf.Tensor,
mask: Optional[tf.Tensor] = None) -> Tuple[tf.Tensor, tf.Tensor]

Calculates loss and accuracy.

Arguments:

  • batch_inputs_embed - Embeddings of the batch inputs (e.g. featurized trackers); shape (batch_size, 1, num_features)
  • batch_labels_embed - Embeddings of the batch labels (e.g. featurized intents for IntentTED); shape (batch_size, max_num_labels_per_input, num_features)
  • batch_labels_ids - Batch label indices (e.g. indices of the intents). We assume that indices are integers that run from 0 to (number of labels) - 1; shape (batch_size, max_num_labels_per_input, 1)
  • all_labels_embed - Embeddings for all labels in the domain; shape (batch_size, num_features)
  • all_labels_ids - Indices for all labels in the domain; shape (num_labels, 1)
  • mask - Optional sequence mask, which contains 1 for inputs and 0 for padding.

Returns:

  • loss - Total loss (based on StarSpace http://arxiv.org/abs/1709.03856); scalar
  • accuracy - Training accuracy; scalar