rasa.utils.tensorflow.layers
SparseDropout Objects
Applies Dropout to the input.
Dropout consists of randomly setting a fraction rate of input units to 0 at each update during training time, which helps prevent overfitting.
Arguments:
rate - Fraction of the input units to drop (between 0 and 1).
call
Apply dropout to sparse inputs.
Arguments:
inputs - Input sparse tensor (of any rank).
training - Indicates whether the layer should behave in training mode (adding dropout) or in inference mode (doing nothing).
Returns:
Output of dropout layer.
Raises:
A ValueError if inputs is not a sparse tensor.
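For illustration, the behaviour described above can be sketched with plain TensorFlow ops (a simplified sketch, not the layer's actual implementation):

```python
import tensorflow as tf

def sparse_dropout_sketch(inputs: tf.SparseTensor, rate: float) -> tf.SparseTensor:
    """Drops a random fraction `rate` of the stored values of a sparse tensor."""
    # Keep each stored value with probability (1 - rate).
    to_retain = tf.random.uniform(tf.shape(inputs.values)) >= rate
    dropped = tf.sparse.retain(inputs, to_retain)
    # Scale surviving values so the expected sum stays the same, as regular dropout does.
    return tf.SparseTensor(
        dropped.indices, dropped.values * (1.0 / (1.0 - rate)), dropped.dense_shape
    )

x = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1.0, 2.0], dense_shape=[2, 4])
y = sparse_dropout_sketch(x, rate=0.5)
```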
DenseForSparse Objects
Dense layer for sparse input tensor.
Just your regular densely-connected NN layer but for sparse tensors.
Dense implements the operation output = activation(dot(input, kernel) + bias), where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
Note: If the input to the layer has a rank greater than 2, then it is flattened prior to the initial dot product with kernel.
Arguments:
units - Positive integer, dimensionality of the output space.
activation - Activation function to use. If you don't specify anything, no activation is applied (i.e. "linear" activation: a(x) = x).
use_bias - Indicates whether the layer uses a bias vector.
kernel_initializer - Initializer for the kernel weights matrix.
bias_initializer - Initializer for the bias vector.
reg_lambda - Regularization factor.
bias_regularizer - Regularizer function applied to the bias vector.
activity_regularizer - Regularizer function applied to the output of the layer (its "activation").
kernel_constraint - Constraint function applied to the kernel weights matrix.
bias_constraint - Constraint function applied to the bias vector.
Input shape: N-D tensor with shape (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).
Output shape: N-D tensor with shape (batch_size, ..., units). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, units).
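A rough usage sketch, assuming the layer accepts the standard Dense keyword arguments listed above and is called on a tf.SparseTensor (values here are only illustrative):

```python
import tensorflow as tf
from rasa.utils.tensorflow.layers import DenseForSparse

layer = DenseForSparse(units=10, reg_lambda=0.002)

# A sparse batch of 2 examples with 4 input features each.
x = tf.SparseTensor(indices=[[0, 1], [1, 3]], values=[1.0, 2.0], dense_shape=[2, 4])
y = layer(x)  # dense output with shape (2, 10)
```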
get_units
Returns number of output units.
get_kernel
Returns kernel tensor.
get_bias
Returns bias tensor.
get_feature_type
Returns the feature type of the data that is fed to the layer.
To return the correct feature type, the function relies on the name of the DenseForSparse layer containing the feature type.
Acceptable feature types are FEATURE_TYPE_SENTENCE and FEATURE_TYPE_SEQUENCE.
Returns:
feature type of dense layer.
get_attribute
Returns the attribute for which this layer was constructed.
For example: TEXT, LABEL, etc.
To return the correct attribute, the function relies on the name of the DenseForSparse layer being in the following format:
f"sparse_to_dense.{attribute}_{feature_type}".
Returns:
attribute of the layer.
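As an illustration of the expected name format, a hypothetical layer name such as "sparse_to_dense.text_sentence" would encode the attribute text and the feature type sentence, and could be parsed roughly like this (a sketch, not the layer's own code):

```python
# Hypothetical layer name following the format described above.
layer_name = "sparse_to_dense.text_sentence"

suffix = layer_name.split(".")[-1]           # "text_sentence"
attribute, feature_type = suffix.rsplit("_", 1)
print(attribute, feature_type)               # text sentence
```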
call
Apply dense layer to sparse inputs.
Arguments:
inputs - Input sparse tensor (of any rank).
Returns:
Output of dense layer.
Raises:
A ValueError if inputs is not a sparse tensor.
RandomlyConnectedDense Objects
Layer with dense outputs that are connected to a random subset of inputs.
RandomlyConnectedDense implements the operation output = activation(dot(input, kernel) + bias), where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
It creates kernel_mask to set a fraction of the kernel weights to zero.
Note: If the input to the layer has a rank greater than 2, then it is flattened prior to the initial dot product with kernel.
The output is guaranteed to be dense (each output is connected to at least one input), and no input is disconnected (each input is connected to at least one output).
At density = 0.0 the number of trainable weights is max(input_size, units). At density = 1.0 this layer is equivalent to tf.keras.layers.Dense.
Input shape: N-D tensor with shape (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).
Output shape: N-D tensor with shape (batch_size, ..., units). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, units).
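A rough usage sketch, assuming the constructor accepts density together with the usual Dense keyword arguments described in __init__ below (values are illustrative):

```python
import tensorflow as tf
from rasa.utils.tensorflow.layers import RandomlyConnectedDense

# Only roughly 20% of the kernel weights remain trainable (the rest are masked to zero).
layer = RandomlyConnectedDense(units=8, density=0.2)

x = tf.random.uniform((4, 16))  # batch of 4 inputs with 16 features
y = layer(x)                    # dense output with shape (4, 8)
```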
__init__
Declares instance variables with default values.
Arguments:
density - Approximate fraction of trainable weights (between 0 and 1).
units - Positive integer, dimensionality of the output space.
activation - Activation function to use. If you don't specify anything, no activation is applied (i.e. "linear" activation: a(x) = x).
use_bias - Indicates whether the layer uses a bias vector.
kernel_initializer - Initializer for the kernel weights matrix.
bias_initializer - Initializer for the bias vector.
kernel_regularizer - Regularizer function applied to the kernel weights matrix.
bias_regularizer - Regularizer function applied to the bias vector.
activity_regularizer - Regularizer function applied to the output of the layer (its "activation").
kernel_constraint - Constraint function applied to the kernel weights matrix.
bias_constraint - Constraint function applied to the bias vector.
build
Prepares the kernel mask.
Arguments:
input_shape - Shape of the inputs to this layer.
call
Processes the given inputs.
Arguments:
inputs - What goes into this layer.
Returns:
The processed inputs.
Ffnn Objects
Feed-forward network layer.
Arguments:
layer_sizes - List of integers with dimensionality of the layers.
dropout_rate - Fraction of the input units to drop (between 0 and 1).
reg_lambda - Regularization factor.
density - Approximate fraction of trainable weights (between 0 and 1).
layer_name_suffix - Text added to the name of the layers.
Input shape: N-D tensor with shape (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).
Output shape: N-D tensor with shape (batch_size, ..., layer_sizes[-1]). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, layer_sizes[-1]).
call
Apply feed-forward network layer.
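A rough usage sketch, assuming the constructor takes exactly the arguments listed above (values are illustrative):

```python
import tensorflow as tf
from rasa.utils.tensorflow.layers import Ffnn

ffnn = Ffnn(
    layer_sizes=[128, 64],
    dropout_rate=0.2,
    reg_lambda=0.002,
    density=0.8,
    layer_name_suffix="text",
)

x = tf.random.uniform((4, 32))
y = ffnn(x, training=True)  # shape (4, 64), i.e. (batch_size, layer_sizes[-1])
```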
Embed Objects
Dense embedding layer.
Input shape: N-D tensor with shape (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).
Output shape: N-D tensor with shape (batch_size, ..., embed_dim). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, embed_dim).
__init__
Initialize layer.
Arguments:
embed_dim - Dimensionality of the output space.
reg_lambda - Regularization factor.
layer_name_suffix - Text added to the name of the layers.
call
Apply dense layer.
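A rough usage sketch, assuming the constructor takes the arguments listed above (values are illustrative):

```python
import tensorflow as tf
from rasa.utils.tensorflow.layers import Embed

embed = Embed(embed_dim=20, reg_lambda=0.002, layer_name_suffix="text")

x = tf.random.uniform((4, 32))
y = embed(x)  # shape (4, 20), i.e. (batch_size, embed_dim)
```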
InputMask Objects
The layer that masks 15% of the input.
Input shape: N-D tensor with shape (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).
Output shape: N-D tensor with shape (batch_size, ..., input_dim). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, input_dim).
call
Randomly mask input sequences.
Arguments:
x - Input sequence tensor of rank 3.
mask - A tensor representing the sequence mask; contains 1 for inputs and 0 for padding.
training - Indicates whether the layer should run in training mode (mask inputs) or in inference mode (doing nothing).
Returns:
A tuple of masked inputs and boolean mask.
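A minimal sketch of the masking idea in plain TensorFlow (a simplified sketch, not the layer's actual implementation):

```python
import tensorflow as tf

# Randomly select roughly 15% of the real (non-padding) positions to mask.
batch_size, seq_len, input_dim = 2, 5, 10
x = tf.random.uniform((batch_size, seq_len, input_dim))
mask = tf.ones((batch_size, seq_len, 1))   # 1 for real tokens, 0 for padding

lm_mask_prob = tf.random.uniform((batch_size, seq_len, 1)) * mask
lm_mask_bool = lm_mask_prob > 0.85          # ~15% of the real positions

# Replace the selected positions with a special "mask" value (here simply zeros).
x_masked = tf.where(lm_mask_bool, tf.zeros_like(x), x)
```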
CRF Objects
CRF layer.
Arguments:
num_tags - Positive integer, number of tags.
reg_lambda - Regularization factor.
name - Optional name of the layer.
call
Decodes the highest scoring sequence of tags.
Arguments:
logits - A [batch_size, max_seq_len, num_tags] tensor of unary potentials.
sequence_lengths - A [batch_size] vector of true sequence lengths.
Returns:
A [batch_size, max_seq_len] matrix with dtype tf.int32, containing the highest scoring tag indices.
A [batch_size, max_seq_len] matrix with dtype tf.float32, containing the confidence values of the highest scoring tag indices.
loss
Computes the log-likelihood of tag sequences in a CRF.
Arguments:
logits - A [batch_size, max_seq_len, num_tags] tensor of unary potentials to use as input to the CRF layer.
tag_indices - A [batch_size, max_seq_len] matrix of tag indices for which we compute the log-likelihood.
sequence_lengths - A [batch_size] vector of true sequence lengths.
Returns:
Negative mean log-likelihood of all examples, given the sequence of tag indices.
f1_score
Calculates f1 score for train predictions.
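A rough usage sketch of the call and loss methods described above, assuming the constructor and method signatures listed here (shapes and values are illustrative):

```python
import tensorflow as tf
from rasa.utils.tensorflow.layers import CRF

crf = CRF(num_tags=5, reg_lambda=0.002)

logits = tf.random.uniform((2, 7, 5))        # (batch_size, max_seq_len, num_tags)
sequence_lengths = tf.constant([7, 4])       # true sequence length per example
tag_indices = tf.random.uniform((2, 7), maxval=5, dtype=tf.int32)

pred_ids, confidences = crf(logits, sequence_lengths)   # highest scoring tags + confidences
nll = crf.loss(logits, tag_indices, sequence_lengths)   # negative mean log-likelihood
```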
DotProductLoss Objects
Abstract dot-product loss layer class.
Idea based on StarSpace paper: http://arxiv.org/abs/1709.03856
Implements similarity methods:
sim - computes a similarity between vectors.
get_similarities_and_confidences_from_embeddings - calls sim and also computes confidence values.
Specific loss functions (single- or multi-label) must be implemented in child classes.
__init__
Declares instance variables with default values.
Arguments:
num_candidates - Number of labels besides the positive one. Depending on whether single- or multi-label loss is implemented (done in sub-classes), these can be all negative example labels, or a mixture of negative and further positive labels, respectively.
scale_loss - Boolean, if True scale loss inverse proportionally to the confidence of the correct prediction.
constrain_similarities - Boolean, if True applies sigmoid on all similarity terms and adds to the loss function to ensure that similarity values are approximately bounded. Used inside _loss_cross_entropy() only.
model_confidence - Normalization of confidence values during inference. Currently, the only possible value is SOFTMAX.
similarity_type - Similarity measure to use, either cosine or inner.
name - Optional name of the layer.
Raises:
TFLayerConfigException - When similarity_type is not one of COSINE or INNER.
sim
Calculates similarity between a and b.
Operates on the last dimension. When a and b are vectors, then sim computes either the dot-product, or the cosine of the angle between a and b, depending on self.similarity_type.
Specifically, when the similarity type is INNER, then we compute the scalar product a . b. When the similarity type is COSINE, we compute a . b / (|a| |b|), i.e. the cosine of the angle between a and b.
Arguments:
a - Any float tensor.
b - Any tensor of the same shape and type as a.
mask - Mask (should contain 1s for inputs and 0s for padding). Note that len(mask.shape) == len(a.shape) - 1 should hold.
Returns:
Similarities between vectors in a and b.
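A minimal sketch of the two similarity measures described above in plain TensorFlow (not the layer's own implementation):

```python
import tensorflow as tf

def dot_product_similarity(a: tf.Tensor, b: tf.Tensor) -> tf.Tensor:
    # INNER: scalar product over the last dimension.
    return tf.reduce_sum(a * b, axis=-1)

def cosine_similarity(a: tf.Tensor, b: tf.Tensor) -> tf.Tensor:
    # COSINE: normalize both vectors first, then take the scalar product.
    a = tf.nn.l2_normalize(a, axis=-1)
    b = tf.nn.l2_normalize(b, axis=-1)
    return tf.reduce_sum(a * b, axis=-1)

a = tf.random.uniform((2, 5, 8))
b = tf.random.uniform((2, 5, 8))
print(dot_product_similarity(a, b).shape)  # (2, 5)
```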
get_similarities_and_confidences_from_embeddings
Computes similarity between input and label embeddings and the model's confidence.
First computes the similarity from the embeddings and then applies an activation function, if needed, to get the confidence.
Arguments:
input_embeddings - Embeddings of the input.
label_embeddings - Embeddings of the labels.
mask - Mask (should contain 1s for inputs and 0s for padding). Note that len(mask.shape) == len(a.shape) - 1 should hold.
Returns:
similarity between input and label embeddings and model's prediction confidence for each label.
call
Layer's logic - to be implemented in child class.
apply_mask_and_scaling
Scales the loss and applies the mask if necessary.
Arguments:
loss - The loss tensor.
mask - (Optional) A mask to multiply with the loss.
Returns:
The scaled loss, potentially averaged over the sequence dimension.
SingleLabelDotProductLoss Objects
Single-label dot-product loss layer.
This loss layer assumes that only one output (label) is correct for any given input.
__init__
Declares instance variables with default values.
Arguments:
num_candidates - Positive integer, the number of incorrect labels; the algorithm will minimize their similarity to the input.
loss_type - The type of the loss function, either cross_entropy or margin.
mu_pos - Indicates how similar the algorithm should try to make embedding vectors for correct labels; should be 0.0 < ... < 1.0 for cosine similarity type.
mu_neg - Maximum negative similarity for incorrect labels, should be -1.0 < ... < 1.0 for cosine similarity type.
use_max_sim_neg - If True the algorithm only minimizes maximum similarity over incorrect intent labels, used only if loss_type is set to margin.
neg_lambda - The scale of how important it is to minimize the maximum similarity between embeddings of different labels, used only if loss_type is set to margin.
scale_loss - If True scale loss inverse proportionally to the confidence of the correct prediction.
similarity_type - Similarity measure to use, either cosine or inner.
name - Optional name of the layer.
same_sampling - If True sample same negative labels for the whole batch.
constrain_similarities - If True and loss_type is cross_entropy, a sigmoid loss term is added to the total loss to ensure that similarity values are approximately bounded.
model_confidence - Normalization of confidence values during inference. Currently, the only possible value is SOFTMAX.
call
Calculate loss and accuracy.
Arguments:
inputs_embed - Embedding tensor for the batch inputs; shape (batch_size, ..., num_features).
labels_embed - Embedding tensor for the batch labels; shape (batch_size, ..., num_features).
labels - Tensor representing batch labels; shape (batch_size, ..., 1).
all_labels_embed - Embedding tensor for all labels; shape (num_labels, num_features).
all_labels - Tensor representing all labels; shape (num_labels, 1).
mask - Optional mask, contains 1 for inputs and 0 for padding; shape (batch_size, 1).
Returns:
loss - Total loss.
accuracy - Training accuracy.
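The single-label, StarSpace-style idea can be sketched in plain TensorFlow as follows; this is a simplified illustration with toy embeddings, not the layer's actual implementation (which also handles candidate sampling, masking, scaling, and the margin loss type):

```python
import tensorflow as tf

# Toy embeddings: one correct label and several negative candidates per input.
batch_size, num_neg, num_features = 4, 10, 16
inputs_embed = tf.random.uniform((batch_size, num_features))
pos_labels_embed = tf.random.uniform((batch_size, num_features))
neg_labels_embed = tf.random.uniform((batch_size, num_neg, num_features))

# Dot-product similarities to the correct label and to the negative candidates.
sim_pos = tf.reduce_sum(inputs_embed * pos_labels_embed, axis=-1, keepdims=True)  # (batch, 1)
sim_neg = tf.einsum("bf,bnf->bn", inputs_embed, neg_labels_embed)                 # (batch, num_neg)

# Softmax cross-entropy with the correct label in position 0.
logits = tf.concat([sim_pos, sim_neg], axis=-1)
labels = tf.zeros((batch_size,), dtype=tf.int32)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
)
```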
MultiLabelDotProductLoss Objects
Multi-label dot-product loss layer.
This loss layer assumes that multiple outputs (labels) can be correct for any given input. To accommodate this, we use a sigmoid cross-entropy loss here.
__init__
Declares instance variables with default values.
Arguments:
num_candidates - Positive integer, the number of candidate labels.
scale_loss - If True scale loss inverse proportionally to the confidence of the correct prediction.
similarity_type - Similarity measure to use, either cosine or inner.
name - Optional name of the layer.
constrain_similarities - Boolean, if True applies sigmoid on all similarity terms and adds to the loss function to ensure that similarity values are approximately bounded. Used inside _loss_cross_entropy() only.
model_confidence - Normalization of confidence values during inference. Currently, the only possible value is SOFTMAX.
call
Calculates loss and accuracy.
Arguments:
batch_inputs_embed - Embeddings of the batch inputs (e.g. featurized trackers); shape (batch_size, 1, num_features).
batch_labels_embed - Embeddings of the batch labels (e.g. featurized intents for IntentTED); shape (batch_size, max_num_labels_per_input, num_features).
batch_labels_ids - Batch label indices (e.g. indices of the intents). We assume that indices are integers that run from 0 to (number of labels) - 1; shape (batch_size, max_num_labels_per_input, 1).
all_labels_embed - Embeddings for all labels in the domain; shape (batch_size, num_features).
all_labels_ids - Indices for all labels in the domain; shape (num_labels, 1).
mask - Optional sequence mask, which contains 1 for inputs and 0 for padding.
Returns:
loss - Total loss (based on StarSpace http://arxiv.org/abs/1709.03856); scalar.
accuracy - Training accuracy; scalar.
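The multi-label case can be sketched with a sigmoid cross-entropy over candidate similarities, as mentioned above; a simplified illustration with toy values, not the layer's actual implementation:

```python
import tensorflow as tf

# Toy similarities between each input and every candidate label, plus a multi-hot
# target marking which candidates are correct (several can be correct per input).
batch_size, num_candidates = 4, 10
sim_candidates = tf.random.uniform((batch_size, num_candidates))
pos_neg_targets = tf.cast(
    tf.random.uniform((batch_size, num_candidates)) > 0.7, tf.float32
)

# Sigmoid cross-entropy treats every candidate independently, so multiple labels
# can be "correct" at once.
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=pos_neg_targets, logits=sim_candidates)
)
```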