notice
This is documentation for Rasa Open Source Documentation v2.2.x, which is no longer actively maintained.
For up-to-date documentation, see the latest version (2.3.x).
rasa.utils.tensorflow.transformer
MultiHeadAttention Objects
Multi-headed attention layer.
Arguments:
units
- Positive integer, output dim of hidden layer.
num_heads
- Positive integer, number of heads to repeat the same attention structure.
attention_dropout_rate
- Float, dropout rate inside attention for training.
sparsity
- Float between 0 and 1. Fraction of the kernel weights to set to zero.
unidirectional
- Boolean, use a unidirectional or bidirectional encoder.
use_key_relative_position
- Boolean, if 'True' use key relative embeddings in attention.
use_value_relative_position
- Boolean, if 'True' use value relative embeddings in attention.
max_relative_position
- Positive integer, max position for relative embeddings.
heads_share_relative_embedding
- Boolean, if 'True' heads will share relative embeddings.
call
Apply attention mechanism to query_input and source_input.
Arguments:
query_input
- A tensor with shape [batch_size, length, input_size].
source_input
- A tensor with shape [batch_size, length, input_size].
pad_mask
- Float tensor with shape broadcastable to (..., length, length). Defaults to None.
training
- A bool, whether in training mode or not.
Returns:
Attention layer output with shape [batch_size, length, units]
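To make the shape contract concrete, here is a minimal NumPy sketch of multi-headed scaled dot-product attention with the documented input and output shapes. It is an illustration of the mechanism only, not the Rasa implementation: the random projection matrices stand in for learned kernels, relative position embeddings and dropout are omitted, and the pad-mask convention (1.0 marks padded positions, which are then suppressed with a large negative logit) is an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(query_input, source_input, num_heads, units,
                         pad_mask=None, seed=0):
    """Scaled dot-product attention over num_heads heads (shapes only)."""
    batch, length, _ = query_input.shape
    depth = units // num_heads
    rng = np.random.default_rng(seed)
    # Random projections stand in for the layer's learned kernels.
    w_q = rng.standard_normal((query_input.shape[-1], units))
    w_k = rng.standard_normal((source_input.shape[-1], units))
    w_v = rng.standard_normal((source_input.shape[-1], units))

    def split_heads(t):
        # [batch, length, units] -> [batch, num_heads, length, depth]
        return t.reshape(batch, -1, num_heads, depth).transpose(0, 2, 1, 3)

    q = split_heads(query_input @ w_q)
    k = split_heads(source_input @ w_k)
    v = split_heads(source_input @ w_v)
    logits = q @ k.transpose(0, 1, 3, 2) / np.sqrt(depth)  # (batch, heads, length, length)
    if pad_mask is not None:
        logits += pad_mask * -1e9  # suppress attention to padded positions
    out = softmax(logits) @ v
    # [batch, num_heads, length, depth] -> [batch, length, units]
    return out.transpose(0, 2, 1, 3).reshape(batch, length, units)

# Two sequences of length 5; the first has only 3 real tokens.
x = np.ones((2, 5, 8))                                   # [batch_size, length, input_size]
lengths = np.array([3, 5])
pad_mask = (np.arange(5)[None, :] >= lengths[:, None]).astype(float)
pad_mask = pad_mask[:, None, None, :]                    # broadcastable to (..., length, length)
out = multi_head_attention(x, x, num_heads=4, units=16, pad_mask=pad_mask)
print(out.shape)                                         # (2, 5, 16)
```

Note how the output's last dimension is `units`, not `input_size`: the value projection and the final head concatenation map the input into the layer's own width.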
TransformerEncoderLayer Objects
Transformer encoder layer.
The layer is composed of the sublayers:
- Self-attention layer
- Feed-forward network (which is 2 fully-connected layers)
Arguments:
units
- Positive integer, output dim of hidden layer.
num_heads
- Positive integer, number of heads to repeat the same attention structure.
filter_units
- Positive integer, output dim of the first ffn hidden layer.
dropout_rate
- Float between 0 and 1; fraction of the input units to drop.
attention_dropout_rate
- Float, dropout rate inside attention for training.
sparsity
- Float between 0 and 1. Fraction of the kernel weights to set to zero.
unidirectional
- Boolean, use a unidirectional or bidirectional encoder.
use_key_relative_position
- Boolean, if 'True' use key relative embeddings in attention.
use_value_relative_position
- Boolean, if 'True' use value relative embeddings in attention.
max_relative_position
- Positive integer, max position for relative embeddings.
heads_share_relative_embedding
- Boolean, if 'True' heads will share relative embeddings.
call
Apply transformer encoder layer.
Arguments:
x
- A tensor with shape [batch_size, length, units].
pad_mask
- Float tensor with shape broadcastable to (..., length, length). Defaults to None.
training
- A bool, whether in training mode or not.
Returns:
Transformer encoder layer output with shape [batch_size, length, units]
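The two sublayers named above can be sketched in NumPy as follows. This is a simplified single-head illustration under stated assumptions: pre-sublayer layer normalization, residual connections around both sublayers, a ReLU between the two dense layers of the feed-forward network, and random matrices in place of learned kernels; dropout, sparsity, and relative position embeddings are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-6):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def encoder_layer(x, filter_units, rng):
    """Self-attention sublayer followed by a 2-layer feed-forward network,
    each wrapped in a residual connection (single head, no dropout)."""
    units = x.shape[-1]
    # --- self-attention sublayer ---
    h = layer_norm(x)
    w_q, w_k, w_v = (rng.standard_normal((units, units)) * 0.1 for _ in range(3))
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(units)) @ v
    x = x + attn                                  # residual connection
    # --- feed-forward sublayer: 2 fully-connected layers ---
    h = layer_norm(x)
    w1 = rng.standard_normal((units, filter_units)) * 0.1   # units -> filter_units
    w2 = rng.standard_normal((filter_units, units)) * 0.1   # filter_units -> units
    return x + np.maximum(h @ w1, 0.0) @ w2       # ReLU between the dense layers

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 8))                # [batch_size, length, units]
out = encoder_layer(x, filter_units=32, rng=rng)
print(out.shape)                                  # (2, 5, 8)
```

Because both sublayers are residual, the layer maps [batch_size, length, units] to the same shape, which is what allows identical layers to be stacked.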
TransformerEncoder Objects
Transformer encoder.
Encoder stack is made up of num_layers identical encoder layers.
Arguments:
num_layers
- Positive integer, number of encoder layers.
units
- Positive integer, output dim of hidden layer.
num_heads
- Positive integer, number of heads to repeat the same attention structure.
filter_units
- Positive integer, output dim of the first ffn hidden layer.
reg_lambda
- Float, regularization factor.
dropout_rate
- Float between 0 and 1; fraction of the input units to drop.
attention_dropout_rate
- Float, dropout rate inside attention for training.
sparsity
- Float between 0 and 1. Fraction of the kernel weights to set to zero.
unidirectional
- Boolean, use a unidirectional or bidirectional encoder.
use_key_relative_position
- Boolean, if 'True' use key relative embeddings in attention.
use_value_relative_position
- Boolean, if 'True' use value relative embeddings in attention.
max_relative_position
- Positive integer, max position for relative embeddings.
heads_share_relative_embedding
- Boolean, if 'True' heads will share relative embeddings.
name
- Optional name of the layer.
call
Apply transformer encoder.
Arguments:
x
- A tensor with shape [batch_size, length, input_size].
pad_mask
- Float tensor with shape broadcastable to (..., length, length). Defaults to None.
training
- A bool, whether in training mode or not.
Returns:
Transformer encoder output with shape [batch_size, length, units]
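The stacking itself can be sketched as follows. The names `make_layer` and `transformer_encoder` are hypothetical stand-ins, and the inner layer is a trivial shape-preserving placeholder rather than a real encoder layer; the sketch only shows the two structural points of the docstring above: an initial projection from input_size to units, followed by num_layers identically shaped layers applied in sequence.

```python
import numpy as np

def make_layer(units, rng):
    """A stand-in, shape-preserving encoder layer (random dense + residual)."""
    w = rng.standard_normal((units, units)) * 0.1
    return lambda x: x + np.tanh(x @ w)

def transformer_encoder(x, num_layers, units, rng):
    """Apply num_layers identically shaped layers in sequence.
    Real encoders also add a positional signal before the stack."""
    w_embed = rng.standard_normal((x.shape[-1], units)) * 0.1
    x = x @ w_embed                       # project input_size -> units
    for layer in [make_layer(units, rng) for _ in range(num_layers)]:
        x = layer(x)
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 8))        # [batch_size, length, input_size]
out = transformer_encoder(x, num_layers=3, units=16, rng=rng)
print(out.shape)                          # (2, 5, 16)
```

The projection step is why the call accepts [batch_size, length, input_size] but returns [batch_size, length, units], matching the Returns entry above.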