Speech Integrations
Automatic Speech Recognition (ASR)
This section describes the supported integrations with Automatic Speech Recognition (ASR) or Speech To Text (STT) services.
Rasa uses mulaw encoding with an 8000 Hz sample rate. These parameters are not configurable.
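To make the audio format concrete, the sketch below converts 16-bit linear PCM samples to 8-bit mulaw (G.711 mu-law) bytes in pure Python. It illustrates the encoding only and is not Rasa's implementation. The function names `linear_to_mulaw` and `pcm16_to_mulaw` are hypothetical, and the input is assumed to be mono, little-endian, and already resampled to 8000 Hz.

```python
import struct

BIAS = 0x84   # 132, added to the magnitude before the segment search
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_mulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as one mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS

    # Scan from bit 14 downward for the highest set bit; it selects the segment.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1

    # The four bits below the leading bit form the mantissa.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are stored bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16_to_mulaw(pcm: bytes) -> bytes:
    """Encode little-endian 16-bit PCM bytes; a trailing odd byte is ignored."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return bytes(linear_to_mulaw(s) for s in samples)


# One second of silence at 8000 Hz becomes 8000 mu-law bytes (one byte per sample).
one_second = pcm16_to_mulaw(bytes(2 * 8000))
```

At this sample rate and encoding, every second of audio occupies 8000 bytes, which is useful when sizing buffers or estimating stream bandwidth.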
Deepgram
Set the Deepgram API key with the environment variable DEEPGRAM_API_KEY. You can request a key from Deepgram. Deepgram can be configured in a Voice Stream channel as follows:
Configuration parameters
endpoint: Optional. The endpoint URL for the Deepgram API.
endpointing: Optional. Number of milliseconds of silence to determine the end of speech.
language: Optional. The language code for the speech recognition.
model: Optional. The model to be used for speech recognition.
smart_format: Optional. Boolean value to enable or disable Deepgram's smart formatting.
Azure
Requires the Python library azure-cognitiveservices-speech. The API key can be set with the environment variable AZURE_SPEECH_API_KEY.
A sample configuration looks as follows:
Configuration parameters
language: Optional. The language code for the speech recognition.
speech_region: Optional. The region identifier for the Azure Speech service, such as westus. Ensure that the region matches the region of your subscription.
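Because the Azure ASR integration depends on the azure-cognitiveservices-speech library, it can help to confirm that the package is present before starting the assistant. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of Rasa.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


azure_sdk = installed_version("azure-cognitiveservices-speech")
if azure_sdk is None:
    print("Missing dependency. Install it with: pip install azure-cognitiveservices-speech")
else:
    print(f"azure-cognitiveservices-speech {azure_sdk} is installed")
```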
Text To Speech (TTS)
This section describes the supported integrations with Text To Speech (TTS) services.
Azure TTS
The API key can be set with the environment variable AZURE_SPEECH_API_KEY. A sample configuration looks as follows:
Configuration parameters
language: Optional. The language code for the text-to-speech conversion.
voice: Optional. The voice to be used for the text-to-speech conversion.
timeout: Optional. The timeout duration in seconds for the text-to-speech request.
speech_region: Optional. The region identifier for the Azure Speech service, such as westus. Ensure that the region matches the region of your subscription.
Cartesia TTS
Set the Cartesia API key with the environment variable CARTESIA_API_KEY. Obtaining an API key requires a Cartesia account. Cartesia can be configured in a Voice Stream channel as follows:
Configuration parameters
language: Optional. The language code for the text-to-speech conversion.
voice: Optional. The voice to be used for the text-to-speech conversion.
timeout: Optional. The timeout duration in seconds for the text-to-speech request.
model_id: Optional. The model ID to be used for the text-to-speech conversion.
version: Optional. The version of the model to be used for the text-to-speech conversion.
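Each integration on this page reads its API key from an environment variable, so a quick pre-flight check can catch a missing key before the first call fails. The sketch below is one way to do that. The variable names come from this page, while the service labels (`deepgram`, `azure`, `cartesia`) and the helper `missing_api_keys` are hypothetical and chosen only for illustration.

```python
import os
from typing import Iterable, List, Mapping

# Environment variable used by each integration described on this page.
# The dictionary keys are illustrative labels, not Rasa configuration values.
API_KEY_VARS = {
    "deepgram": "DEEPGRAM_API_KEY",
    "azure": "AZURE_SPEECH_API_KEY",
    "cartesia": "CARTESIA_API_KEY",
}


def missing_api_keys(
    services: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the variables that are unset or empty for the given services."""
    return [API_KEY_VARS[s] for s in services if not env.get(API_KEY_VARS[s])]


# Example: an assistant that uses Deepgram for ASR and Cartesia for TTS.
missing = missing_api_keys(["deepgram", "cartesia"])
if missing:
    print("Set these variables before starting the assistant:", ", ".join(missing))
```

Passing `env` explicitly keeps the helper easy to exercise without touching the real process environment.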