Speech Integrations
Audio Format
Rasa uses a common intermediate audio format called RasaAudioBytes, which serves as the standard data format exchanged between channels, ASR engines, and TTS engines and avoids pairwise format conversions between them. Currently, this corresponds to:
- Raw wave format
- 8kHz sample rate
- 8-bit depth
- Mono channel
- μ-law (mulaw) encoding
These parameters are not configurable. Rasa uses the library audioop-lts for
conversion between audio encodings (functions like ulaw2lin() or lin2ulaw()).
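For illustration, here is a minimal sketch of the kind of conversion audioop-lts performs, turning 16-bit linear PCM into the RasaAudioBytes parameters listed above (8 kHz, mono, μ-law) and back. The helper names and the 16 kHz input rate are assumptions made for this example, not part of Rasa's API.

import audioop  # provided by the audioop-lts package on newer Python versions


def pcm16_to_rasa_audio_bytes(pcm: bytes, input_rate: int = 16000) -> bytes:
    """Convert 16-bit linear PCM mono audio into 8 kHz mu-law bytes."""
    # Resample to 8 kHz; width=2 bytes per sample, 1 channel, no carry-over state.
    resampled, _ = audioop.ratecv(pcm, 2, 1, input_rate, 8000, None)
    # Encode the 16-bit linear samples as 8-bit mu-law.
    return audioop.lin2ulaw(resampled, 2)


def rasa_audio_bytes_to_pcm16(mulaw: bytes) -> bytes:
    """Decode 8-bit mu-law bytes back into 16-bit linear PCM."""
    return audioop.ulaw2lin(mulaw, 2)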
Automatic Speech Recognition (ASR)
This section describes the supported integrations with Automatic Speech Recognition (ASR) or Speech To Text (STT) services.
Deepgram
Set your Deepgram API key with the environment variable DEEPGRAM_API_KEY. You can
request a key from Deepgram. The ASR engine can be configured in a Voice Stream channel
as follows:
browser_audio:
  # ... other configuration
  asr:
    name: deepgram
Deepgram uses two mechanisms to detect when a speaker has finished talking:
- Endpointing: Uses Voice Activity Detection (VAD) to detect silence after speech
- UtteranceEnd: Looks at word timings to detect gaps between words
The configuration parameters endpointing and utterance_end_ms below control these two mechanisms respectively. For noisy environments, utterance_end_ms may be more reliable because it ignores non-speech audio. Read more in the Deepgram documentation.
Configuration parameters
- endpoint: Optional, defaults to api.deepgram.com - The endpoint URL for the Deepgram API.
- endpointing: Optional, defaults to 400 - Number of milliseconds of silence used to determine the end of speech.
- language: Optional, defaults to en - The language code for the speech recognition.
- model: Optional, defaults to nova-2-general - The model to be used for speech recognition.
- smart_format: Optional, defaults to true - Boolean value to enable or disable Deepgram's smart formatting.
- utterance_end_ms: Optional, defaults to 1000 - Time in milliseconds to wait before considering an utterance complete.
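For reference, a channel configuration that spells out all of these parameters with their documented defaults could look like the following:

browser_audio:
  # ... other configuration
  asr:
    name: deepgram
    endpoint: api.deepgram.com
    endpointing: 400
    language: en
    model: nova-2-general
    smart_format: true
    utterance_end_ms: 1000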
Azure
Requires the Python library azure-cognitiveservices-speech. The API key can be set with the environment variable AZURE_SPEECH_API_KEY.
A sample configuration looks as follows:
browser_audio:
  # ... other configuration
  asr:
    name: azure
Configuration parameters
- language: Required. The language code for the speech recognition. (See the Azure documentation for a list of languages.)
- speech_region: Optional, defaults to None - The region identifier for the Azure Speech service, such as westus. Ensure that the region matches the region of your subscription.
- speech_endpoint: Optional, defaults to None - The service endpoint to connect to. You can use it when the Azure Speech service sits behind a reverse proxy.
- speech_host: Optional, defaults to None - The service host to connect to. The standard resource path is assumed. The format is "protocol://host:port", where ":port" is optional.
Although speech_region, speech_endpoint, and speech_host are all optional, they cannot all be
empty at the same time. If all three are unset, speech_region falls back to eastus.
When connecting to the Azure cloud, the speech_region parameter alone is enough. Here is an example config:
browser_audio:
  server_url: localhost
  asr:
    name: azure
    language: de-DE
    speech_region: germanywestcentral
  tts:
    name: azure
    language: de-DE
    voice: de-DE-KatjaNeural
    speech_region: germanywestcentral
Others
Looking for integration with a different ASR service? You can create your own custom ASR component.
Text To Speech (TTS)
This section describes the supported integrations with Text To Speech (TTS) services.
Azure TTS
The API key can be set with the environment variable AZURE_SPEECH_API_KEY. A sample configuration looks as follows:
browser_audio:
  # ... other configuration
  tts:
    name: azure
Configuration parameters
- language: Optional, defaults to en-US - The language code for the text-to-speech conversion. (See the Azure documentation for a list of languages and voices.)
- voice: Optional, defaults to en-US-JennyNeural - The voice to be used for the text-to-speech conversion. The voice defines specific characteristics such as the speaker's gender, age, and speaking style.
- timeout: Optional, defaults to 10 - The timeout duration in seconds for the text-to-speech request.
- speech_region: Optional, defaults to None - The region identifier for the Azure Speech service. Ensure that the region matches the region of your subscription.
- endpoint: Optional, defaults to None - The service endpoint for the Azure Speech service.
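As an illustration, an Azure TTS block that sets these parameters explicitly might look like this; the westus region is only an example and must match your subscription:

browser_audio:
  # ... other configuration
  tts:
    name: azure
    language: en-US
    voice: en-US-JennyNeural
    timeout: 10
    speech_region: westus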
Cartesia TTS
Set your Cartesia API key with the environment variable CARTESIA_API_KEY. Obtaining a key
requires a Cartesia account. The TTS engine can be configured in a Voice Stream channel as follows:
browser_audio:
  # ... other configuration
  tts:
    name: cartesia
Configuration parameters
- language: Optional, defaults to en - The language code for the text-to-speech conversion.
- voice: Optional, defaults to 248be419-c632-4f23-adf1-5324ed7dbf1d - The id of the voice to use for text-to-speech conversion. The parameter is passed to the Cartesia API as "voice": {"mode": "id", "id": "VALUE"}.
- timeout: Optional, defaults to 10 - The timeout duration in seconds for the text-to-speech request.
- model_id: Optional, defaults to sonic-english - The model ID to be used for the text-to-speech conversion.
- version: Optional, defaults to 2024-06-10 - The version of the model to be used for the text-to-speech conversion.
- endpoint: Optional, defaults to https://api.cartesia.ai/tts/sse - The endpoint URL for the Cartesia API.
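For example, a Cartesia TTS block that states these documented defaults explicitly could look as follows:

browser_audio:
  # ... other configuration
  tts:
    name: cartesia
    language: en
    voice: 248be419-c632-4f23-adf1-5324ed7dbf1d
    model_id: sonic-english
    version: "2024-06-10"
    timeout: 10
    endpoint: https://api.cartesia.ai/tts/sse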
Deepgram TTS
Set your Deepgram API key with the environment variable DEEPGRAM_API_KEY. You can
request a key from Deepgram. The TTS engine can be configured in a Voice Stream channel
as follows:
browser_audio:
  # ... other configuration
  tts:
    name: deepgram
Configuration parameters
Deepgram does not use the parent-class parameters language and voice, because each
model is uniquely identified using the format [modelname]-[voicename]-[language].
- model_id: Optional, defaults to aura-2-andromeda-en - The list of available options can be found in the Deepgram documentation.
- endpoint: Optional, defaults to wss://api.deepgram.com/v1/speak - The endpoint URL for the Deepgram API.
- timeout: Optional, defaults to 30 - The timeout duration in seconds for the text-to-speech request.
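For example, the default model_id aura-2-andromeda-en combines the aura-2 model, the andromeda voice, and English (en). A configuration that sets the documented parameters explicitly might look like this:

browser_audio:
  # ... other configuration
  tts:
    name: deepgram
    model_id: aura-2-andromeda-en
    endpoint: wss://api.deepgram.com/v1/speak
    timeout: 30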
Rime TTS
Set your Rime API key with the environment variable RIME_API_KEY. You can
request a key from Rime. The TTS engine can be configured in a Voice Stream channel
as follows:
browser_audio:
  # ... other configuration
  tts:
    name: rime
Configuration parameters
- speaker: Optional, defaults to cove - The speaker voice to use for text-to-speech conversion.
- model_id: Optional, defaults to mistv2 - The model ID to be used for text-to-speech conversion.
- endpoint: Optional, defaults to wss://users.rime.ai/ws2 - The endpoint URL for the Rime API.
- speed_alpha: Optional, defaults to 1.0 - Controls the speed of speech synthesis.
- segment: Optional, defaults to immediate - Segment mode for synthesis. Use "immediate" for low latency.
- timeout: Optional, defaults to 30 - The timeout duration in seconds for the text-to-speech request.
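As a sketch, a Rime TTS block with these documented defaults written out could look as follows:

browser_audio:
  # ... other configuration
  tts:
    name: rime
    speaker: cove
    model_id: mistv2
    endpoint: wss://users.rime.ai/ws2
    speed_alpha: 1.0
    segment: immediate
    timeout: 30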
Others
Looking for integration with a different TTS service? You can create your own custom TTS component.
Custom ASR
You can implement your own custom ASR component as a Python class to
integrate with any third-party speech recognition service. A custom ASR
component must subclass the ASREngine class from rasa.core.channels.voice_stream.asr.asr_engine.
Your custom ASR component will receive audio in the RasaAudioBytes format and may need to convert it to your service's expected format.
Required Methods
Your custom ASR component must implement the following methods:
- open_websocket_connection(): Establish a websocket connection to your ASR service.
- from_config_dict(config: Dict): Class method to create an instance from a configuration dictionary.
- signal_audio_done(): Signal to the ASR service that audio input has ended.
- rasa_audio_bytes_to_engine_bytes(chunk: RasaAudioBytes): Convert the Rasa audio format to your engine's expected format.
- engine_event_to_asr_event(event: Any): Convert your engine's events to Rasa's ASREvent format.
- get_default_config(): Static method that returns the default configuration for your component.
Optional Methods
You may also override these methods as needed:
- send_keep_alive(): Send keep-alive messages to maintain the connection. The default implementation is only a pass statement.
- close_connection(): Custom cleanup when closing the connection. The default implementation is as follows:

async def close_connection(self) -> None:
    if self.asr_socket:
        await self.asr_socket.close()
ASR Events
Your engine_event_to_asr_event method should return appropriate ASREvent objects:
- UserIsSpeaking(transcript): For interim/partial transcripts while the user is speaking.
- NewTranscript(transcript): For final transcripts when the user has finished speaking.
See Configuration for details on how to configure your custom ASR component.
Example Implementation
Here's an example based on the Deepgram implementation structure:
import json
import os
from dataclasses import dataclass
from typing import Any, Dict, Optional
from urllib.parse import urlencode

import websockets
from websockets.legacy.client import WebSocketClientProtocol

from rasa.core.channels.voice_stream.asr.asr_engine import ASREngine, ASREngineConfig
from rasa.core.channels.voice_stream.asr.asr_event import (
    ASREvent,
    NewTranscript,
    UserIsSpeaking,
)
from rasa.core.channels.voice_stream.audio_bytes import RasaAudioBytes


@dataclass
class MyASRConfig(ASREngineConfig):
    api_key: str = ""
    endpoint: str = "wss://api.example.com/v1/speech"
    language: str = "en-US"


class MyASR(ASREngine[MyASRConfig]):
    required_env_vars = ("MY_ASR_API_KEY",)  # Optional: required environment variables
    required_packages = ("my_asr_package",)  # Optional: required Python packages

    def __init__(self, config: Optional[MyASRConfig] = None):
        super().__init__(config)
        self.accumulated_transcript = ""

    async def open_websocket_connection(self) -> WebSocketClientProtocol:
        """Connect to the ASR system."""
        api_key = os.environ["MY_ASR_API_KEY"]
        headers = {"Authorization": f"Bearer {api_key}"}
        return await websockets.connect(
            self._get_api_url_with_params(),
            extra_headers=headers,
        )

    def _get_api_url_with_params(self) -> str:
        """Build the API URL with query parameters."""
        query_params = {
            "language": self.config.language,
            "encoding": "mulaw",
            "sample_rate": "8000",
            "interim_results": "true",
        }
        return f"{self.config.endpoint}?{urlencode(query_params)}"

    @classmethod
    def from_config_dict(cls, config: Dict) -> "MyASR":
        """Create an instance from a configuration dictionary."""
        asr_config = MyASRConfig.from_dict(config)
        return cls(asr_config)

    async def signal_audio_done(self) -> None:
        """Signal to the ASR service that audio input has ended."""
        await self.asr_socket.send(json.dumps({"type": "stop_audio"}))

    def rasa_audio_bytes_to_engine_bytes(self, chunk: RasaAudioBytes) -> bytes:
        """Convert the Rasa audio format to the engine format."""
        # For most services you can return the chunk directly,
        # since it is already mu-law encoded.
        return chunk

    def engine_event_to_asr_event(self, event: Any) -> Optional[ASREvent]:
        """Convert an engine response to an ASREvent."""
        data = json.loads(event)
        if data.get("type") == "transcript":
            transcript = data.get("text", "")
            if data.get("is_final"):
                # Final transcript - the user finished speaking
                full_transcript = self.accumulated_transcript + " " + transcript
                self.accumulated_transcript = ""
                return NewTranscript(full_transcript.strip())
            elif transcript:
                # Interim transcript - the user is still speaking
                return UserIsSpeaking(transcript)
        return None

    @staticmethod
    def get_default_config() -> MyASRConfig:
        """Get the default configuration."""
        return MyASRConfig(
            endpoint="wss://api.example.com/v1/speech",
            language="en-US",
        )

    async def send_keep_alive(self) -> None:
        """Send a keep-alive message if supported by your service."""
        if self.asr_socket is not None:
            await self.asr_socket.send(json.dumps({"type": "keep_alive"}))
This structure allows you to integrate any speech recognition service with Rasa's voice capabilities while maintaining compatibility with the existing voice stream infrastructure.
Custom TTS
You can implement your own custom TTS component as a Python class to
integrate with any third-party text-to-speech service. A custom TTS component
must subclass the TTSEngine class from
rasa.core.channels.voice_stream.tts.tts_engine.
Your custom TTS component must output audio in the RasaAudioBytes format
and convert it using the engine_bytes_to_rasa_audio_bytes method.
Required Methods
Your custom TTS component must implement the following methods:
- synthesize(text: str, config: Optional[T]): Generate speech from text, returning an async iterator of RasaAudioBytes chunks.
- engine_bytes_to_rasa_audio_bytes(chunk: bytes): Convert your engine's audio format to Rasa's audio format.
- from_config_dict(config: Dict): Class method to create an instance from a configuration dictionary.
- get_default_config(): Static method that returns the default configuration for your component.
Optional Methods
You may also override these methods as needed:
- connect(config: Optional[T]): Establish a connection to the TTS engine if necessary. The default implementation does nothing.
- close_connection(): Custom cleanup when closing connections (e.g., closing websockets or HTTP sessions). The default implementation does nothing.
- send_text_chunk(text: str): Send text chunks to the TTS system for streaming synthesis. Used with stream_audio() for real-time streaming.
- signal_text_done(): Signal the TTS engine to process any remaining buffered text and prepare to end the stream.
- stream_audio(): Stream audio output from the TTS engine. Continuously yields audio chunks as they are produced by the engine.
Streaming vs Non-Streaming TTS
The TTSEngine supports both streaming and non-streaming modes:
- Non-streaming: Implement only the synthesize() method for simple request-response synthesis.
- Streaming: Set streaming_input = True and implement the send_text_chunk(), signal_text_done(), and stream_audio() methods for real-time streaming synthesis.
Class Attributes
You can optionally define these class attributes:
- required_env_vars: Tuple of required environment variable names.
- required_packages: Tuple of required Python package names.
- streaming_input: Boolean indicating whether the engine supports streaming input (defaults to False).
See Configuration for details on how to configure your custom TTS component.
Example Implementation
Here's an example of an implementation for TTS streaming:
import base64
import os
from dataclasses import dataclass
from typing import AsyncIterator, Dict, Optional
from urllib.parse import urlencode

import aiohttp
from aiohttp import ClientTimeout, WSMsgType

from rasa.core.channels.voice_stream.audio_bytes import RasaAudioBytes
from rasa.core.channels.voice_stream.tts.tts_engine import (
    TTSEngine,
    TTSEngineConfig,
    TTSError,
)


@dataclass
class MyTTSConfig(TTSEngineConfig):
    endpoint: str = "wss://api.example.com/v1/speak"
    model_id: str = "en-US-standard"


class MyTTS(TTSEngine[MyTTSConfig]):
    session: Optional[aiohttp.ClientSession] = None
    required_env_vars = ("MY_TTS_API_KEY",)  # Optional: required environment variables
    required_packages = ("aiohttp",)  # Optional: required Python packages
    streaming_input = True
    ws: Optional[aiohttp.ClientWebSocketResponse] = None

    def __init__(self, config: Optional[MyTTSConfig] = None):
        super().__init__(config)
        timeout = ClientTimeout(total=self.config.timeout)
        # All class instances share the same session
        if self.__class__.session is None or self.__class__.session.closed:
            self.__class__.session = aiohttp.ClientSession(timeout=timeout)

    async def connect(self, config: Optional[MyTTSConfig] = None) -> None:
        """Establish a WebSocket connection to the TTS engine."""
        # Use the per-call config if provided, otherwise fall back to the engine's config.
        merged_config = config or self.config
        headers = {
            "Authorization": f"Bearer {os.environ['MY_TTS_API_KEY']}",
        }
        query_params = {
            "model": merged_config.model_id,
            "language": merged_config.language,
            "voice": merged_config.voice,
            "encoding": "mulaw",
            "sample_rate": "8000",
        }
        ws_url = f"{merged_config.endpoint}?{urlencode(query_params)}"
        try:
            self.ws = await self.session.ws_connect(
                ws_url,
                headers=headers,
                timeout=float(self.config.timeout) if self.config.timeout else 30,
            )
        except Exception as e:
            raise TTSError(f"Failed to connect to TTS service: {e}")

    async def close_connection(self) -> None:
        """Close the WebSocket connection if it exists."""
        if self.ws and not self.ws.closed:
            await self.ws.close()
        self.ws = None

    async def send_text_chunk(self, text: str) -> None:
        """Send a text chunk to the TTS engine for streaming synthesis."""
        await self.ws.send_json({"text": text})

    async def signal_text_done(self) -> None:
        """Signal the TTS engine that all text has been sent."""
        await self.ws.send_json({"operation": "flush"})

    async def stream_audio(self) -> AsyncIterator[RasaAudioBytes]:
        """Stream audio output from the TTS engine."""
        try:
            async for msg in self.ws:
                if msg.type == WSMsgType.TEXT:
                    data = msg.json()
                    if data.get("type") == "audio":
                        # Assume base64-encoded audio data
                        audio_bytes = base64.b64decode(data.get("data", ""))
                        if audio_bytes:
                            yield self.engine_bytes_to_rasa_audio_bytes(audio_bytes)
                    elif data.get("type") == "error":
                        raise TTSError(f"TTS error: {data.get('message')}")
                elif msg.type == WSMsgType.CLOSED:
                    break
                elif msg.type == WSMsgType.ERROR:
                    raise TTSError("WebSocket error during audio streaming")
        except Exception as e:
            raise TTSError(f"Error during audio streaming: {e}")

    async def synthesize(
        self, text: str, config: Optional[MyTTSConfig] = None
    ) -> AsyncIterator[RasaAudioBytes]:
        """Generate speech from text using the streaming WebSocket."""
        await self.connect(config)
        try:
            await self.send_text_chunk(text)
            await self.signal_text_done()
            async for audio_chunk in self.stream_audio():
                yield audio_chunk
        finally:
            await self.close_connection()

    def engine_bytes_to_rasa_audio_bytes(self, chunk: bytes) -> RasaAudioBytes:
        """Convert the generated TTS audio bytes into Rasa audio bytes."""
        # If your service already returns mu-law audio, return it directly.
        return RasaAudioBytes(chunk)
        # If your service returns audio in a different format (e.g., linear PCM),
        # you'll need to convert it first. For example, to convert from 16-bit
        # linear PCM:
        # import audioop
        # mulaw_data = audioop.lin2ulaw(chunk, 2)  # 2 = 16-bit samples
        # return RasaAudioBytes(mulaw_data)

    @staticmethod
    def get_default_config() -> MyTTSConfig:
        return MyTTSConfig(
            endpoint="wss://api.example.com/v1/speak",
            model_id="en-US-standard",
            language="en-US",
            voice="female-1",
            timeout=30,
        )

    @classmethod
    def from_config_dict(cls, config: Dict) -> "MyTTS":
        return cls(MyTTSConfig.from_dict(config))
If your TTS service doesn't support continuous input streaming (i.e., you need to send all text at once), set streaming_input = False and implement only the synthesize() method. You can skip implementing the connect(), send_text_chunk(), signal_text_done(), and stream_audio() methods. In that case, synthesize() should handle the entire synthesis process from text to audio, as in the sketch below.
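For comparison, here is a minimal non-streaming sketch. It assumes a hypothetical HTTP service at https://api.example.com/v1/synthesize that accepts the full text in a single POST request and returns 8 kHz mu-law audio in the response body; the class name, config fields, payload keys, and MY_TTS_API_KEY variable are illustrative placeholders, not a real API.

import os
from dataclasses import dataclass
from typing import AsyncIterator, Dict, Optional

import aiohttp

from rasa.core.channels.voice_stream.audio_bytes import RasaAudioBytes
from rasa.core.channels.voice_stream.tts.tts_engine import (
    TTSEngine,
    TTSEngineConfig,
    TTSError,
)


@dataclass
class MySimpleTTSConfig(TTSEngineConfig):
    endpoint: str = "https://api.example.com/v1/synthesize"


class MySimpleTTS(TTSEngine[MySimpleTTSConfig]):
    required_env_vars = ("MY_TTS_API_KEY",)
    streaming_input = False  # all text is sent in one request; no input streaming

    async def synthesize(
        self, text: str, config: Optional[MySimpleTTSConfig] = None
    ) -> AsyncIterator[RasaAudioBytes]:
        """Request the full utterance and yield audio as it is downloaded."""
        conf = config or self.config
        headers = {"Authorization": f"Bearer {os.environ['MY_TTS_API_KEY']}"}
        payload = {"text": text, "encoding": "mulaw", "sample_rate": 8000}
        async with aiohttp.ClientSession() as session:
            async with session.post(conf.endpoint, json=payload, headers=headers) as resp:
                if resp.status != 200:
                    raise TTSError(f"TTS request failed with status {resp.status}")
                # Yield the response body in chunks as RasaAudioBytes.
                async for chunk in resp.content.iter_chunked(4096):
                    yield self.engine_bytes_to_rasa_audio_bytes(chunk)

    def engine_bytes_to_rasa_audio_bytes(self, chunk: bytes) -> RasaAudioBytes:
        # The service is assumed to return 8 kHz mu-law audio already.
        return RasaAudioBytes(chunk)

    @staticmethod
    def get_default_config() -> MySimpleTTSConfig:
        return MySimpleTTSConfig(endpoint="https://api.example.com/v1/synthesize")

    @classmethod
    def from_config_dict(cls, config: Dict) -> "MySimpleTTS":
        return cls(MySimpleTTSConfig.from_dict(config))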
Configuration for Custom Components
To use a custom ASR or TTS component, you need to supply credentials for it in your credentials.yml file. The configuration should contain the module path of your custom class and any required configuration parameters.
The module path follows the format path.to.module.ClassName. For example:
- A class MyASR in the file addons/custom_asr.py has the module path addons.custom_asr.MyASR
- A class MyTTS in the file addons/custom_tts.py has the module path addons.custom_tts.MyTTS
Custom ASR Configuration Example
browser_audio:
  # ... other configuration
  asr:
    name: addons.custom_asr.MyASR
    api_key: "your_api_key"
    endpoint: "wss://api.example.com/v1/speech"
    language: "en-US"
    # any other custom parameters your ASR needs
Custom TTS Configuration Example
browser_audio:
  # ... other configuration
  tts:
    name: addons.custom_tts.MyTTS
    api_key: "your_api_key"
    endpoint: "wss://api.example.com/v1/speak"
    language: "en-US"
    voice: "en-US-JennyNeural"
    timeout: 30
    # any other custom parameters your TTS needs
Any custom parameters you define in your configuration class (e.g., MyASRConfig or MyTTSConfig) can be passed through the credentials file and will be available in your component via self.config.