Speech Integrations

Automatic Speech Recognition (ASR)

This section describes the supported integrations with Automatic Speech Recognition (ASR) services, also known as Speech To Text (STT) services. Rasa uses mulaw encoding with an 8000 Hz sample rate; these parameters are not configurable.

Deepgram

Set the Deepgram API key with the environment variable DEEPGRAM_API_KEY. You can request a key from Deepgram. Deepgram can be configured in a Voice Stream channel as follows:

credentials.yml
browser_audio:
  # ... other configuration
  asr:
    name: deepgram

Configuration parameters

  • endpoint: Optional. The endpoint URL for the Deepgram API.
  • endpointing: Optional. Number of milliseconds of silence to determine the end of speech.
  • language: Optional. The language code for the speech recognition.
  • model: Optional. The model to be used for speech recognition.
  • smart_format: Optional. Boolean value to enable or disable Deepgram's smart formatting.
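As a rough sketch of how the optional parameters listed above might be combined, the fragment below sets several of them explicitly. The values shown (400, en, nova-2-general, true) are illustrative assumptions, not documented defaults; check the Deepgram documentation for the models and languages available to your account. The endpoint parameter is omitted here so that the built-in value is used.

```yaml
# credentials.yml (illustrative values only)
browser_audio:
  # ... other configuration
  asr:
    name: deepgram
    endpointing: 400        # assumed value: ms of silence that marks end of speech
    language: en            # assumed language code
    model: nova-2-general   # assumed model name
    smart_format: true      # enable Deepgram's smart formatting
```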

Azure

Requires the Python library azure-cognitiveservices-speech. The API key can be set with the environment variable AZURE_SPEECH_API_KEY. A sample configuration looks as follows:

credentials.yml
browser_audio:
  # ... other configuration
  asr:
    name: azure

Configuration parameters

  • language: Optional. The language code for the speech recognition.
  • speech_region: Optional. The region identifier for the Azure Speech service, such as westus. Ensure that the region matches the region of your subscription.
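A minimal sketch of an Azure ASR configuration with both optional parameters set is shown below. The language code en-US is an assumed example, and westus is taken from the parameter description above; substitute the region of your own subscription.

```yaml
# credentials.yml (illustrative values only)
browser_audio:
  # ... other configuration
  asr:
    name: azure
    language: en-US        # assumed language code
    speech_region: westus  # must match the region of your Azure subscription
```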

Text To Speech (TTS)

This section describes the supported integrations with Text To Speech (TTS) services.

Azure TTS

The API key can be set with the environment variable AZURE_SPEECH_API_KEY. A sample configuration looks as follows:

credentials.yml
browser_audio:
  # ... other configuration
  tts:
    name: azure

Configuration parameters

  • language: Optional. The language code for the text-to-speech conversion.
  • voice: Optional. The voice to be used for the text-to-speech conversion.
  • timeout: Optional. The timeout duration in seconds for the text-to-speech request.
  • speech_region: Optional. The region identifier for the Azure Speech service, such as westus. Ensure that the region matches the region of your subscription.
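The following fragment sketches how the optional Azure TTS parameters above could be set together. The language, voice, and timeout values are assumptions chosen for illustration, not documented defaults; pick a voice that is available in your region.

```yaml
# credentials.yml (illustrative values only)
browser_audio:
  # ... other configuration
  tts:
    name: azure
    language: en-US            # assumed language code
    voice: en-US-JennyNeural   # assumed voice name
    timeout: 10                # assumed value, in seconds
    speech_region: westus      # must match the region of your Azure subscription
```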

Cartesia TTS

Set the Cartesia API key with the environment variable CARTESIA_API_KEY. Obtaining a key requires a Cartesia account. Cartesia can be configured in a Voice Stream channel as follows:

credentials.yml
browser_audio:
  # ... other configuration
  tts:
    name: cartesia

Configuration parameters

  • language: Optional. The language code for the text-to-speech conversion.
  • voice: Optional. The voice to be used for the text-to-speech conversion.
  • timeout: Optional. The timeout duration in seconds for the text-to-speech request.
  • model_id: Optional. The model ID to be used for the text-to-speech conversion.
  • version: Optional. The version of the model to be used for the text-to-speech conversion.
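As a hedged sketch, a Cartesia TTS configuration with the optional parameters above filled in might look like the fragment below. Every value is an assumption for illustration: your-voice-id is a placeholder rather than a real voice, and the model_id and version strings should be checked against what your Cartesia account supports.

```yaml
# credentials.yml (illustrative values only)
browser_audio:
  # ... other configuration
  tts:
    name: cartesia
    language: en             # assumed language code
    voice: your-voice-id     # placeholder, replace with a voice from your account
    timeout: 10              # assumed value, in seconds
    model_id: sonic-english  # assumed model ID
    version: "2024-06-10"    # assumed version string, quoted so YAML keeps it as text
```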