
Twilio Media Streams

Use this channel to connect your Rasa assistant to Twilio Media Streams for voice capabilities. Unlike the standard Twilio Voice connector, this channel handles speech-to-text (ASR) and text-to-speech (TTS) processing directly in Rasa.

Basic Rasa Configuration

Create or edit your credentials.yml and add the following channel configuration:

credentials.yml
twilio_media_streams:
  monitor_silence: false
  # ASR Configuration
  asr:
    name: "azure" # or "deepgram"
    # Add ASR-specific configuration here
  # TTS Configuration
  tts:
    name: "azure" # or "cartesia"
    # Add TTS-specific configuration here

You can enable user silence monitoring by setting the boolean parameter monitor_silence to true. See the silence monitoring section of the Rasa documentation for details.
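If you want to catch configuration mistakes before starting the server, the check can be sketched in Python. This assumes credentials.yml has already been parsed into a dictionary (for example with a YAML library). The helper check_media_streams_config is hypothetical and is not part of Rasa. Its provider sets only list the names mentioned in the example above, so extend them if your Rasa version supports other ASR or TTS services.

```python
from typing import Any, Dict, List

# Provider names taken from the comments in the credentials.yml example above
SUPPORTED_ASR = {"azure", "deepgram"}
SUPPORTED_TTS = {"azure", "cartesia"}


def check_media_streams_config(credentials: Dict[str, Any]) -> List[str]:
    """Return a list of problems in the twilio_media_streams section.

    Hypothetical helper for illustration only; it is not part of Rasa.
    """
    problems: List[str] = []
    channel = credentials.get("twilio_media_streams")
    if not isinstance(channel, dict):
        return ["missing 'twilio_media_streams' section"]

    # monitor_silence is a boolean flag (false in the example)
    if not isinstance(channel.get("monitor_silence", False), bool):
        problems.append("'monitor_silence' must be true or false")

    # Both asr and tts need a provider name from the supported set
    for key, supported in (("asr", SUPPORTED_ASR), ("tts", SUPPORTED_TTS)):
        section = channel.get(key)
        if not isinstance(section, dict) or "name" not in section:
            problems.append(f"'{key}.name' is required")
        elif section["name"] not in supported:
            problems.append(f"unsupported {key} provider: {section['name']!r}")
    return problems


config = {
    "twilio_media_streams": {
        "monitor_silence": False,
        "asr": {"name": "deepgram"},
        "tts": {"name": "cartesia"},
    }
}
print(check_media_streams_config(config))  # → []
```

An empty list means the section has the shape shown in the example. Provider-specific keys (API keys, regions, voices) are not checked here because they differ per service.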

Start the assistant with the command rasa run. Twilio must be able to reach your Rasa server at a public URL. During development, you can expose a local server with a tunneling tool such as ngrok.

Bot URLs for development

Visit this section to learn how to generate the required bot URL when testing the channel on your local machine.

Configuring Twilio Webhook

Go to the Phone Numbers section of your Twilio account and select the phone number you want to connect to Rasa. Select the option "Webhook, TwiML Bin, Function, Studio Flow, Proxy Service" and set the URL of your Rasa server as the webhook. For a server hosted at example.com, the webhook URL would be:

https://example.com/webhooks/twilio_media_streams/webhook

Your webhook endpoint must be served over HTTPS. Twilio Media Streams does not accept insecure HTTP URLs.
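The webhook URL is your server's base URL followed by a fixed path, and it has to use HTTPS. The following is a small Python sketch of that rule. The function name twilio_webhook_url is hypothetical. The path is copied from the example URL above.

```python
from urllib.parse import urlsplit

# Path taken from the example webhook URL above
WEBHOOK_PATH = "/webhooks/twilio_media_streams/webhook"


def twilio_webhook_url(server_url: str) -> str:
    """Build the Twilio webhook URL for a Rasa server (hypothetical helper).

    Raises ValueError for non-HTTPS URLs, because Twilio Media Streams
    does not accept plain HTTP endpoints.
    """
    parts = urlsplit(server_url)
    if parts.scheme != "https":
        raise ValueError(f"webhook must use https, got {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("server URL has no hostname")
    # Keep any path prefix but drop a trailing slash so "//" never appears
    return f"https://{parts.netloc}{parts.path.rstrip('/')}{WEBHOOK_PATH}"


print(twilio_webhook_url("https://example.com"))
# → https://example.com/webhooks/twilio_media_streams/webhook
```

Passing a public HTTPS tunnel URL (for example one produced by ngrok) gives the value to paste into the Twilio console. Passing an http:// URL fails early instead of failing when a call comes in.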

Sample Twilio Webhook configuration

Usage

Receiving Audio

When a user speaks, Twilio streams the audio to Rasa where:

  1. The audio stream is collected and buffered
  2. The configured ASR service converts speech to text
  3. The text is processed by Rasa's NLU pipeline
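The three steps above can be sketched in Python under some assumptions. The sketch relies on Twilio's WebSocket message format, where each "media" event carries base64-encoded 8 kHz μ-law audio in media.payload. InboundAudioBuffer, fake_asr and the explicit flush() call are hypothetical and do not show how Rasa implements the channel. In practice, ASR services usually detect the end of an utterance themselves while audio is streamed to them.

```python
import base64
import json
from typing import Callable, Optional


class InboundAudioBuffer:
    """Collect caller audio from Twilio Media Streams messages (sketch).

    `transcribe` is a hypothetical stand-in for the configured ASR service.
    """

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        self.transcribe = transcribe
        self.audio = bytearray()

    def handle_message(self, raw: str) -> None:
        """Process one WebSocket text frame sent by Twilio."""
        message = json.loads(raw)
        if message.get("event") == "media":
            # Step 1: decode the base64 payload (8 kHz mu-law) and buffer it
            self.audio.extend(base64.b64decode(message["media"]["payload"]))

    def flush(self) -> Optional[str]:
        """Step 2: send the buffered utterance to ASR and reset the buffer."""
        if not self.audio:
            return None
        text = self.transcribe(bytes(self.audio))
        self.audio.clear()
        return text  # Step 3: this text is what NLU would process


def fake_asr(audio: bytes) -> str:
    # Placeholder ASR that only reports how much audio it received
    return f"<{len(audio)} bytes of audio>"


buffer = InboundAudioBuffer(fake_asr)
# One 20 ms frame: 160 bytes at 8 kHz, one byte per mu-law sample
frame = json.dumps(
    {"event": "media", "media": {"payload": base64.b64encode(b"\xff" * 160).decode()}}
)
buffer.handle_message(frame)
print(buffer.flush())  # → <160 bytes of audio>
```

Keeping buffering separate from transcription makes it easy to replace fake_asr with a real provider client without touching the WebSocket handling.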

Sending Responses

For bot responses:

  1. Rasa generates text responses
  2. The configured TTS service converts text to audio
  3. Audio is streamed back to Twilio
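The response path can be sketched the same way. This assumes Twilio's outbound message shape on a bidirectional stream: an "media" event carrying the stream's streamSid and a base64 payload of 8 kHz μ-law audio. The function outbound_media_messages, the fake_tts placeholder and the stream SID "MZ123" are hypothetical. Splitting the audio into 20 ms chunks is a design choice for illustration and not a documented Twilio requirement.

```python
import base64
import json
from typing import Callable, Iterator

# 160 bytes = 20 ms of 8 kHz mu-law audio (one byte per sample)
CHUNK_SIZE = 160


def outbound_media_messages(
    text: str,
    stream_sid: str,
    synthesize: Callable[[str], bytes],
) -> Iterator[str]:
    """Turn a bot text response into Twilio Media Streams 'media' messages.

    `synthesize` is a hypothetical stand-in for the configured TTS service;
    it must return raw 8 kHz mu-law bytes.
    """
    audio = synthesize(text)  # Step 2: TTS converts text to audio
    for start in range(0, len(audio), CHUNK_SIZE):
        chunk = audio[start : start + CHUNK_SIZE]
        # Step 3: each JSON string is one WebSocket frame sent back to Twilio
        yield json.dumps(
            {
                "event": "media",
                "streamSid": stream_sid,
                "media": {"payload": base64.b64encode(chunk).decode("ascii")},
            }
        )


def fake_tts(text: str) -> bytes:
    # Placeholder TTS returning 400 bytes of mu-law silence (0xFF)
    return b"\xff" * 400


msgs = list(outbound_media_messages("Hello!", "MZ123", fake_tts))
print(len(msgs))  # → 3  (chunks of 160, 160 and 80 bytes)
```

Because the function is a generator, frames can be sent as soon as they are produced instead of building the whole list first.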

Call Events

Like other voice channels, the following events are supported:

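| Event | Intent        | Description                                  |
|-------|---------------|----------------------------------------------|
| start | session_start | Triggered when call connects                 |
| end   | session_end   | Triggered when call disconnects              |
| DTMF  | -             | Phone keypad presses (sent as text messages) |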
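The table can be read as a simple mapping from call events to user messages. The Python sketch below encodes it. It assumes that intents are triggered with Rasa's "/intent_name" message shorthand and that a keypad press arrives as the pressed digit in plain text. Both points are assumptions, and event_to_user_text is a hypothetical helper, not the channel's actual code. The raw Twilio message names may also differ from the event names in the table.

```python
from typing import Optional

# Event-to-intent mapping from the table above
EVENT_INTENTS = {
    "start": "session_start",
    "end": "session_end",
}


def event_to_user_text(event: str, digit: Optional[str] = None) -> Optional[str]:
    """Translate a call event into the text Rasa would receive (sketch).

    Returns None for events that produce no user message.
    """
    if event in EVENT_INTENTS:
        # Assumed: intents are triggered with the "/intent_name" shorthand
        return "/" + EVENT_INTENTS[event]
    if event == "dtmf" and digit is not None:
        # Keypad presses have no dedicated intent; the digit is sent as text
        return digit
    return None


print(event_to_user_text("start"))  # → /session_start
print(event_to_user_text("dtmf", "5"))  # → 5
```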