notice

This is documentation for Rasa Documentation v2.x, which is no longer actively maintained.
For up-to-date documentation, see the latest version (3.x).

Version: 2.x

Twilio Voice

You can use the Twilio Voice connector to answer phone calls made to your Twilio phone number or Twilio SIP domain.

New in 2.6

The Twilio Voice connector was added.

Running on Twilio

Connect to a Twilio Phone Number

To forward calls from Twilio to your Rasa assistant the webhook for your phone number needs to be updated. Go to the Phone Numbers section of your Twilio account and select the phone number you want to connect to Rasa. Find the Voice & Fax section on the screen. Under the A CALL COMES IN section add the webhook URL (e.g. https://<host>:<port>/webhooks/twilio_voice/webhook) for your Rasa bot replacing the host and port with the appropriate values for your deployment. Click Save at the bottom of the page.

Set Twilio Webhook

Connect to a Twilio SIP Domain

You can also connect your Twilio Voice channel to a Twilio SIP domain. You can follow the Twilio Sending SIP guide to forward SIP requests to your Twilio SIP domain. Once your SIP domain is configured you will have to forward incoming calls to Rasa. In the Twilio console go to the SIP Domains section. Select the SIP domain you would like to use. Find the Call Control Configuration section and add the webhook URL (e.g. https://<host>:<port>/webhooks/twilio_voice/webhook) for your Rasa bot replacing the host and port with the appropriate values for your deployment. Click Save at the bottom of the page.

Set Twilio SIP Domain Webhook

Configure Channel in Rasa

In your credentials.yml file make sure the twilio_voice channel is added. Within credentials.yml there are a number of parameters you can set to control the behavior of your assistant. An example with definitions of each parameter is below. Unlike the Twilio text channel there is no need to add your Twilio credentials for the voice channel. Note, changing values for enhanced and assistant_voice can result in added costs from Twilio. Review the documentation below for details about these settings.

credentials.yml
twilio_voice:
initial_prompt: "hello"
assistant_voice: "woman"
reprompt_fallback_phrase: "I didn't get that could you repeat?"
speech_timeout: "5"
speech_model: "default"
enhanced: "false"

Parameter Definitions

Initial Prompt

When Twilio receives a new call and forwards this to Rasa Open Source, Twilio does not provide a user message. In this case Rasa Open Source will act as if the user sent the message "Hello". This behavior can be configured in your credentials.yml file by setting the initial_prompt parameter to the desired input. The initial_prompt value will be sent to Rasa Open Source and the response will be spoken to the user to start the voice conversation. How you greet a user via a voice channel may differ from a text channel. You should review your responses and consider creating channel-specific variations where necessary.

Assistant Voice

You can add personality to your assistant by specifying the type of voice your assistant should speak with. In the credentials.yml file you can add an option for assistant_voice to specify the type of voice of your assistant. For a list of supported voices you can check the Twilio documentation. By default Twilio's woman voice will be used. Note that you will incur additional charges for using any of the Polly voices. The Twilio documentation has details about Polly pricing.

Reprompt Fallback Phrase

Unlike text channels where users can review previous messages in the conversation and take their time to reply, with voice channels users can get confused when there is a pause in the conversation. When a long pause is detected during a conversation Rasa will repeat the last utterance from the bot to re-prompt the user for a response. If the previous bot utterance cannot be identified then the message defined in reprompt_fallback_phrase will be sent. By default this is set to "I'm sorry I didn't get that could you rephrase".

Speech Timeout

How long Twilio will collect speech from the caller. This parameter must be set to a number or "auto". If a number is provided the assistant will collect speech for the specified amount of time. If speech_timeout is set to "auto" then Twilio will collect speech until a pause is detected. The default setting of this parameter is "auto". You can find more details about speech_timeout in the Twilio documentation. Note that speech_timeout="auto" is only compatible with speech_model='numbers_and_commands'. An error will be raised if an incompatible speech_model is used with the speech_timeout.

Speech Model

Adjusting the speech_model parameter can help Twilio with the accuracy of its speech-to-text transformation. Valid values are default, numbers_and_commands, and phone_call. The default setting is "default". You can find more details about speech_model in the Twilio documentation.

Enhanced

Setting enhanced to true will allow you to use Twilio's premium speech model to improve the accuracy of transcription results. Note, setting this parameter to true will result in higher Twilio transcription costs. This parameter is only supported if you also have set the speech_model parameter to phone_call. By default enhanced is set to "false". You can find more about the enhanced speech model option in the Twilio documentation.

Custom Voice Responses

It is highly recommended that you provide voice channel specific responses for all responses containing images, emojis and/or abbreviations. Visual media like images and emojis do not translate well to voice applications. Having voice alternatives to those responses lets you tailor the user experience to the voice channel. If a response is detected as containing either an image or emoji a warning will be raised. The image or emoji will be removed from the response and only any accompanying text will be included in the response back to the user.

In addition to reviewing responses containing images and emojis you should also review all other responses for their applicability to voice channels. Short-hand abbreviations like "e.g." do not translate well. Voice responses should use the long-hand versions of any abbreviations. Interactions with users can differ between text and voice channels. Reviewing all responses in the scope of a voice channel and providing voice specific responses can make for a better user experience.