Voice Assistants

Developer Edition

If you started building your assistant with the Rasa Pro Developer Edition before Rasa Pro 3.11 and want to try voice features, please request a new license. Licenses issued before this version don't contain the necessary feature scopes to run voice assistants.

Building Voice Assistants

Voice assistants provide a natural and intuitive way to interact with digital devices and services. They are particularly useful for hands-free operation, accessibility, and multitasking. They also offer a familiar and frictionless experience to the customers of contact centers. At the same time, voice solutions present distinct technical challenges and require elaborate user experience design.

Rasa provides voice channel connectors that handle the complexities specific to voice conversations. The two types of connectors are described below.

Voice Ready

Voice Ready Channel Connectors in Rasa process input and output as text while enabling communication through audio. To make this possible, Rasa relies on external services for Speech-to-Text (STT) and Text-to-Speech (TTS).

For example, the Twilio Voice built-in channel in Rasa is a Voice Ready Channel Connector.
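
As a rough illustration, a Voice Ready connector is enabled in `credentials.yml` like any other channel. The sketch below assumes the option names of the Twilio Voice connector as commonly documented; the values are placeholders, and the exact keys should be checked against the Rasa version you are running.

```yaml
# credentials.yml (illustrative sketch; verify key names for your Rasa version)
twilio_voice:
  initial_prompt: "hello"                 # text sent to the assistant when a call starts
  reprompt_fallback_phrase: "I didn't get that. Could you repeat?"
  assistant_voice: "woman"                # voice Twilio uses to read responses aloud
  speech_timeout: "5"                     # seconds Twilio waits for the caller to finish speaking
  speech_model: "default"
  enhanced: "false"
```

Here Twilio performs the speech recognition and synthesis, so Rasa itself only ever sees and produces text.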

Voice Stream

Voice Stream Channel Connectors in Rasa process both input and output as audio. They transcribe incoming audio into text, process it within Rasa, and then convert the response back into audio, so the assistant also communicates with the user through audio.

For example, the Twilio Media Streams channel connector in Rasa is a Voice Stream Channel Connector.
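
A Voice Stream connector additionally needs to know which speech services to use for transcription and synthesis. The following is a minimal sketch, not a verified configuration: the `asr` and `tts` blocks, the provider names, and the `server_url` value are assumptions that should be confirmed in the connector reference for your Rasa Pro version.

```yaml
# credentials.yml (hedged sketch; key and provider names are assumptions)
twilio_media_streams:
  server_url: "<your-public-server-url>"  # placeholder: address Twilio streams audio to
  asr:
    name: deepgram                        # assumed speech-to-text provider
  tts:
    name: cartesia                        # assumed text-to-speech provider
```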

How to Start Building a Voice Assistant

To build an optimized voice assistant, it is recommended to develop it separately from text-based assistants. Although a text assistant can serve as a foundation, maintaining and evolving the assistant is easier when voice and text assistants are developed separately.

Following Conversation-Driven Development (CDD) best practices, start your voice project with rigorous user research and include iterative user tests in the development process. Make sure to design your voice flows with the unique requirements of the modality in mind.

Apart from connecting and configuring your channel connector, you will need to configure the speech services (STT and TTS). See the speech services documentation for details.

You can also test your voice assistant directly in your browser, which supports an iterative building process.

Voice-Specific Primitives and Conversation Repair

Voice assistants rely on the same core building blocks as text-based assistants (like responses, actions, and flows), but they require additional configuration and design adjustments to handle the nuances of spoken interactions.

These include:

  • Fine-tuning how conversations are initiated and ended
  • Managing voice-specific metadata
  • Handling silence or no-input cases
  • Repeating or rephrasing messages when users don’t respond

These tweaks ensure voice conversations feel natural and responsive, even when user behavior is unpredictable.

👉 Explore voice conversation patterns

Handling User Silence

In voice conversations, silence can signal confusion, hesitation, or distraction. With the silence timeout setting, you can control how long the assistant waits before responding — and tweak what it does when that happens.

👉 How to configure user silence parameters

Using Channel-Specific Responses

Tailor your responses for voice channels like phone calls using channel-specific response variations.

👉 How to configure channel-specific responses
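
As a small sketch of what this can look like, a response in the domain may carry several variations, with a `channel` key restricting a variation to one connector. The response name and wording below are hypothetical, and the channel name `twilio_voice` is assumed to match the name of the connector you have configured.

```yaml
# domain.yml (illustrative; response name and texts are hypothetical)
responses:
  utter_ask_account_number:
    # Used only when the conversation runs over the named voice channel
    - text: "Please say your account number, one digit at a time."
      channel: twilio_voice
    # Fallback variation for all other channels
    - text: "Please enter your account number."
```

Keeping the voice wording short and explicit about how to answer tends to help, since callers cannot scroll back or re-read the prompt.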

Using Filler Responses for Slow Operations

When an operation may take time (for example, a slow custom action), include "filler" responses to keep users informed about the ongoing process. These responses confirm that the system is processing the request, which reduces user uncertainty and abandonment. This technique is especially important for voice-based channels like phone calls, where users don't have visual indicators of progress. This is an example of a filler response:

flows.yml
flows:
  check_balance:
    name: check your balance
    description: check the user's account balance
    steps:
      - action: utter_please_wait # a response that tells the user to wait a moment
      - action: check_balance # assume this is a slow custom action
      - action: utter_current_balance
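
The flow above refers to two responses that would need to exist in the domain. A minimal sketch of those definitions could look like this; the wording is illustrative, and `account_balance` is a hypothetical slot assumed to be set by the `check_balance` custom action.

```yaml
# domain.yml (illustrative; texts and the account_balance slot are assumptions)
responses:
  utter_please_wait:
    # Filler response spoken before the slow action runs
    - text: "One moment while I look up your balance."
  utter_current_balance:
    # Spoken after the custom action has stored the result in the slot
    - text: "Your current balance is {account_balance}."
```

Because the filler is sent as its own step before the action, the caller hears it while the lookup is still in progress.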