
Voice Assistants

Developer Edition

If you started building your assistant with the Rasa Pro Developer Edition before Rasa Pro 3.11 and want to try voice features, please request a new license. Licenses issued before this version don't contain the necessary feature scopes to run voice assistants.

Building Voice Assistants

Voice assistants offer a more natural and intuitive way to communicate with digital devices and services. They are particularly well suited to hands-free operation, accessibility, and multitasking, and they give contact center customers a familiar, frictionless experience. At the same time, voice solutions present distinct technical challenges and require careful user experience design.

Rasa provides voice channel connectors with specialized handling for the nuances of voice conversations. The two connector types, Voice Ready and Voice Stream, are described in detail below.

Voice Ready

Architecture of Voice Ready Channel

Voice Ready Channel Connectors in Rasa receive input and send responses as text, even though the assistant communicates with the user through audio. Rasa uses external services to handle speech-to-text (STT) and text-to-speech (TTS).

For example, the Twilio Voice built-in channel in Rasa is a Voice Ready Channel Connector.

Voice Stream

Architecture of Voice Stream Channel

Voice Stream Channel Connectors in Rasa receive input and send responses as audio. They convert the incoming audio stream to text, process it within Rasa, and convert the response text back into audio. As with Voice Ready channels, the assistant communicates with the user through audio.

For example, the Twilio Media Streams channel connector in Rasa is a Voice Stream Channel Connector.
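The difference between the two connector types comes down to what crosses the connector boundary: text for Voice Ready, audio for Voice Stream. The following is a minimal conceptual sketch of one conversation turn in each style. It is not Rasa code: `voice_ready_turn`, `voice_stream_turn`, `fake_stt`, `fake_tts`, and `echo_assistant` are hypothetical names used only for illustration, and the toy speech functions simply encode and decode UTF-8 bytes.

```python
from typing import Callable

# Hypothetical type aliases for illustration; these are not Rasa APIs.
SpeechToText = Callable[[bytes], str]   # audio in, text out
TextToSpeech = Callable[[str], bytes]   # text in, audio out
Assistant = Callable[[str], str]        # text in, text out


def voice_ready_turn(assistant: Assistant, user_text: str) -> str:
    """Voice Ready style: the connector only exchanges text.

    Speech recognition and synthesis happen outside the connector.
    """
    return assistant(user_text)


def voice_stream_turn(
    assistant: Assistant,
    stt: SpeechToText,
    tts: TextToSpeech,
    user_audio: bytes,
) -> bytes:
    """Voice Stream style: the connector receives audio and returns audio."""
    user_text = stt(user_audio)         # audio -> text
    reply_text = assistant(user_text)   # text processed by the assistant
    return tts(reply_text)              # text -> audio


# Toy stand-ins so the sketch runs without any speech service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def echo_assistant(text: str) -> str:
    return f"You said: {text}"


if __name__ == "__main__":
    print(voice_ready_turn(echo_assistant, "hello"))
    print(voice_stream_turn(echo_assistant, fake_stt, fake_tts, b"hello"))
```

In both cases the assistant itself works on text; only the placement of the speech conversion steps differs.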

How to Start Building a Voice Assistant

Since voice solutions cost more to operate and maintain, you might want to start your journey into Rasa and CALM by building a text assistant and transitioning to voice later on. You can connect an existing Rasa assistant to a Voice Ready or a Voice Stream channel connector.

Following Conversation-Driven Development (CDD) best practices, start your voice project with rigorous user research and include iterative user tests in the development process. Design the flows for voice as a modality.

Use Channel Specific Response Variations to ensure that the Voice Assistant's responses are appropriate for a phone call.
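As a rough illustration of the idea, the sketch below models response variations as plain Python data and picks the variation that matches the current channel, falling back to a variation without a channel. This is a simplified model, not Rasa's implementation: the `RESPONSES` dictionary only mirrors the general shape of a `responses` entry with an optional `channel` key, and the response name, the texts, the `select_variation` helper, and the use of `twilio_voice` as the channel name are assumptions made for the example.

```python
from typing import Optional

# Illustrative data: each response has a list of variations, and a
# variation may be restricted to one channel via a "channel" key.
RESPONSES = {
    "utter_ask_account_number": [
        {
            "text": "Please say your account number, one digit at a time.",
            "channel": "twilio_voice",  # assumed channel name for the example
        },
        {"text": "Please enter your account number."},
    ]
}


def select_variation(name: str, channel: Optional[str]) -> str:
    """Return the text of the variation best suited to the given channel."""
    variations = RESPONSES[name]

    # Prefer a variation written specifically for the current channel.
    if channel is not None:
        for variation in variations:
            if variation.get("channel") == channel:
                return variation["text"]

    # Otherwise fall back to a variation that is not tied to any channel.
    for variation in variations:
        if "channel" not in variation:
            return variation["text"]

    raise LookupError(f"No usable variation for response '{name}'")


if __name__ == "__main__":
    print(select_variation("utter_ask_account_number", "twilio_voice"))
    print(select_variation("utter_ask_account_number", "rest"))
```

The voice variation asks the caller to speak, while the default variation assumes the user can type, which is the kind of difference that matters on a phone call.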

Head over to the following pages for more information: