Connect Your Rasa AI Assistant to Amazon Alexa

Please note: the method here has been deprecated by the Alexa team and no longer works. This blog post is for reference only.

In this tutorial, you'll train a Rasa assistant and connect it to Alexa, Amazon's cloud-based voice service. This will let users interact with your assistant either by voice or text through Alexa-integrated devices. Here's what interacting with your finished assistant will look like:

You'll be using Alexa to transcribe the users speech also to take the bot's response and generate speech from it. Basically: your Alexa skill will handle the voice side of things while Rasa will take care of figuring out what the user meant and what to say next. While you could also choose to use Alexa's built-in intent classification and dialog management services, handling these functions using Rasa has several advantages:

Because you can connect a single Rasa assistant to many different services this allows you to serve users with a single consistent experience across devices, whether they're running Alexa, Google Assistant or open source options like Aimybox or Mycroft.
Using only one assistant across services means you only need to maintain and update a single assistant, which is a big time saver.
Because Rasa is an open source tool you can extend and modify the functionality of the underlying assistant to a greater degree than is possible with a proprietary tool.

Let's get building! Here's what you'll find in this tutorial:

Overview
Set up checklist
Getting your Rasa assistant running locally
Creating your Alexa skill
Connecting your Rasa Assistant and Alexa Skill

Overview

In this section you'll learn how the Alexa and Rasa integration works. If you're already familiar with custom connectors or just want to get started, you can skip to the next section to set up your project.

In order to integrate Alexa and Rasa, you'll need:

A Rasa assistant
An Alexa skill
A custom connector that defines URL endpoints and passes information back and forth between your Alexa skill and Rasa assistant

For this example, you'll be using a simple assistant that asks you how you're feeling and, if you're sad, tells you a fun fact about sea otters. (It's similar to Moodbot, which is the example bot you'll build if you run rasa init. )

Once it's implemented, your skill will work like this: a user will ask Alexa to launch the skill using its invocation name. This will trigger Alexa to send a request to your connector to launch your skill and reply to the user with your defined launch message. The user's reply to the launch message will then be sent to your Rasa assistant via the connector, your assistant will determine the best response and that response will be passed back through the connector to Alexa. This process repeats until you send Alexa a response that indicates the skill should terminate.

Why does your connector need to be a separate module? The connector does two things. First, it defines the Alexa-specific endpoints for your Sanic server. Second, it converts the requests and responses between the Alexa request and response JSON format to the text format Rasa is expecting. Because both of these things are specific to each channel you connect your assistant to, you'll need a different connector for each channel.

Here's a visual overview of the whole process:

Sara, the Rasa Mascot, says "I feel happy". That is passed through the Alexa Skill, which handles speech recognition. From there the information is passed into rasa through the Alexa Connector, then Rasa NLU and COre, where the response is generated. From there the response passes through the Alexa Connector Module again, then to the Alexa Skill, which reads the response "Great, carry on!" to the user.

Let's drill down a little bit into how to make this work. In your Rasa Assistant you'll need to add two things:

First, in your credentials.yaml file, which should be in the same directory as your config.yml and domain.yaml, add the line [name of .py file with your connector].[Class name of connector]:. So if your connector's class name was AlexaConnector and it was in the python file alexa_connector.py, like it is in your example code, you'd need to add the line alexa_connector.AlexaConnector:

Second, you'll need to add your custom connector in a new python file in the same directory as your credentials.yaml file. It has a couple important parts:

A logger so you can track what's happening for debugging
A connector that includes a Sanic blueprint (Sanic is a Python web framework) that defines your endpoints and how to handle requests from Alexa. The two endpoints are health on / and receive on the HTTP route /webhook.

Let's take a look at the code for the connector. You can find the complete module here.

First, you'll import all the libraries you need and start logging to make it easier to debug.

import logging
import json
from sanic import Blueprint, response
from sanic.request import Request
from typing import Text, Optional, List, Dict, Any

from rasa.core.channels.channel import UserMessage, OutputChannel
from rasa.core.channels.channel import InputChannel
from rasa.core.channels.channel import CollectingOutputChannel

logger = logging.getLogger(__name__)

Next, you need to set up your connector class, including giving it a name:

class AlexaConnector(InputChannel):
    """A custom http input channel for Alexa.
    You can find more information on custom connectors in the 
    Rasa docs: https://rasa.com/docs/rasa/user-guide/connectors/custom-connectors/
    """

    @classmethod
    def name(cls):
        return "alexa_assistant"

Now, you're ready to define our Sanic blueprint. The first endpoint, on /, just lets you check the health of your server.


# Sanic blueprint for handling input. The on_new_message
# function pass the received message to Rasa Core
# after you have parsed it
def blueprint(self, on_new_message):

    alexa_webhook = Blueprint("alexa_webhook", __name__)

    # required route: use to check if connector is live
    @alexa_webhook.route("/", methods=["GET"])
    async def health(request):
        return response.json({"status": "ok"})

The second endpoint is where your server will accept requests from Alexa.

# required route: defines
@alexa_webhook.route("/webhook", methods=["POST"])
async def receive(request):
    # get the json request sent by Alexa
    payload = request.json
    # check to see if the user is trying 
    # to launch the skill
    intenttype = payload["request"]["type"]

    # if the user is starting the skill, let them 
    # know it worked & what to do next
    if intenttype == "LaunchRequest":
        message = "Hello! Welcome to this Rasa-powered Alexa skill. You can start by saying 'hi'."
        session = "false"
    else:
        # get the Alexa-detected intent
        intent = payload["request"]["intent"]["name"]

        # makes sure the user isn't trying to 
        # end the skill
        if intent == "AMAZON.StopIntent":
            session = "true"
            message = "Talk to you later"
        else:
            # get the user-provided text from
            # the slot named "text"
            text = payload["request"]["intent"]["slots"]["text"]["value"]

            # initialize output channel
            out = CollectingOutputChannel()

            # send the user message to Rasa & 
            # wait for the response
            await on_new_message(UserMessage(text, out))
            # extract the text from Rasa's response
            responses = [m["text"] for m in out.messages]
            message = responses[0]
            session = "false"

Next, you need to put your reply to the user and information about whether Alexa should end the session in the json format that Alexa is expecting. For more information on that format, refer to the Alexa Skills Kit Request and Response JSON Reference.

# Send the response generated by Rasa back to Alexa to
# pass on to the user. 

r = {
    "version": "1.0",
    "sessionAttributes": {"status": "test"},
    "response": {
        "outputSpeech": {
            "type": "PlainText",
            "text": message,
            "playBehavior": "REPLACE_ENQUEUED",
        },
        "reprompt": {
            "outputSpeech": {
                "type": "PlainText",
                "text": message,
                "playBehavior": "REPLACE_ENQUEUED",
            }
        },
        "shouldEndSession": session,
    },
}

Next, return your completed json response to the endpoint.

return response.json(r)

And, finally, return your class object. (I've deleted some } for readability.)

return alexa_webhook

And that's all you need to do on the Rasa end of things. On the Alexa side, the tricky part is being able to get all the user input.

Amazon does not, as of this writing, provide any built-in intents that will return the complete input entered by the user. If you liked, you could use Alexa's built-in intents and use Rasa only for the core dialog policies that help your assistant decide what to say next. One big downside of this is that you will need to maintain a separate model just for Alexa rather than serving the same model across all your channels.

So how do you get the raw user input? In this tutorial, you're going to create a new custom slot (called "ReturnAllText" in the example code) and a new custom intent (called "ReturnUserInput" in the example code). The only training data you'll provide for your intent will be for the slot, which means that all the input for turns identified as having the target intent will be returned as your slot. The specifications for this skill and slot are in the alexa_schema.json file.

In addition, your assistant will also have two default built-in intents. If these are deleted they will be automatically re-added when you build your model.

AMAZON.StopIntent stops the skill from running
AMAZON.NavigateHomeIntent lets users on devices with screens return to the home screens

So how can we ensure that Alexa will correctly classify all input that's not from utterances identified as StopIntent or NavigateHomeIntent as being from the slot "ReturnAllText"?

As training data for the "ReturnAllText" slot, I've used the first sentence of the Universal Declaration of Human Rights in thirteen different languages. Because these inputs are very different from each other, it should make the intent classifier very uncertain about what sort of input it can expect to see. As a result, it should be very likely that any user input will be assigned to this slot. From there, we just need to pass that information on to our Alexa connector.

Set up checklist

In order to follow this tutorial you'll need to set up a few things first.

You'll need to sign up for a free Alexa Developer Account. This is separate from your AWS account, if you have one.
You'll need to have Python 3 installed.
You'll be using ngrok to create a public URL for a locally-hosted server. You can find instructions to download and install it here.
If you'd like to use Git to clone the sample repository you should install it now. You can find instructions here. If you prefer, you can also download the repository from GitHub at this link.
Optional but recommended: You're going to work inside a virtual environment to keep your project modular. This will make it much easier to do things like hosting your project in a container later on. Set up and activate your virtual environment by following the instructions here.

Once you've gotten everything installed you're ready to get started!

Getting your assistant running locally

In order to get your skill launched on Alexa, you'll need to start by cloning the repository:

git clone https://github.com/RasaHQ/tutorial-rasa-alexa.git

From here, install the required packages (ideally in an activated virtual environment) by running:

cd tutorial-rasa-alexa/mood_bot_text
pip install -r requirements.txt

You should now be able to train your assistant using:

rasa train

Once your assistant is trained, you can try interacting with it in the shell by running:

rasa shell

Your assistant is now running locally! The next step is to create an Alexa skill that we can connect it to.

____Creat__ing your Alexa Skil__l

In order to serve your skill on Alexa, you need to let the service know what to expect. First, you'll need to create a new skill.

Create an account on the Alexa developer console (this is seperate from an existing AWS or Amazon account you may have).
Once you've logged in, open your developer console and click on the blue "Create Skill" button.
Pick a name for your skill (for this example I'm using "mood bot text") and set the default language.
On the next screen, pick "Custom" for "Choose a model to add to your skill", and "Provision your own" for "Choose a method to host your skill's backend resources".
On the next screen for "Choose a template", pick "Start from scratch".

Now you have a basic bot with five default intents. However, because your Rasa assistant will handle intent classification you want just the raw text the user inputs.

Click on "JSON Editor" on the left hand side underneath the intents and replace the existing JSON file with the contents of the alexa_schema.json file
from the repository you cloned earlier. This will also set your invocation name to "text mood bot", which you can update later if you like.
Save and build your model using the buttons at the top of the screen.
[Optional] Click on the "ReturnUserInput" intent and use the "Evaluate Model" button in the upper right hand corner to test some sample utterances. You should see that the intent is labelled "ReturnUserInput" and that there's a single slot "text" that contains all the text the user input.

And that's all you need to do to create your Alexa skill!

__Connect__ing your Rasa Assistant to your Alexa Skill

So at this point you have:

Your Alexa skill, which will send you the text that the user inputs
Your Rasa assistant, which will take in text, determine the content and select and return the next utterance

But at the moment they don't have any way of talking to each other. In order to connect them, you're going to need a server running your assistant. Then you'll need to let Alexa know the address to send the user input to. For this example, you'll be using ngrok to create a temporary public URL for the assistant you're running locally.

NOTE: If you're planning on doing anything more substantial or permanent than a quick demo or test, you'll need to use a more permanent hosting solution.

Start by running a rasa server locally with this command:

rasa run

By default this will serve your assistant on http://localhost:5005. Keep this terminal window open. Now you'll create a public URL. In a new terminal window, run the command that corresponds to your operating system.

On Windows (in the same directory as ngrok.exe):

ngrok.exe http 5005

Mac/linux:

ngrok http 5005

This will create a unique URL that can be used to access your assistant. You can check that it's working by going to the web interface URL provided by ngrok: http://127.0.0.1:4040/inspect/http (Don't worry if you get an error at this link right now: it will only work on the same computer you're running your ngrok server on and only while the server is up.)

Now that you have an endpoint for your model, you just need to point your skill to it.

Go back to the Alexa developer console and clicking on the "Build" tab and then on "Endpoints" in the menu on the left hand side.
Copy the HTTPS URL from ngrok (it'll look something like "https://159acb6e.ngrok.io") and paste it in the text field for the endpoint, adding "webhooks/alexa_assistant/webhook". ("alexa_assistant" is the name of the connecter, defined in alexa_connector.py and "webhook" is the name of the endpoint you defined in your Sanic blueprint ).
Select the appropriate option for the dropdown for your security certificate.
Make sure to save your endpoints every time you update them!

Your Rasa Assistant should now be connected to your Alexa Skill! 🎉 You can test your assistant in the developer console to make sure it's working correctly.

Go the the "Test" tab for your skill on the Alexa Devleloper Console https://developer.amazon.com/alexa/console/ask/test
Find the drop down menu at the top of the screen and next to "Skill testing is enabled in:" pick "Development"
Use the interface to test your skill. You'll need to use the Invocation Name for your skill to start it, which for this example is "text mood bot". You can change that by going to the Build tab, clicking on JSON Editor and editing the "invocationName": field.

And that's all you need to get started! If you want to make your skill public, you'll need a more permanent hosting solution than ngrok and to need to submit your skill for certification.

Good luck building your integrations! If you have any questions, please don't hesitate to ask on the Rasa Forums.

I'd also like to highlight Rasa community members Shubham Verma, Heiko Dotzauer and Ishan Khatri for sharing their knowledge on the Rasa Forums. Thank you all for your contributions! 🙌🙌 --Rachael

Some other resources on building voice assistants with Rasa that you might find helpful:

[Rasa team] Going beyond 'Hey Google': building a Rasa-powered Google Assistant
[Rasa team] How to build a voice assistant with open source Rasa and Mozilla tools
[Rasa community] Building your own Duplex AI agent using Rasa and Twilio
[Rasa community] Connecting Rasa to Mycroft: A Guide
[Rasa community] User code example "echo2rasa" (uses intents provided by Alexa)