We’re a step closer to getting rid of intents

One year ago I wrote that it's about time we get rid of intents, and that seems to have struck a nerve with many people working in conversational AI. I think everyone realizes that we'll never be able to build level 5 assistants if we're stuck in the mindset that every user message has to neatly fit into one of our predefined intents.

So, in Rasa 2.2, we've made intents optional by introducing end-to-end (e2e) learning. By end-to-end, we mean that instead of a two-step process (an NLU prediction followed by a dialogue policy choosing the next action), Rasa can now directly predict the next action the assistant should take by looking at the message the user sent. This has been a massive joint effort from our research and engineering teams, and I believe it's a major piece of the puzzle towards better conversational AI.

A minimal assistant that goes beyond intents

As you make an assistant more sophisticated over time, end-to-end learning lets you keep evolving and improving it without being limited by a rigid set of intents. Because intents become optional, not every message needs to fit into one.

In the examples folder on GitHub, we've included a minimal example to show the motivation for introducing end-to-end. It's a very simple assistant, but it illustrates the point. When a user says they're looking for a restaurant, it suggests a random cuisine. If the user says "yes", the assistant responds with a list of suggestions. If they say "no", it proposes another random cuisine.

When you practice Conversation-Driven Development, you will often come across messages where the intended meaning is perfectly clear, but it's not obvious what intent they belong to.

Here's an example:

To a human reviewing this conversation, it's obvious that "I had that yesterday" means that the user would like an alternative. But what intent does "I had that yesterday" belong to? I wouldn't feel comfortable adding this as a training example to my "deny" intent. And if I have to create a new intent for this, what would I call it? Where else might users say that?

Here's another example:

"I can always go for sushi" fits reasonably well (though not perfectly) in a 'restaurant_search' intent. But in this context it simply means 'yes'. Then again, if the assistant had suggested a different cuisine, it would mean 'no'. Mapping every message to one intent has limits.

How end-to-end learning in Rasa works

Below are the stories (training dialogues) included in this (very minimal) example. They are, for the most part, normal Rasa 2.0 stories. The difference is that for some user turns, we specify what the user said (with the user key) and don't provide the intent at all. For each turn in the story, the intent is optional.

The nice thing about our implementation is that you can still use intents where they make sense, but you aren't constrained by them.

version: "2.0"

stories:
- story: accepts suggestion
  steps:
  - intent: greet
  - action: utter_greet
  - intent: search_restaurant
  - action: utter_suggest_cuisine
  - intent: affirm
  - action: utter_search_restaurants

- story: accepts 3rd suggestion
  steps:
  - intent: search_restaurant
  - action: utter_suggest_cuisine
  - intent: deny
  - action: utter_suggest_cuisine
  - intent: deny
  - action: utter_suggest_cuisine
  - intent: affirm
  - action: utter_search_restaurants

- story: accepts suggestion, indirectly
  steps:
  - intent: greet
  - action: utter_greet
  - intent: search_restaurant
  - action: utter_suggest_cuisine
  - user: "I can always go for sushi"
  - action: utter_search_restaurants

- story: rejects suggestion, indirectly
  steps:
  - intent: search_restaurant
  - action: utter_suggest_cuisine
  - user: "I had that yesterday"
  - action: utter_suggest_cuisine

Since most user turns in the stories above are still represented by an intent, we still need some NLU data. Here's what that looks like:

version: "2.0"

nlu:
- intent: greet
  examples: |
    - hey
    - hello
    - hi
    - hello there
    - good morning
    - good evening
    - moin
    - hey there
    - let's go
    - hey dude
    - goodmorning
    - goodevening
    - good afternoon

- intent: affirm
  examples: |
    - yes
    - y
    - indeed
    - of course
    - that sounds good
    - correct
    - yeah

- intent: deny
  examples: |
    - no
    - n
    - never
    - I don't think so
    - don't like that
    - no way

- intent: search_restaurant
  examples: |
    - I'm looking for some food
    - show me a place to eat
    - where should I eat tonight?
    - list restaurants
    - food
    - I'm hungry

When I train this assistant and talk to it, here's what happens when the user says "I had that yesterday":

Our NLU model takes a best guess and predicts that "I had that yesterday" belongs to the intent 'search_restaurant'.
Because we have e2e training stories, our dialogue policy can also look directly at the user text to predict the next action.
The dialogue policy has decided that in this case, the predicted intent isn't useful information, and ignores it. It correctly predicts the next action (to suggest another cuisine) by looking directly at the user text.

The way this happens in practice is that the dialogue policy makes two predictions, one using only the intent and one using only the text, and chooses the one with higher confidence. Note that this is just one heuristic for choosing when to use the e2e prediction, and we're researching other approaches as well.
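To make that selection rule concrete, here's a minimal sketch of it in plain Python. This is not Rasa's actual implementation: the `Prediction` class, the `choose_prediction` function, and the confidence values are all hypothetical and only illustrate the "pick the more confident of the two" heuristic described above.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One candidate next action (hypothetical structure, not a Rasa class)."""
    action: str
    confidence: float
    source: str  # "intent" or "text"


def choose_prediction(intent_based: Prediction, text_based: Prediction) -> Prediction:
    """Return the prediction with higher confidence.

    Ties fall back to the intent-based prediction; that tie-break is an
    assumption made for this sketch.
    """
    if text_based.confidence > intent_based.confidence:
        return text_based
    return intent_based


# Invented numbers for the "I had that yesterday" turn, not real model output.
intent_based = Prediction("utter_search_restaurants", 0.41, "intent")
text_based = Prediction("utter_suggest_cuisine", 0.87, "text")

chosen = choose_prediction(intent_based, text_based)
print(chosen.action, "chosen from", chosen.source)
```

In this sketch the text-based prediction wins, so the misleading intent never influences the next action.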

Contextual NLU

We're often asked if Rasa's NLU can be made contextual. That is, can the context of the conversation affect the intents and entities we predict?

If you want your NLU module to take context into account, you have to somehow include context in your NLU training data or, in a more brute-force approach, implement heuristics in a custom NLU component that overrides intents in certain cases.
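To show what the brute-force option tends to look like, here's a rough sketch in plain Python. It deliberately does not use Rasa's component API; the `CONTEXT_OVERRIDES` table and the `override_intent` function are hypothetical names invented for this illustration.

```python
from typing import Dict, Tuple

# Hypothetical rule table: (last bot action, normalised user text) -> intent to force.
CONTEXT_OVERRIDES: Dict[Tuple[str, str], str] = {
    ("utter_suggest_cuisine", "i had that yesterday"): "deny",
    ("utter_suggest_cuisine", "i can always go for sushi"): "affirm",
}


def override_intent(predicted_intent: str, last_bot_action: str, text: str) -> str:
    """Replace the NLU prediction when a hand-written context rule matches."""
    key = (last_bot_action, text.strip().lower())
    return CONTEXT_OVERRIDES.get(key, predicted_intent)


# The NLU model guessed "search_restaurant"; the rule forces "deny" in this context.
print(override_intent("search_restaurant", "utter_suggest_cuisine", "I had that yesterday"))
# Outside that context, the original prediction is kept.
print(override_intent("search_restaurant", "utter_greet", "I had that yesterday"))
```

Every new phrasing needs another hand-written rule, and the sushi rule is only right if sushi was the cuisine just suggested, so a table like this gets brittle quickly.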

End-to-end learning merges NLU and dialogue management into a single model, which is a cleaner and more robust solution to achieving contextual NLU.

If we want to contextually interpret "I had that yesterday" to mean "no" in this specific context, Rasa can now do that by correctly predicting the next action, without having to worry at all about what the correct intent label is for that message.

This means we are moving away from the paradigm that every message neatly fits into one intent, and that is a good thing.

Fully end-to-end assistants

What about getting rid of intents completely? In Rasa 2.2 you can also train fully end-to-end. This means that instead of predicting the name of the action to execute (e.g. utter_greet), the assistant predicts the response text itself ("hi!"), which your training data includes directly. Notice that in the story below, the intent and action keys are both absent.

version: "2.0"

stories:
- story: end to end happy path
  steps:
  - user: "hi"
  - bot: "hi!"
  - user: "I'm looking for a restaurant"
  - bot: "how about Chinese food?"
  - user: "sure"
  - bot: "here's what I found ..."

This is powerful; it means you can train a Rasa model on a set of human-human dialogues without first having to invent intent and action names and annotate conversations with them. I'm not suggesting right now that you build an assistant this way, but there's huge potential here that we're just starting to explore.
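As a small illustration of how little annotation that takes, here's a sketch that turns a transcript into a story in the fully end-to-end format shown above. The `dialogue_to_story` helper and the (speaker, text) transcript format are assumptions made for this example; they aren't part of Rasa.

```python
import json
from typing import List, Tuple


def dialogue_to_story(name: str, turns: List[Tuple[str, str]]) -> str:
    """Render (speaker, text) turns as one story using only user/bot keys.

    Hypothetical helper: speakers must already be labelled "user" or "bot".
    """
    lines = [f"- story: {name}", "  steps:"]
    for speaker, text in turns:
        if speaker not in ("user", "bot"):
            raise ValueError(f"unknown speaker: {speaker!r}")
        # json.dumps produces a double-quoted, escaped string that is also valid YAML.
        lines.append(f"  - {speaker}: {json.dumps(text)}")
    return "\n".join(lines)


transcript = [
    ("user", "hi"),
    ("bot", "hi!"),
    ("user", "I'm looking for a restaurant"),
    ("bot", "how about Chinese food?"),
]

story = dialogue_to_story("end to end happy path", transcript)
print('version: "2.0"\n\nstories:\n' + story)
```

No intent or action names appear anywhere; the only labelling the transcript needs is who said what.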

What's next

As of the 2.2 release of Rasa Open Source, end-to-end learning is an experimental feature. We don't yet have full support across all Rasa features, such as interactive learning or Rasa X.

For now, think of end-to-end learning as a feature for teams who want to push the limits of what Rasa can do. Please leave your feedback in this forum thread. As we learn about how to best use end-to-end in production systems, we'll build more tooling, provide more examples and docs, and turn this into a feature everyone can benefit from.

In late 2019 I predicted that "by the end of 2020 the advanced teams will have moved away from using intents". We've delivered on a major piece of the puzzle towards making that happen. I believe that in 2021 we will turn end-to-end from an experimental feature into something all conversational teams can run in production.