In early 2016 I wrote that we don't know how to build conversational software yet. I'm writing a couple of posts to follow up on that one, taking one slice at a time and discussing progress and open problems.
At Rasa we talk about 5 levels of conversational AI. An early version of this idea appeared in that first blog post, where I split dialogue systems into three levels: mapping each intent to a response without considering context, mapping each intent to a response while manually tracking the context with a bunch of rules, and level 3 where the representation of state and context is learned from data. Level 3 means moving beyond the endless "sorry, I didn't get that".
We've made a bunch of progress towards level 3 conversational AI. But to really get there, we have to stop relying on intents. I believe this is one of the most important barriers we need to overcome. I also believe that 2020 is the year we'll break free from intents.
To clarify, I'm not talking about removing intents from Rasa. Intents are helpful for getting started, but we're working to make sure that once your assistant becomes more sophisticated, you can move past the limitations imposed by mapping every message to one intent. Intents are rigid, limited, and don't account for context. With forms, retrieval actions, and multi-intents, Rasa has already evolved to remove some of those limitations. I want to discuss how those features blur the definition of what an intent is, and describe a path to a future where intents are no longer a bottleneck.
Example 1: forms and the kind-of-useless 'inform' intent
If your assistant uses Rasa forms, you are probably relying on `inform`, an intent that doesn't help to clarify the next action at all.
By introducing forms in Rasa, we provided a clean way to incorporate business logic into a dialogue engine while still learning from real data. Business logic is something you know up front and don't have to learn from user conversations. A form contains only the business logic that governs the happy path. For example, if a user asks about a refund, your refund form gets triggered and asks for a few pieces of information like the order number and the reason for the refund.
A sensible thing for a user to do (see e.g. this forum thread) is respond with just their order number:
What is the intent of that message? The order number is obviously an entity, but since we don't have a meaningful intent label for this, we tend to just call it `inform`. But the label `inform` is actually pointless, since it doesn't add any information beyond what the entity `order_number` provides. Fortunately, by mapping this slot using the `from_entity` method, the form works even if the intent is misclassified.
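The point that an entity-based slot mapping does not depend on the intent can be illustrated with a minimal plain-Python sketch. This is not Rasa's form code: `fill_slot_from_entity`, the message dictionaries, the `greet` intent, and the order number `12345` are all assumptions made up for illustration.

```python
from typing import Optional


def fill_slot_from_entity(message: dict, entity_name: str) -> Optional[str]:
    """Return the first value of `entity_name` in the message, ignoring the intent."""
    for entity in message.get("entities", []):
        if entity.get("entity") == entity_name:
            return entity.get("value")
    return None


# Two hypothetical NLU results for the same user message; only the intent differs.
correctly_classified = {
    "intent": {"name": "inform"},
    "entities": [{"entity": "order_number", "value": "12345"}],
}
misclassified = {
    "intent": {"name": "greet"},  # wrong intent, same entity
    "entities": [{"entity": "order_number", "value": "12345"}],
}

slot_a = fill_slot_from_entity(correctly_classified, "order_number")
slot_b = fill_slot_from_entity(misclassified, "order_number")
print(slot_a, slot_b)  # 12345 12345
```

Because the lookup never reads the `intent` key, both messages fill the slot with the same value, which is why the `inform` label carries no extra information here.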
Example 2: Retrieval Intents are not really intents
Retrieval actions combine all of your FAQs into a single intent, blurring the definition of what an intent is. Retrieval actions (released in Rasa 1.3) let you collapse all stateless interactions into a single intent and action (stateless interactions are intents which should always receive the same response, like FAQs and basic chitchat). This is an effective simplification because from a dialogue perspective, all of these interactions are the same (imagine you have 2000+ FAQs like this forum user). Using a single intent for all stateless interactions makes your domain much simpler and means you need far fewer training stories.
But the retrieval actions also hint at the future: intents have to go away. The way retrieval actions work is that Rasa trains an extra machine learning model which does the response retrieval. For example in conversations like:
U: how are you?
B: I'm great thanks for asking
U: are you a bot?
B: yes I am a bot
the story is just the `faq` intent followed by the `respond_faq` action, once for each turn.
In our training data we are mapping these intents to a single retrieval intent:
- how are you?
- how's it going?
- are you a bot?
- am I talking to a computer?
and the responses are saved in a separate file; check out the tutorial for details.
The intent classifier and dialogue policy see all of these as a single `faq` intent and `respond_faq` action. Only the response retrieval model knows the difference between "how are you?" and "are you a bot?"
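The split of responsibilities described here can be sketched as a toy two-stage pipeline. In this sketch a plain lookup table stands in for the trained retrieval model, so it only shows who sees what, not how Rasa actually selects responses. The sub-intent name `faq/areyouabot` and the function names are hypothetical; `faq/howdoing` and the example texts and responses come from this post.

```python
from typing import Optional

# Training examples grouped under retrieval sub-intents.
EXAMPLES = {
    "faq/howdoing": ["how are you?", "how's it going?"],
    "faq/areyouabot": ["are you a bot?", "am I talking to a computer?"],  # hypothetical name
}
RESPONSES = {
    "faq/howdoing": "I'm great thanks for asking",
    "faq/areyouabot": "yes I am a bot",
}

# Reverse lookup: lowercased example text -> retrieval sub-intent.
LOOKUP = {text.lower(): key for key, texts in EXAMPLES.items() for text in texts}


def classify_intent(text: str) -> Optional[str]:
    """What the intent classifier and dialogue policy see: only the part before '/'."""
    key = LOOKUP.get(text.lower())
    return key.split("/")[0] if key else None


def select_response(text: str) -> Optional[str]:
    """What the retrieval step does: tell the FAQs apart and pick the response."""
    key = LOOKUP.get(text.lower())
    return RESPONSES[key] if key else None


for message in ["how are you?", "are you a bot?"]:
    print(classify_intent(message), "->", select_response(message))
```

Both messages produce the same `faq` label for the dialogue side, and only `select_response` distinguishes them.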
So what is actually the intent of 'how are you?'? Is it `faq` or is it `faq/howdoing`? We've blurred the definition of what an intent is, and that's a good thing. It doesn't matter which of these things we call the intent; what matters is that your assistant knows how to respond.
Example 3: Multi Intents
It's pretty common that your users genuinely say multiple things in a single message, and you want to capture both of those things (like this person in the Rasa forum). For example, a user might confirm a choice and ask a follow up question:
With multi-intents, we removed the limitation that each message can only have one intent. Rasa's NLU is pretty good at predicting those multi-intents, but the challenge, of course, is knowing how to respond. By design, Rasa will only predict multi-intents that have been seen at least once in your training data. This follows our guiding principle that real conversations are more important than hypothetical ones.
Why don't we predict arbitrary multi-intents (a question also posed in the forum)? Say we were to predict new combinations of intents that were never seen before. There are multiple valid ways of dealing with multi-intent inputs:
- one of the intents can be ignored and you should act on the other one (but which way around?)
- you need to respond to both intents (but in which order?)
- the intents conflict and you need to ask the user to clarify
You cannot know which of these is correct if this multi-intent has never been seen before. Instead of trying to write layers of rules for each of these cases, your dialogue model can learn from examples.
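The "seen at least once" restriction can be pictured as building the label space directly from the training data. The sketch below is a simplified illustration, not Rasa's classifier: the intent names, the `+` separator, and the helper functions are assumptions chosen for the example.

```python
from typing import List

SPLIT_SYMBOL = "+"  # assumed separator between the parts of a multi-intent

# Hypothetical intent labels that occur in the training data.
TRAINING_LABELS = ["affirm", "ask_price", "affirm+ask_price", "deny"]

# Only labels that were actually annotated are candidates for prediction.
LABEL_SPACE = set(TRAINING_LABELS)


def can_predict(label: str) -> bool:
    """A label is a candidate only if it appeared in the training data."""
    return label in LABEL_SPACE


def split_label(label: str) -> List[str]:
    """Break a multi-intent label into its individual intents."""
    return label.split(SPLIT_SYMBOL)


print(can_predict("affirm+ask_price"))  # True: this combination was annotated
print(can_predict("deny+ask_price"))    # False: never seen, so never predicted
print(split_label("affirm+ask_price"))  # ['affirm', 'ask_price']
```

Keeping unseen combinations out of the label space means every multi-intent the dialogue model receives has at least one real example showing how to respond to it.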
The model of one user message always mapping to one intent is limited, so we shipped multi-intents as a way of moving past that.
Example 4: Intents you haven't defined yet
The biggest issue with intents comes when you have a moderately complicated assistant and you're not sure how to annotate new messages that come in. I'm currently working on an assistant that encourages people to offset their carbon emissions. I see some messages like this coming in:
I have an intent defined for the question 'do you take a cut from the offsets I buy?', but this first question isn't quite the same. Should I create a new intent for this?
As another example, I also see some meta-conversation like:
It's pretty hard to invent an intent for that message and then think about where else that intent might be used. But it's pretty easy to pick a generic but good-enough response from the response templates I already have in my domain.
Why intents are great but have to go
Defining a set of intents is a super effective way of bootstrapping a conversational assistant. You are compressing the infinite space of things people can say into a few buckets, and that makes it much easier to know how to respond.
As your assistant grows in complexity, the opposite is true. As a human, it's easy to come up with a good response, but it becomes harder and harder to work with a rigid set of intents.
We need to move past intents to get to true level 3, and one of our goals for 2020 is to give Rasa users a path to getting there. I don't believe seq2seq models are the answer, nor should we be sampling from a big pre-trained language model. Rasa is for product teams shipping mission-critical conversational AI, so models should always be deterministic and testable. In fact, with both Rasa Open Source and Rasa X we are doubling down on supporting good software engineering practices, with better support for end-to-end testing, CI/CD, performance monitoring, repeatable deployments, etc.
However, a lot of the pieces are there and I think we can build models that break free of this intent bottleneck.
I think intents are still a super helpful tool for getting started, but we're making sure that Rasa provides a path for moving away from them when your assistant becomes more mature and they become more of a hindrance than a help. I predict that by the end of 2020 the advanced teams will have moved away from using intents and will be much happier because of it.