Conversational AI with Language Models
Conversational AI with Language Models (CALM) is an LLM-native approach to building reliable conversational AI. It was developed at Rasa based on years of helping enterprise teams build customer-facing assistants. CALM was designed to give you the best of both worlds: the time-to-value, generality, and fluency of the latest generation of LLMs, and the robustness, reliability, control, and debuggability of NLU-based chatbots.
Also check out our YouTube playlist of videos to help you get started with Rasa and CALM.
If you use CALM in your research, please consider citing the paper.
If you're familiar with building NLU-based chatbots in Rasa or another platform, go to CALM compared to NLU-based assistants to understand how CALM differs.
If you're familiar with building LLM Agents and want to understand how that approach relates to CALM, go to CALM compared to LLM Agents.
How CALM works
The CALM approach has three key elements: Business Logic, Dialogue Understanding, and Automatic Conversation Repair.
Business logic is implemented as a set of flows. A flow describes a business process that your AI assistant can handle. It describes the information you need from the user, any data you need to retrieve from an API or a database, and any branching logic based on the information you collect. A flow describes only the logic your assistant will follow; it does not enumerate all the possible conversation paths.
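For example, a money-transfer process might be written as a flow like the sketch below. The flow name, slot names, and custom action name are illustrative placeholders, not built-in Rasa identifiers:

```yaml
flows:
  transfer_money:
    description: Help the user send money to a recipient.
    steps:
      - collect: recipient              # ask the user who to pay
      - collect: amount                 # ask how much to send
      - action: action_execute_transfer # hypothetical custom action calling your API
```

Note that the flow says nothing about how the user phrases their request or in what order the information arrives; it only captures the business process.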
Dialogue understanding is designed to interpret what end users are communicating to your assistant. This process involves generating commands that reflect the user's intentions, aligned with your business logic and the context of the ongoing conversation. There are commands for starting and stopping flows, for filling slots, and more. Commands are an internal grammar Rasa uses to navigate conversations. See the full list in the Command Reference.
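As an illustration, if a user says "I'd like to send $50 to Joe", dialogue understanding might emit a command sequence along these lines (the rendering here is illustrative, not verbatim Rasa output):

```
StartFlow(transfer_money)
SetSlot(recipient, "Joe")
SetSlot(amount, 50)
```

A single user message can therefore advance the conversation in several ways at once, something a single intent label cannot express.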
Automatic conversation repair handles all the ways conversations can go "off script". For example:
- Your assistant asked for an email address, but the user says something else.
- The end user interrupts the current flow and switches context to another topic.
- The end user changes their mind about something they said earlier.
These cases and more are handled automatically, and you can fully customize every case.
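To make this concrete, here is a sketch of a digression being repaired automatically; the assistant's wording is illustrative:

```
Assistant: How much would you like to transfer?
User:      Wait, what's my current balance?
Assistant: Your current balance is $1,520.        <- interrupting flow runs
Assistant: How much would you like to transfer?   <- original flow resumes
```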
CALM compared to NLU-based assistants
A big shift with CALM is that we no longer rely on "NLU" models. Within conversational AI, NLU (Natural Language Understanding) describes processing user messages and predicting intents and entities to represent their meaning.
CALM uses a new approach called Dialogue Understanding (DU) that translates what users are saying into what that means for your business logic. This differs from a traditional NLU approach in three key ways:
- While NLU interprets one message in isolation, DU considers the greater context: the back-and-forth of the conversation and the assistant's business logic.
- Instead of producing intents and entities like NLU systems, DU outputs a sequence of commands representing how users want to progress the conversation.
- NLU systems are restricted to a fixed list of intents, whereas DU is generative and produces a sequence of commands according to an internal grammar. This gives us a far richer language to represent what users want.
Working with CALM is a lot faster than building an NLU-based assistant. CALM does not rely on intents, which is a move we've been excited about for years at Rasa. CALM also uses a primitive called Flows to define your dialogue logic. Flows are much simpler to work with than rules, forms, and stories, and only need to cover the happy paths. Work through the tutorial to get a feel for how to work with CALM.
CALM compared to LLM Agents
"LLM Agents" refers to idea of using LLMs as a reasoning engine. See the Reasoning and Acting (ReAct) framework for an example.
CALM uses an LLM to determine how the user wants to progress the conversation. It does not use an LLM to guess what the correct set of steps are to complete that process. This is the primary difference between the two approaches.
In CALM, business logic is described by a Flow and executed precisely using the FlowPolicy.
CALM uses an LLM to understand the user side of the conversation, but not the system side. In a ReAct-style agent, an LLM is used for both.
When each of these is appropriate:
| LLM Agents | CALM |
|---|---|
| allow users to make open-ended use of tools / API endpoints | known business logic for a finite set of skills / user goals |
| you have an effectively infinite set of possible tasks and open-ended goals | business logic needs to be strictly enforced |
| you want to give the end-user of the bot full autonomy | limits to what end-users are allowed to do |
For business use cases, CALM has advantages over LLM Agents:
- Your business logic is explicitly written down, easily editable, and you can be sure that it's followed faithfully.
- Your business logic is already known, so you don't need to rely on an LLM to guess it. This avoids doing multiple LLM calls in series in response to just one user message, a method that's too slow for most applications.
- You can make your business logic as complex as desired and not worry about whether the LLM “forgets” a step.
- You can validate all the information users provide (i.e. every slot value) as the conversation progresses. You don’t have to wait until the information is collected and the API response gives you an error.
- End users cannot use prompt injection to override your business logic. A language model used as a reasoning engine is a fundamentally insecure proposition; see the OWASP Top 10 for LLM applications.
- Without the optional generative components of CALM, the LLM output is limited to a fixed set of commands, eliminating the risk of hallucination while also reducing latency and token generation costs.
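The per-slot validation mentioned above is possible because flows support branching as information is collected. A sketch, where the step ids, slot name, and response name are illustrative and the condition syntax is an assumption based on typical flow definitions:

```yaml
steps:
  - collect: transfer_amount
    next:
      - if: slots.transfer_amount > 0
        then: execute_transfer
      - else: invalid_amount
  - id: execute_transfer
    action: action_execute_transfer   # hypothetical custom action
  - id: invalid_amount
    action: utter_invalid_amount      # hypothetical response rejecting the value
```

The invalid value is caught the moment it is collected, rather than surfacing later as an API error.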
Developer Edition
If you want to start building assistants with CALM, you can get the Rasa Pro Developer Edition.