Coexistence Routers
The coexistence of CALM and the NLU-based system depends on a routing mechanism that sends each message to one of the two systems based on its content. You can choose between two different router components:
IntentBasedRouter
: The predicted intent of the NLU pipeline is used to decide where the message should go.

LLMBasedRouter
: This component leverages an LLM to decide whether a message should be routed to the NLU-based system or CALM.
You can only use one of the router components in your assistant.
IntentBasedRouter
The IntentBasedRouter uses the intent predicted by the NLU components and routes the message based on that intent.
The router needs to be added to the pipeline in your config file.
important
The IntentBasedRouter needs to be positioned after the NLU components and before the Command Generators.
Depending on the other components you choose, your config file could look like the following.
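A minimal sketch, assuming a standard NLU setup; the NLU components and the intent names below are placeholders for whatever your assistant actually uses:

```yaml
recipe: default.v1
language: en
pipeline:
  # NLU components come first
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 100
  # the router sits after the NLU components ...
  - name: IntentBasedRouter
    nlu_entry:
      sticky:
        - check_balance
      non_sticky:
        - chitchat
    calm_entry:
      sticky:
        - transfer_money
  # ... and before the command generator
  - name: LLMCommandGenerator
```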
Configuration of the IntentBasedRouter
The following configuration parameters are mandatory:
nlu_entry:
- sticky: List of intents that should route to the NLU-based system in a sticky fashion.
- non_sticky: List of intents that should route to the NLU-based system in a non-sticky fashion.

calm_entry:
- sticky: List of intents that should route to CALM in a sticky fashion.
A full configuration of the IntentBasedRouter could, for example, look like the following.
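A sketch with made-up intent names; replace them with the intents of your own assistant:

```yaml
- name: IntentBasedRouter
  nlu_entry:
    sticky:
      - check_balance
      - block_card
    non_sticky:
      - chitchat
  calm_entry:
    sticky:
      - transfer_money
```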
info
Once the IntentBasedRouter assigns the session to the NLU-based system, the LLMCommandGenerator is skipped so that no unnecessary costs are incurred.
Handling missing intents
If an NLU component predicts an intent that is not listed anywhere in the IntentBasedRouter configuration and no routing session is currently set, the message is routed according to the following rules:
- We route to CALM if the predicted intent activates any flow via an NLU trigger (see the NLU triggers documentation and the sketch below).
- We route to the NLU-based system otherwise.
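For reference, a flow with an NLU trigger could look like the following sketch; the flow name, intent name, and steps are hypothetical:

```yaml
flows:
  transfer_money:
    description: Send money to another account.
    # this flow is activated when the NLU pipeline predicts the intent
    nlu_trigger:
      - intent: transfer_money
    steps:
      - collect: recipient
      - collect: amount
```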
LLMBasedRouter
The LLMBasedRouter uses an LLM, by default gpt-3.5-turbo, to decide whether a message should be routed to the NLU-based system or CALM.
important
To use this component in your coexistence solution, you need to add it as the first component of the pipeline in your config file.
Depending on the other components you choose, your config file could look like the following.
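A minimal sketch; everything besides the LLMBasedRouter entry is a placeholder for the components of your assistant:

```yaml
recipe: default.v1
language: en
pipeline:
  # the router has to be the first component
  - name: LLMBasedRouter
    calm_entry:
      sticky: "handles everything around money transfers"
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 100
  - name: LLMCommandGenerator
```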
Configuring the LLMBasedRouter component
The LLMBasedRouter component has the following configuration parameters:

nlu_entry:
- sticky: Describes the general functionality of the NLU-based system. By default the value is "handles everything else".
- non_sticky: Describes the functionality of the NLU-based system that should not result in sticky routing to the NLU-based system. By default the value is "handles chitchat".

calm_entry:
- sticky (required): Describes the functionality implemented in the CALM system of the assistant.

llm: Configuration of the LLM (see the section below).

prompt: File path to the prompt template (a jinja2 template) to use (see the section below).
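Putting these parameters together, a configuration could look like the following sketch; the calm_entry description is an example, while the nlu_entry values simply restate the defaults:

```yaml
- name: LLMBasedRouter
  nlu_entry:
    sticky: "handles everything else"
    non_sticky: "handles chitchat"
  calm_entry:
    sticky: "handles everything around money transfers"
```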
If you want to use Azure OpenAI Service, you can configure the necessary parameters as described in the Azure OpenAI Service section.
Configuring the prompt of the LLMBasedRouter
By default, the descriptions of the configuration are assembled into a prompt describing a routing task to an LLM:
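The exact default template ships with Rasa Pro; the following is only an illustrative sketch of its shape (the wording and the letter-to-category mapping are assumptions, while the three template variables are the ones described below):

```jinja2
Classify the user's message into one of the following categories:

A: {{ calm_entry_sticky }}
B: {{ nlu_entry_non_sticky }}
C: {{ nlu_entry_sticky }}

Answer with exactly one letter: A, B, or C.
```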
The configuration parameter nlu_entry.sticky goes into {{ nlu_entry_sticky }}, nlu_entry.non_sticky goes into {{ nlu_entry_non_sticky }}, and calm_entry.sticky goes into {{ calm_entry_sticky }}.
This prompt is much simpler and about 10 times shorter than the prompt of the LLMCommandGenerator. Run with gpt-3.5-turbo, it is around 200 times cheaper than the LLMCommandGenerator using gpt-4.
You can modify the prompt by writing your own prompt as a jinja2 template and providing it to the component as a file:
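For example (the file path is a placeholder):

```yaml
- name: LLMBasedRouter
  calm_entry:
    sticky: "handles everything around money transfers"
  prompt: prompts/llm-based-router-prompt.jinja2
```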
info
Once the LLMBasedRouter assigns the session to the NLU-based system, the LLMCommandGenerator is skipped so that no unnecessary costs are incurred.
Configuring the LLM of the LLMBasedRouter
By default the following configuration for the LLM is used:
The exact defaults differ slightly between Rasa Pro <=3.10.x and Rasa Pro >=3.11.x.
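As a sketch of what this configuration amounts to (the key style follows Rasa Pro <=3.10.x; the temperature and bias values are assumptions, while max_tokens and the token ids come from the explanation below):

```yaml
llm:
  model_name: gpt-3.5-turbo
  temperature: 0.0      # assumption: deterministic routing decisions
  max_tokens: 1         # the router only needs a single token as the answer
  logit_bias:
    "362": 100          # token id of " A" in the chatgpt / gpt-4 tokenizer
    "426": 100          # token id of " B"
    "356": 100          # token id of " C"
```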
The interesting settings here are:
- max_tokens: 1, which lets the LLM predict only a single token.
- the logit_bias, which boosts the probability of the LLM predicting the tokens " A", " B", or " C" respectively.
- These are the tokens of the three capital letters with a leading space, one for each of the three cases the router presents to the LLM.
- 362, 426, and 356 are the token ids of these three tokens in the chatgpt and gpt-4 models.
info
If you change the model, you should also adjust these logit biases, or at least remove them, so as not to boost unrelated tokens in other models!
Handling failures and downtime of the LLM
If the LLM predicts an invalid answer (e.g. a character other than A, B, or C), or if the LLM's API is down and cannot be reached, the message is routed to the NLU-based system in a sticky fashion.