
Coexistence Routers

The coexistence of CALM and the NLU-based system depends on a routing mechanism that routes each message to one of the two systems based on its content. You can choose between two different router components:

  1. IntentBasedRouter: The predicted intent of the NLU pipeline is used to decide where the message should go.
  2. LLMBasedRouter: This component leverages an LLM to decide whether a message should be routed to the NLU-based system or CALM.

You can only use one of the router components in your assistant.

IntentBasedRouter

The IntentBasedRouter uses the intent predicted by the NLU components and routes the message based on that intent. The router needs to be added to the pipeline in your config file.

important

The position of the IntentBasedRouter needs to be after the NLU components and before the Command Generators.

Depending on the other components you choose, your config file could look like the following.

config.yml
recipe: default.v1
language: en
pipeline:
- name: WhitespaceTokenizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: LogisticRegressionClassifier
- name: IntentBasedRouter
  # additional configuration parameters
- name: SingleStepLLMCommandGenerator
  llm:
    model_group: openai_llm
policies:
- name: FlowPolicy
- name: RulePolicy
- name: MemoizationPolicy
  max_history: 10
- name: TEDPolicy

endpoints.yml
model_groups:
- id: openai_llm
  models:
  - provider: openai
    model: gpt-4
    timeout: 7
    max_tokens: 256

Configuration of the IntentBasedRouter

The following configuration parameters are mandatory:

  • nlu_entry:
    • sticky: List of intents that should route to the NLU-based system in a sticky fashion.
    • non_sticky: List of intents that should route to the NLU-based system in a non-sticky fashion.
  • calm_entry:
    • sticky: List of intents that should route to CALM in a sticky fashion.

A full configuration of the IntentBasedRouter could, for example, look like the following.

config.yml
pipeline:
# ...
- name: IntentBasedRouter
  nlu_entry:
    sticky:
      - transfer_money
      - check_balance
      - search_transactions
    non_sticky:
      - chitchat
  calm_entry:
    sticky:
      - book_hotel
      - cancel_hotel
      - list_hotel_bookings
# ...
info

Once the IntentBasedRouter assigns the session to the NLU-based system, the LLMCommandGenerator is going to be skipped so that no unnecessary costs are incurred.

Handling missing intents

If an NLU component predicts an intent that is not listed in the IntentBasedRouter configuration and no routing session is currently set, the message is routed according to the following rules:

  1. The message is routed to CALM if the predicted intent activates any NLU trigger of a flow (see the NLU triggers documentation and the sketch below).
  2. Otherwise, the message is routed to the NLU-based system.
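
For illustration, here is a minimal sketch of a flow with an NLU trigger; the flow name, intent, and slot are placeholders, and the exact syntax is covered in the NLU triggers documentation. A message whose predicted intent is report_problem would be routed to CALM under rule 1, even though that intent is not listed in the IntentBasedRouter:

flows.yml
flows:
  report_problem:                    # placeholder flow name
    description: Lets the user report a problem with a booking.
    nlu_trigger:
      - intent: report_problem      # placeholder intent; activates this flow from NLU
    steps:
      - collect: problem_description   # placeholder slot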

LLMBasedRouter

The LLMBasedRouter uses an LLM, by default gpt-3.5-turbo, to decide whether a message should be routed to the NLU-based system or CALM.

important

In order to use this component for your coexistence solution, you need to add it as the first component of the pipeline in your config file.

Depending on the other components you choose, your config file could look like the following.

config.yml
recipe: default.v1
language: en
pipeline:
- name: LLMBasedRouter
  nlu_entry:
    sticky: ...
    non_sticky: ...
  calm_entry:
    sticky: handles everything around hotel bookings
  llm:
    model_group: openai_llm
  # additional configuration parameters
- name: WhitespaceTokenizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: LogisticRegressionClassifier
- name: SingleStepLLMCommandGenerator
  llm:
    model_group: openai_llm
policies:
- name: FlowPolicy
- name: RulePolicy
- name: MemoizationPolicy
  max_history: 10
- name: TEDPolicy

endpoints.yml
model_groups:
- id: openai_llm
  models:
  - provider: openai
    model: gpt-4
    timeout: 7
    max_tokens: 256

Configuring the LLMBasedRouter component

The LLMBasedRouter component has the following configuration parameters:

  • nlu_entry:
    • sticky: Describes the general NLU-based system functionality. By default the value is "handles everything else".
    • non_sticky: Describes the functionality of the NLU-based system that should not result in sticky routing to the NLU-based system. By default the value is "handles chitchat".
  • calm_entry:
    • sticky (required): Describes the functionality implemented in the CALM system of the assistant.
  • llm: Configuration of the LLM (see the section on configuring the LLM of the LLMBasedRouter below).
  • prompt: File path to the prompt template (a jinja2 template) to use (see the section on configuring the prompt below).

If you want to use Azure OpenAI Service, you can configure the necessary parameters as described in the Azure OpenAI Service section.
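
As a rough sketch, an Azure OpenAI model group in endpoints.yml might look like the following; the deployment name, endpoint URL, and API version are placeholders, and the Azure OpenAI Service section remains the authoritative reference for the required parameters.

endpoints.yml
model_groups:
- id: azure_openai_llm
  models:
  - provider: azure
    deployment: my-router-deployment                 # placeholder deployment name
    api_base: https://my-resource.openai.azure.com   # placeholder endpoint
    api_version: "2024-02-01"                        # placeholder API version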

Configuring the prompt of the LLMBasedRouter

By default, the descriptions from the configuration are assembled into a prompt that describes the routing task to the LLM:

You have to forward the user message to the right assistant.
The following assistants are available:
Assistant A: {{ calm_entry_sticky }}
Assistant B: {{ nlu_entry_non_sticky }}
Assistant C: {{ nlu_entry_sticky }}
The user said: """{{ user_message }}"""
Answer which assistant needs to get this message:
The message is for the assistant with the letter

The configuration parameter nlu_entry.sticky goes into {{ nlu_entry_sticky }}, nlu_entry.non_sticky goes into {{ nlu_entry_non_sticky }}, and calm_entry.sticky goes into {{ calm_entry_sticky }}.

This prompt is much simpler and 10 times shorter than the prompt of the LLMCommandGenerator. With gpt-3.5-turbo, a routing call is therefore around 200 times cheaper than a call to the LLMCommandGenerator using gpt-4.

You can modify the prompt by writing your own prompt as a jinja2 template and providing it to the component as a file:

pipeline:
# ...
- name: LLMBasedRouter
  prompt: prompts/llm-based-router-prompt.jinja2
# ...
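
As an illustrative sketch, such a custom template could reword the instructions while keeping the same template variables ({{ calm_entry_sticky }}, {{ nlu_entry_non_sticky }}, {{ nlu_entry_sticky }}, {{ user_message }}) and the single-letter answer format that the router and its default LLM configuration expect:

prompts/llm-based-router-prompt.jinja2
You are a router that forwards user messages to one of three assistants:
Assistant A: {{ calm_entry_sticky }}
Assistant B: {{ nlu_entry_non_sticky }}
Assistant C: {{ nlu_entry_sticky }}

The user said: """{{ user_message }}"""

Reply with only the letter of the assistant that should handle this message.
The message is for the assistant with the letter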
info

Once the LLMBasedRouter assigns the session to the NLU-based system, the LLMCommandGenerator is going to be skipped so that no unnecessary costs are incurred.

Configuring the LLM of the LLMBasedRouter

By default the following configuration for the LLM is used:

config.yml
pipeline:
# ...
- name: LLMBasedRouter
  llm:
    model_group: openai_llm
# ...

endpoints.yml
model_groups:
- id: openai_llm
  models:
  - provider: openai
    model: "gpt-3.5-turbo"
    timeout: 7
    temperature: 0.0
    max_tokens: 1
    logit_bias:
      "362": 100
      "426": 100
      "356": 100

The interesting settings here are:

  • max_tokens: 1, which limits the LLM to predicting a single token.
  • the logit_bias, which boosts the probability of the LLM predicting the tokens " A", " B", or " C".
    • These are the three capital letters with a leading space, matching the three cases the router presents to the LLM.
    • 362, 426, and 356 are the token ids of these three tokens in the chatgpt and gpt-4 models.
info

If you change the model, you should also adjust these logit biases, or at least remove them, so as not to boost the probability of unrelated tokens in other models!
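
For example, a model group for the router that uses a different model could simply omit the logit_bias block; the model name below is only a placeholder:

endpoints.yml
model_groups:
- id: router_llm
  models:
  - provider: openai
    model: some-other-model   # placeholder; replace with the model you actually use
    timeout: 7
    temperature: 0.0
    max_tokens: 1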

Handling failures and downtime of the LLM

If the LLM predicts an invalid answer, e.g. a character other than A, B, or C, or if the LLM's API is down and the LLM cannot be reached, the message is routed to the NLU-based system in a sticky fashion.