Coexistence Routers

The coexistence of CALM and the NLU-based system depends on a routing mechanism that directs each message, based on its content, to one of the two systems. You can choose between two different router components:

  1. IntentBasedRouter: The predicted intent of the NLU pipeline is used to decide where the message should go.
  2. LLMBasedRouter: This component leverages an LLM to decide whether a message should be routed to the NLU-based system or CALM.

You can only use one of the router components in your assistant.

IntentBasedRouter

The IntentBasedRouter uses the intent predicted by the NLU components and routes the message based on that intent. The router needs to be added to the pipeline in your config file.

important

The IntentBasedRouter must be placed after the NLU components and before the command generators in the pipeline.

Depending on the other components you choose, your config file could look like the following.

config.yml

recipe: default.v1
language: en
pipeline:
- name: WhitespaceTokenizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: LogisticRegressionClassifier
- name: IntentBasedRouter
  # additional configuration parameters
- name: CompactLLMCommandGenerator
  llm:
    model_group: openai_llm

policies:
- name: FlowPolicy
- name: RulePolicy
- name: MemoizationPolicy
  max_history: 10
- name: TEDPolicy
endpoints.yml

model_groups:
- id: openai_llm
  models:
  - provider: openai
    model: gpt-5-mini-2025-08-07
    timeout: 7
    max_tokens: 256

Configuration of the IntentBasedRouter

The following configuration parameters are mandatory:

  • nlu_entry:
    • sticky: List of intents which should route to the NLU-based system in a sticky fashion.
    • non_sticky: List of intents which should route to the NLU-based system in a non-sticky fashion.
  • calm_entry:
    • sticky: List of intents which should route to CALM in a sticky fashion.

A full configuration of the IntentBasedRouter could, for example, look like the following:

config.yml

pipeline:
# ...
- name: IntentBasedRouter
  nlu_entry:
    sticky:
    - transfer_money
    - check_balance
    - search_transactions
    non_sticky:
    - chitchat
  calm_entry:
    sticky:
    - book_hotel
    - cancel_hotel
    - list_hotel_bookings
# ...

info

Once the IntentBasedRouter assigns the session to the NLU-based system, the LLM-based command generator (here, the CompactLLMCommandGenerator) is skipped so that no unnecessary costs are incurred.

Handling missing intents

If an NLU component predicts an intent that is not part of any of the intents listed in the IntentBasedRouter configuration, and no routing session is currently set, the message is routed according to the following rules:

  1. The message is routed to CALM if the intent activates any of the NLU triggers of your flows (see the NLU triggers documentation and the sketch below).
  2. Otherwise, the message is routed to the NLU-based system.
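
For illustration, rule 1 refers to flows that define an NLU trigger. A minimal, hypothetical flows.yml sketch follows; the flow name, intent, and collect steps are illustrative, and the exact nlu_trigger syntax may vary between Rasa versions:

flows.yml

flows:
  transfer_money:
    description: help the user send money to another account
    nlu_trigger:
      # a message classified with this intent is routed to CALM
      # and starts this flow (rule 1 above)
      - intent: transfer_money
    steps:
      - collect: recipient
      - collect: amount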

LLMBasedRouter

The LLMBasedRouter uses an LLM, by default gpt-5-mini-2025-08-07, to decide whether a message should be routed to the NLU-based system or CALM.

important

In order to use this component for your coexistence solution, you need to add it as the first component in your pipeline in the config file.

Depending on the other components you choose, your config file could look like the following.

config.yml

recipe: default.v1
language: en
pipeline:
- name: LLMBasedRouter
  nlu_entry:
    sticky: ...
    non_sticky: ...
  calm_entry:
    sticky: handles everything around hotel bookings
  llm:
    model_group: openai_llm
  # additional configuration parameters
- name: WhitespaceTokenizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: LogisticRegressionClassifier
- name: CompactLLMCommandGenerator
  llm:
    model_group: openai_llm

policies:
- name: FlowPolicy
- name: RulePolicy
- name: MemoizationPolicy
  max_history: 10
- name: TEDPolicy
endpoints.yml

model_groups:
- id: openai_llm
  models:
  - provider: openai
    model: gpt-5-mini-2025-08-07
    timeout: 7
    max_tokens: 256

Configuring the LLMBasedRouter component

The LLMBasedRouter component has the following configuration parameters:

  • nlu_entry:
    • sticky: Describes the general NLU-based system functionality. By default, the value is "handles everything else".
    • non_sticky: Describes the functionality of the NLU-based system that should not result in sticky routing to the NLU-based system. By default, the value is "handles chitchat".
  • calm_entry:
    • sticky (required): Describes the functionality implemented in the CALM system of the assistant.
  • llm: Configuration of the LLM (see the section on configuring the LLM of the LLMBasedRouter below).
  • prompt: File path to the prompt template (a jinja2 template) to use (see the section on configuring the prompt below).

If you want to use Azure OpenAI Service, you can configure the necessary parameters as described in the Azure OpenAI Service section.
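
Putting these parameters together, a configuration of the LLMBasedRouter could, for example, look like the following; the nlu_entry descriptions are illustrative and should be adapted to your assistant:

config.yml

pipeline:
# ...
- name: LLMBasedRouter
  nlu_entry:
    sticky: handles everything around money transfers and account balances
    non_sticky: handles chitchat
  calm_entry:
    sticky: handles everything around hotel bookings
  llm:
    model_group: openai_llm
# ...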

Configuring the prompt of the LLMBasedRouter

By default, the descriptions from the configuration are assembled into a prompt that describes the routing task to the LLM:

router_template.jinja2
You have to forward the user message to the right assistant.

The following assistants are available:

Assistant A: {{ calm_entry_sticky }}
Assistant B: {{ nlu_entry_non_sticky }}
Assistant C: {{ nlu_entry_sticky }}

The user said: """{{ user_message }}"""

Answer which assistant needs to get this message.
Respond with exactly one character: A, B, or C.
Do not output any other words, punctuation, or explanation.

The configuration parameter nlu_entry.sticky goes into {{ nlu_entry_sticky }}, nlu_entry.non_sticky goes into {{ nlu_entry_non_sticky }}, and calm_entry.sticky goes into {{ calm_entry_sticky }}.
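
For example, with calm_entry.sticky set to "handles everything around hotel bookings", the default values for the two NLU entries, and the (illustrative) user message "I want to book a room", the template renders to:

You have to forward the user message to the right assistant.

The following assistants are available:

Assistant A: handles everything around hotel bookings
Assistant B: handles chitchat
Assistant C: handles everything else

The user said: """I want to book a room"""

Answer which assistant needs to get this message.
Respond with exactly one character: A, B, or C.
Do not output any other words, punctuation, or explanation.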

This prompt is much simpler and roughly ten times shorter than the prompt of the CompactLLMCommandGenerator, so using a compact chat model for routing keeps costs low compared to running the full CompactLLMCommandGenerator on every message.

You can modify the prompt by writing your own jinja2 template and providing it to the component as a file:

pipeline:
# ...
- name: LLMBasedRouter
  prompt: prompts/llm-based-router-prompt.jinja2
# ...
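
Your template can use the same variables as the default one ({{ calm_entry_sticky }}, {{ nlu_entry_non_sticky }}, {{ nlu_entry_sticky }}, and {{ user_message }}). Keep the letter-to-assistant assignment of the default template, since the router maps the returned letter back to a routing decision, and instruct the model to answer with a single character. A minimal, hypothetical variant (illustrative wording) could look like:

prompts/llm-based-router-prompt.jinja2
You are a strict message router for a conversational assistant.

Forward the user message to exactly one of these assistants:

Assistant A: {{ calm_entry_sticky }}
Assistant B: {{ nlu_entry_non_sticky }}
Assistant C: {{ nlu_entry_sticky }}

The user said: """{{ user_message }}"""

Respond with exactly one character: A, B, or C.
Do not output any other words, punctuation, or explanation.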
info

Once the LLMBasedRouter assigns the session to the NLU-based system, the LLM-based command generator (here, the CompactLLMCommandGenerator) is skipped so that no unnecessary costs are incurred.

Configuring the LLM of the LLMBasedRouter

The router prompt asks the model to reply with a single letter, A, B, or C. The LLMBasedRouter parses that answer and maps it to the coexistence routing decision.

If you use a model that supports logit_bias (for example gpt-4o-mini), you can steer routing more tightly with max_tokens: 1 and a logit_bias map. max_tokens: 1 limits the completion to a single token; logit_bias increases the likelihood of exactly one of the tokens " A", " B", or " C" (space plus letter), matching the three assistants in the default prompt.

On OpenAI's APIs, GPT-4-series models (such as gpt-4o and gpt-4o-mini) accept max_tokens and logit_bias for this pattern; GPT-5-family models do not, so configure max_tokens and logit_bias only when you point the router at a compatible model. Rasa used to include these fields in its built-in defaults when the default router model supported them. For many OpenAI chat models, the illustrative bias entries looked like:

max_tokens: 1
logit_bias:
  "362": 100
  "426": 100
  "356": 100

You must recompute the token ids for your model's tokenizer (or drop logit_bias entirely) whenever you change the model; otherwise you risk boosting unrelated tokens and breaking routing.

Current default model: As of Rasa Pro 3.16, the default router model is gpt-5-mini-2025-08-07 (GPT-5 family), which does not support logit_bias or max_tokens for this steering pattern. Rasa therefore no longer includes max_tokens or logit_bias in the built-in defaults for the LLMBasedRouter. Routing still relies on the model returning A, B, or C; with the GPT-5-family default, the completion is steered by the prompt alone.

If your model supports logit_bias: You can still set max_tokens: 1 and logit_bias yourself for that model and provider.

info

Optional logit_bias is easy to misconfigure: wrong ids boost unrelated tokens and can break routing. Prefer relying on the prompt unless you have verified ids for your model.

Example configuration

The example below reflects the current default stack: gpt-5-mini with no logit_bias. If you point the router at a model such as gpt-4o-mini instead, you can optionally add max_tokens and logit_bias for the extra steering described above; a sketch of that variant follows the default example.

config.yml

pipeline:
# ...
- name: LLMBasedRouter
  llm:
    model_group: openai_llm
# ...

endpoints.yml

model_groups:
- id: openai_llm
  models:
  - provider: openai
    model: "gpt-5-mini-2025-08-07"
    timeout: 7
    temperature: 1.0
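
If you instead use a model that supports the steering pattern, such as gpt-4o-mini, you can add max_tokens and logit_bias yourself. This is a sketch, assuming your Rasa Pro version passes logit_bias through to the provider as a model parameter and that the token ids match your model's tokenizer:

endpoints.yml

model_groups:
- id: openai_llm
  models:
  - provider: openai
    model: "gpt-4o-mini"
    timeout: 7
    # limit the completion to a single token
    max_tokens: 1
    # illustrative token ids; recompute them for your
    # model's tokenizer before relying on them
    logit_bias:
      "362": 100
      "426": 100
      "356": 100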

Handling failures and downtime of the LLM

If the LLM predicts an invalid answer (e.g., a character other than A, B, or C), or if the LLM's API is down and the LLM cannot be reached, the message is routed to the NLU-based system in a sticky fashion.