LLM Command Generators

How an LLM-based Command Generator Works

The job of an LLM-based command generator is to ingest information about the conversation so far and output a sequence of commands that represent how the user wants to progress the conversation.

For example, if you defined a flow called transfer_money, and a user starts a conversation by saying "I need to transfer some money", the correct command output would be StartFlow("transfer_money").

If you asked the user a yes/no question (using a collect step) and they say "yes.", the correct command output is SetSlot(slot_name, True).

If the user answers the question but also requests something new, like "yes. Oh what's my balance?", the command output might be [SetSlot(slot_name, True), StartFlow("check_balance")].
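The following minimal Python sketch makes the examples above concrete. It models commands as plain data classes and writes out the expected output for each user message. The StartFlow and SetSlot classes here are hypothetical stand-ins, not Rasa's command classes, and confirm_transfer is a hypothetical slot name.

```python
from dataclasses import dataclass
from typing import Any, List, Union


# Hypothetical stand-ins for the StartFlow and SetSlot commands described above;
# these are illustrative classes, not Rasa's actual command classes.
@dataclass(frozen=True)
class StartFlow:
    flow_name: str


@dataclass(frozen=True)
class SetSlot:
    slot_name: str
    value: Any


Command = Union[StartFlow, SetSlot]

# "I need to transfer some money"
turn_1: List[Command] = [StartFlow("transfer_money")]

# "yes." in reply to a yes/no collect step for a hypothetical slot confirm_transfer
turn_2: List[Command] = [SetSlot("confirm_transfer", True)]

# "yes. Oh what's my balance?" answers the question and starts a new flow
turn_3: List[Command] = [SetSlot("confirm_transfer", True), StartFlow("check_balance")]

for commands in (turn_1, turn_2, turn_3):
    print(commands)
```

A single user message can therefore map to several commands, which is what a single-label classifier cannot express.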

Because it generates a sequence of commands, Dialogue Understanding can represent what the user wants better than a classification-based NLU system.

The LLM-based command generators also use a flow retrieval sub-module so that the size of the input context does not scale linearly with the size of the assistant.

Interaction With Other Types of Command Generators

New in 3.12

Prior to the 3.12.0 release, once the NLUCommandAdapter had issued one or more commands, the LLM-based command generator was blocked from issuing commands. This restriction no longer applies, which allows for a seamless conversational UX: for example, the LLM-based command generator can now fill slots that the NLUCommandAdapter could not fill.

Note that the config pipeline can contain only one LLM-based command generator. When you use both an LLM-based command generator and the NLUCommandAdapter in the config pipeline, each of them can issue commands at any given conversation turn. For example, consider the following scenarios:

The NLUCommandAdapter issues only a StartFlow command, but the user message also contains information that could fill slots. The LLM-based command generator can now issue SetSlot commands to fill these slots.
The user is prompted to fill a slot and responds with information that fills both the requested slot and other slots. The NLUCommandAdapter can fill only the requested slot, but the LLM-based command generator can now capture the other slots.

Minimizing The Number of LLM Invocations

The minimize_num_calls parameter instructs the LLM-based command generator to skip the LLM invocation if one of the above scenarios applies, that is, if the NLUCommandAdapter has already issued a StartFlow command or a SetSlot command for the active collect step. This behavior of minimizing the number of LLM invocations is enabled by default.

To disable it, set minimize_num_calls to false in the LLM-based command generator configuration:

config.yml
pipeline:
  - name: CompactLLMCommandGenerator
    minimize_num_calls: false
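The skip condition described above can be sketched as a small predicate. StartFlow, SetSlot, and should_invoke_llm are hypothetical stand-ins rather than Rasa's classes, and the sketch assumes that the requested slot of the active collect step is known when the decision is made.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class StartFlow:  # hypothetical stand-in, not Rasa's command class
    flow_name: str


@dataclass(frozen=True)
class SetSlot:  # hypothetical stand-in, not Rasa's command class
    slot_name: str
    value: Any


def should_invoke_llm(
    nlu_commands: List[object], requested_slot: Optional[str], minimize_num_calls: bool = True
) -> bool:
    """Return False if the NLUCommandAdapter's commands already cover this turn."""
    if not minimize_num_calls:
        return True  # always call the LLM when the optimization is disabled
    for command in nlu_commands:
        if isinstance(command, StartFlow):
            return False
        if isinstance(command, SetSlot) and command.slot_name == requested_slot:
            return False
    return True


print(should_invoke_llm([StartFlow("transfer_money")], requested_slot=None))  # False
print(should_invoke_llm([SetSlot("recipient", "Freddy")], requested_slot="amount"))  # True
```

With minimize_num_calls set to false, the predicate always returns True, so the LLM-based command generator runs on every turn and can fill additional slots at the cost of an extra LLM call.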

Prioritization of Commands

If both the LLM-based command generator and the NLUCommandAdapter issue commands that fill the same slot or start different flows, the following priority ranking applies:

In case of StartFlow commands issued for different flow names, the command issued by the NLUCommandAdapter is prioritized, and the command issued by the LLM-based command generator is ignored.
In case of different SetSlot commands issued for the same slot name, the command issued by the NLUCommandAdapter is prioritized, and the command issued by the LLM-based command generator is ignored.
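The priority ranking above can be sketched as a merge function. The command classes and merge_commands are hypothetical, and the sketch additionally assumes that identical commands issued by both generators are kept only once. Because the function receives the NLUCommandAdapter's commands and the LLM's commands as separate arguments, pipeline order plays no role.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class StartFlow:  # hypothetical stand-in, not Rasa's command class
    flow_name: str


@dataclass(frozen=True)
class SetSlot:  # hypothetical stand-in, not Rasa's command class
    slot_name: str
    value: Any


def merge_commands(nlu_commands: List[object], llm_commands: List[object]) -> List[object]:
    """Combine both generators' commands, letting the NLUCommandAdapter win conflicts."""
    nlu_flows = {c.flow_name for c in nlu_commands if isinstance(c, StartFlow)}
    nlu_slots = {c.slot_name for c in nlu_commands if isinstance(c, SetSlot)}
    merged = list(nlu_commands)
    for command in llm_commands:
        # A StartFlow for a different flow than the NLUCommandAdapter started is ignored.
        if isinstance(command, StartFlow) and nlu_flows and command.flow_name not in nlu_flows:
            continue
        # A SetSlot for a slot the NLUCommandAdapter already set is ignored.
        if isinstance(command, SetSlot) and command.slot_name in nlu_slots:
            continue
        # Identical commands from both generators are kept only once.
        if command not in merged:
            merged.append(command)
    return merged


# Scenario: the NLUCommandAdapter starts a flow, the LLM also extracts a slot value.
nlu = [StartFlow("transfer_money")]
llm = [StartFlow("transfer_money"), SetSlot("amount", 50)]
print(merge_commands(nlu, llm))
# [StartFlow(flow_name='transfer_money'), SetSlot(slot_name='amount', value=50)]
```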

Note that the order of the command generators in the pipeline does not affect the priority ranking. The NLUCommandAdapter can also be placed either before or after the LLM-based command generator in the pipeline.

Types of LLM-based Command Generators

The latest and recommended LLM-based command generators are the SearchReadyLLMCommandGenerator and the CompactLLMCommandGenerator.

If you rely on the LLM-based command generator to trigger RAG, for example via EnterpriseSearchPolicy, use the SearchReadyLLMCommandGenerator; otherwise, we recommend using the CompactLLMCommandGenerator.

The SingleStepLLMCommandGenerator and MultiStepLLMCommandGenerator are previous versions of LLM-based command generators. Both are deprecated and will be removed in Rasa 4.0.0.

To use a CommandGenerator in your AI assistant, add one of the following components to the NLU pipeline in your config.yml file:

SearchReadyLLMCommandGenerator
CompactLLMCommandGenerator

Read more about the config.yml file here and about how to configure LLM models here.

config.yml
pipeline:
# - ...
  - name: SearchReadyLLMCommandGenerator
# - ...

The SearchReadyLLMCommandGenerator and CompactLLMCommandGenerator require access to an LLM API. These command generators are LLM-agnostic and can be configured with any LLM that supports the /chat endpoint, such as GPT-4o from OpenAI or Claude 3.5 Sonnet from Anthropic. We are working on expanding the list of supported models and model providers. More information about recommended models can be found here.

SearchReadyLLMCommandGenerator

The SearchReadyLLMCommandGenerator is recommended if you rely on an LLM-based command generator to trigger RAG, for example via EnterpriseSearchPolicy. It is optimized for high-performance LLMs, such as GPT-4o.

To interpret the user's message in context, the current implementation of the SearchReadyLLMCommandGenerator uses in-context learning, information about the current state of the conversation, and the flows defined in your assistant. Descriptions and slot definitions for each flow are included in the prompt as relevant information. However, to scale to a large number of flows, the LLM-based command generator includes only the flows that are relevant to the current state of the conversation (see flow retrieval).

The SearchReadyLLMCommandGenerator improves the triggering accuracy of the KnowledgeAnswerCommand compared to the CompactLLMCommandGenerator. The KnowledgeAnswerCommand is used to trigger the pattern_search, which can be used to start EnterpriseSearchPolicy. The SearchReadyLLMCommandGenerator is designed to prioritize flows over knowledge questions and small talk.

Prompt Template

The default prompt template serves as a dynamic framework enabling the SearchReadyLLMCommandGenerator to render prompts. The template consists of a static component, as well as dynamic components that get filled in when rendering a prompt:

Current state of the conversation - This part of the template captures the ongoing dialogue.
Defined flows and slots - This part of the template provides the context and structure for the conversation. It outlines the overarching theme, guiding the model's understanding of the conversation's purpose.
Active flow and slot - Active elements within the conversation that require the model's attention.

SearchReadyLLMCommandGenerator includes two optimized prompt templates tailored for:

GPT-4o (gpt-4o-2024-11-20)
Claude 3.5 Sonnet (claude-3-5-sonnet-20240620 on Anthropic API and anthropic.claude-3-5-sonnet-20240620-v1:0 on Bedrock API)

If one of these models is used, the corresponding prompt template is selected automatically. For any other LLM, the GPT-4o prompt template is used by default.
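The selection rule above can be sketched as a simple lookup. This is a hypothetical helper, not Rasa's implementation: it assumes exact matching on the model names listed above, and the returned labels are placeholders for the two bundled templates.

```python
# Placeholder labels for the two bundled templates (hypothetical names).
GPT_4O_TEMPLATE = "gpt-4o template"
CLAUDE_TEMPLATE = "claude-3.5-sonnet template"

# Model names for which the Claude 3.5 Sonnet template is selected.
CLAUDE_MODEL_NAMES = {
    "claude-3-5-sonnet-20240620",  # Anthropic API
    "anthropic.claude-3-5-sonnet-20240620-v1:0",  # Bedrock API
}


def select_prompt_template(model_name: str) -> str:
    """Pick the Claude template for the listed models, the GPT-4o template otherwise."""
    if model_name in CLAUDE_MODEL_NAMES:
        return CLAUDE_TEMPLATE
    return GPT_4O_TEMPLATE


for name in ("gpt-4o-2024-11-20", "claude-3-5-sonnet-20240620", "some-other-model"):
    print(name, "->", select_prompt_template(name))
```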

Both prompt templates are based on the same structure: markdown-formatted text with structured data in JSON format. They differ in ordering of the sections and the textual descriptions of the actions.
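To make the "structured data in JSON format" part concrete, here is a minimal Python sketch that builds the same {"flows": [...]} shape as the Available Flows and Slots section of the templates, using only the standard json module instead of Jinja2. The function render_available_flows and the example flows are hypothetical, and the sketch only approximates what the template renders.

```python
import json
from typing import Any, Dict, List


def render_available_flows(flows: List[Dict[str, Any]]) -> str:
    """Build the {"flows": [...]} JSON shown under Available Flows and Slots."""
    rendered = []
    for flow in flows:
        entry: Dict[str, Any] = {"name": flow["name"], "description": flow["description"]}
        slots = []
        for slot in flow.get("slots", []):
            slot_entry: Dict[str, Any] = {"name": slot["name"]}
            # Optional keys are only emitted when present, as in the template.
            if slot.get("description"):
                slot_entry["description"] = slot["description"]
            if slot.get("allowed_values"):
                slot_entry["allowed_values"] = slot["allowed_values"]
            slots.append(slot_entry)
        if slots:
            entry["slots"] = slots
        rendered.append(entry)
    # json.dumps escapes quotes in descriptions, similar in spirit to the
    # to_json_escaped_string filter used in the SearchReady templates.
    return json.dumps({"flows": rendered}, separators=(",", ":"))


flows = [
    {
        "name": "transfer_money",
        "description": "Send money to friends and family.",
        "slots": [
            {"name": "recipient"},
            {"name": "amount", "description": "the amount of money to send"},
        ],
    },
    {"name": "check_balance", "description": "Check the account balance."},
]
print(render_available_flows(flows))
```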


The following prompt template is optimized for the gpt-4o-2024-11-20 model and will be used by default for any model except claude-3-5-sonnet-20240620 / anthropic.claude-3-5-sonnet-20240620-v1:0:

## Task Description
Your task is to analyze the current conversation context and generate a list of actions to start new business processes that we call flows, to extract slots, or respond to off-topic and knowledge requests.

---

## Available Flows and Slots
Use the following structured data:
```json
{"flows":[{% for flow in available_flows %}{"name":"{{ flow.name }}","description":{{ flow.description | to_json_escaped_string }}{% if flow.slots %},"slots":[{% for slot in flow.slots %}{"name":"{{ slot.name }}"{% if slot.description %},"description":{{ slot.description | to_json_escaped_string }}{% endif %}{% if slot.allowed_values %},"allowed_values":{{ slot.allowed_values }}{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]}
```

---

## Available Actions:
* `start flow flow_name`: Start a flow. For example, `start flow transfer_money` or `start flow list_contacts`.
* `set slot slot_name slot_value`: Set a slot for the active flow. For example, `set slot transfer_money_recipient Freddy`. Can be used to correct and change previously set values.
* `disambiguate flows flow_name1 flow_name2 ... flow_name_n`: When a message could refer to multiple flows, list the possible flows as options to clarify. Example: `disambiguate flows list_contacts add_contact remove_contact`.
* `search and reply`: Provide a response from the knowledge base to address the user’s inquiry when no flows fit, including domain knowledge, FAQs, and all off-topic or social messages.
* `cancel flow`: Cancel the current flow if the user requests it.

---

## General Instructions
### Start Flow
* Only start a flow if the user's message is clear and fully addressed by that flow's description and purpose.
* Pay close attention to exact wording and scope in the flow description — do not assume or “stretch” the intended use of a flow.
### Set Slot
* Do not fill slots with abstract values or placeholders.
* For categorical slots try to match the user message with allowed slot values. Use "other" if you cannot match it.
* Set the boolean slots based on the user response. Map positive responses to `True`, and negative to `False`.
* Extract text slot values exactly as provided by the user. Avoid assumptions, format changes, or partial extractions.
### Disambiguate Flows
* Use `disambiguate flows` when the user's message matches multiple flows and you cannot decide which flow is most appropriate.
* If the user message is short and not precise enough to start a flow or `search and reply`, disambiguate.
* If a single flow is a strong/plausible fit, prefer starting that flow directly.
* If a user's message unambiguously and distinctly matches multiple flows, start all relevant flows at once (rather than disambiguating).
### Search and Reply
* Only start `search and reply` if the user intent is clear.
* Flow Priority: If you are unsure between starting a flow or `search and reply`, always prioritize starting a flow.
### Cancel Flow
* Do not cancel any flow unless the user explicitly requests it.
* Multiple flows can be started without cancelling the previous, if the user wants to pursue multiple processes.
### General Tips
* Only use information provided by the user.
* Strictly adhere to the provided action format.
* Focus on the last message and take it one step at a time.
* Use the previous conversation steps only to aid understanding.

---

## Decision Rule Table
| Condition                                             | Action             |
|-------------------------------------------------------|--------------------|
| Flow perfectly matches user's message                 | start flow         |
| Multiple flows are equally strong, relevant matches   | disambiguate flows |
| User's message is unclear or imprecise                | disambiguate flows |
| No flow fits at all, but knowledge base may help      | search and reply   |

---

## Current State
{% if current_flow != None %}Use the following structured data:
```json
{"active_flow":"{{ current_flow }}","current_step":{"requested_slot":"{{ current_slot }}","requested_slot_description":{{ current_slot_description | to_json_escaped_string }}},"slots":[{% for slot in flow_slots %}{"name":"{{ slot.name }}","value":"{{ slot.value }}","type":"{{ slot.type }}"{% if slot.description %},"description":{{ slot.description | to_json_escaped_string }}{% endif %}{% if slot.allowed_values %},"allowed_values":"{{ slot.allowed_values }}"{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]}
```{% else %}
You are currently not inside any flow.{% endif %}

---

## Conversation History
{{ current_conversation }}

---

## Task
Create an action list with one action per line in response to the user's last message: """{{ user_message }}""".

Your action list:

The prompt template for the claude-3-5-sonnet-20240620 / anthropic.claude-3-5-sonnet-20240620-v1:0 model is as follows:

## Task Description
Your task is to analyze the current conversation context and generate a list of actions to start new business processes that we call flows, to extract slots, or respond to off-topic and knowledge requests.

---

## Available Actions:
* `start flow flow_name`: Start a flow. For example, `start flow transfer_money` or `start flow list_contacts`.
* `set slot slot_name slot_value`: Set a slot for the active flow. For example, `set slot transfer_money_recipient Freddy`. Can be used to correct and change previously set values.
* `disambiguate flows flow_name1 flow_name2 ... flow_name_n`: When a message could refer to multiple flows, list the possible flows as options to clarify. Example: `disambiguate flows list_contacts add_contact remove_contact`.
* `search and reply`: Provide a response from the knowledge base to address the user's inquiry when no flows fit, including domain knowledge, FAQs, and all off-topic or social messages.
* `cancel flow`: Cancel the current flow if the user requests it.

---

## General Instructions
### Start Flow
* Only start a flow if the user's message is clear and fully addressed by that flow's description and purpose.
* Pay close attention to exact wording and scope in the flow description — do not assume or “stretch” the intended use of a flow.
### Set Slot
* Do not fill slots with abstract values or placeholders.
* For categorical slots, try to match the user message with allowed slot values. Use "other" if you cannot match it.
* Set the boolean slots based on the user's response. Map positive responses to `True`, and negative to `False`.
* Extract text slot values exactly as provided by the user. Avoid assumptions, format changes, or partial extractions.
### Disambiguate Flows
* Use `disambiguate flows` when the user's message matches multiple flows and you cannot decide which flow is most appropriate.
* If the user message is short and not precise enough to start a flow or `search and reply`, disambiguate.
* If a single flow is a strong/plausible fit, prefer starting that flow directly.
* If a user's message unambiguously and distinctly matches multiple flows, start all relevant flows at once (rather than disambiguating).
### Search and Reply
* Only start `search and reply` if the user intent is clear.
* Flow Priority: If you are unsure between starting a flow or `search and reply`, always prioritize starting a flow.
### Cancel Flow
* Do not cancel any flow unless the user explicitly requests it.
* Multiple flows can be started without cancelling the previous, if the user wants to pursue multiple processes.
### General Tips
* Only use information provided by the user.
* Strictly adhere to the provided action format.
* Focus on the last message and take it one step at a time.
* Use the previous conversation steps only to aid understanding.

---

## Decision Rule Table
| Condition                                             | Action             |
|-------------------------------------------------------|--------------------|
| Flow perfectly matches user's message                 | start flow         |
| Multiple flows are equally strong, relevant matches   | disambiguate flows |
| User's message is unclear or imprecise                | disambiguate flows |
| No flow fits at all, but knowledge base may help      | search and reply   |

---

## Available Flows and Slots
Use the following structured data:
```json
{"flows":[{% for flow in available_flows %}{"name":"{{ flow.name }}","description":{{ flow.description | to_json_escaped_string }}{% if flow.slots %},"slots":[{% for slot in flow.slots %}{"name":"{{ slot.name }}"{% if slot.description %},"description":{{ slot.description | to_json_escaped_string }}{% endif %}{% if slot.allowed_values %},"allowed_values":{{ slot.allowed_values }}{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]}
```

---

## Current State
{% if current_flow != None %}Use the following structured data:
```json
{"active_flow":"{{ current_flow }}","current_step":{"requested_slot":"{{ current_slot }}","requested_slot_description":{{ current_slot_description | to_json_escaped_string }}},"slots":[{% for slot in flow_slots %}{"name":"{{ slot.name }}","value":"{{ slot.value }}","type":"{{ slot.type }}"{% if slot.description %},"description":{{ slot.description | to_json_escaped_string }}{% endif %}{% if slot.allowed_values %},"allowed_values":"{{ slot.allowed_values }}"{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]}
```{% else %}
You are currently not inside any flow.{% endif %}

---

## Conversation History
{{ current_conversation }}

---

## Task
Create an action list with one action per line in response to the user's last message: """{{ user_message }}""".

Your action list:
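The templates above ask the LLM for one action per line, and those lines are then turned into commands. The following minimal sketch parses the SearchReadyLLMCommandGenerator action format with regular expressions. The Action class, the parse_action_list function, and the choice to skip unrecognized lines are assumptions for illustration, not Rasa's parser.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Action:
    kind: str
    args: Tuple[str, ...] = ()


# One pattern per action in the SearchReady template's "Available Actions" list.
_PATTERNS = [
    ("start_flow", re.compile(r"^start flow (\S+)$")),
    ("set_slot", re.compile(r"^set slot (\S+) (.+)$")),
    ("disambiguate_flows", re.compile(r"^disambiguate flows (.+)$")),
    ("search_and_reply", re.compile(r"^search and reply$")),
    ("cancel_flow", re.compile(r"^cancel flow$")),
]


def parse_action_list(llm_output: str) -> List[Action]:
    """Parse one action per line; lines matching no pattern are skipped."""
    actions: List[Action] = []
    for raw_line in llm_output.splitlines():
        line = raw_line.strip()
        for kind, pattern in _PATTERNS:
            match = pattern.match(line)
            if match is None:
                continue
            if kind == "disambiguate_flows":
                args = tuple(match.group(1).split())
            else:
                args = match.groups()
            actions.append(Action(kind, args))
            break
    return actions


output = """start flow transfer_money
set slot transfer_money_recipient Freddy Mercury
this line is ignored
disambiguate flows list_contacts add_contact remove_contact"""
for action in parse_action_list(output):
    print(action)
```

The set slot pattern captures everything after the slot name as the value, so multi-word values are preserved, in line with the instruction to extract text slot values exactly as provided by the user.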

Customization

You can customize the SearchReadyLLMCommandGenerator as much as you wish. General customization options that are available for all LLM-based command generators are listed in the section General Customization.

Customizing the Prompt Template

If you cannot get something to work by editing the flow and slot descriptions (see the section Customizing The Prompt), you can go one level deeper and customize the prompt template used to drive the SearchReadyLLMCommandGenerator. To do this, write your own prompt as a Jinja2 template and provide it to the component as a file:

config.yml
pipeline:
  - name: SearchReadyLLMCommandGenerator
    prompt_template: prompts/command-generator.jinja2

CompactLLMCommandGenerator

The CompactLLMCommandGenerator is the default LLM-based command generator and one of the recommended ones. It is optimized for high-performance LLMs, such as GPT-4o.

To interpret the user's message in context, the current implementation of the CompactLLMCommandGenerator uses in-context learning, information about the current state of the conversation, and the flows defined in your assistant. Descriptions and slot definitions of each flow are included in the prompt as relevant information. However, to scale to a large number of flows, the LLM-based command generator includes only the flows that are relevant to the current state of the conversation (see flow retrieval).

Prompt Template

The default prompt template serves as a dynamic framework enabling the CompactLLMCommandGenerator to render prompts. The template consists of a static component, as well as dynamic components that get filled in when rendering a prompt:

Current state of the conversation - This part of the template captures the ongoing dialogue.
Defined flows and slots - This part of the template provides the context and structure for the conversation. It outlines the overarching theme, guiding the model's understanding of the conversation's purpose.
Active flow and slot - Active elements within the conversation that require the model's attention.

CompactLLMCommandGenerator includes two optimized prompt templates tailored for:

GPT-4o (gpt-4o-2024-11-20)
Claude 3.5 Sonnet (claude-3-5-sonnet-20240620 on Anthropic API and anthropic.claude-3-5-sonnet-20240620-v1:0 on Bedrock API)

If one of these models is used, the corresponding prompt template is selected automatically. For any other LLM, the GPT-4o prompt template is used by default.

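The following prompt template is optimized for the gpt-4o-2024-11-20 model and is used by default for any model except claude-3-5-sonnet-20240620 / anthropic.claude-3-5-sonnet-20240620-v1:0: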

## Task Description
Your task is to analyze the current conversation context and generate a list of actions to start new business processes that we call flows, to extract slots, or respond to small talk and knowledge requests.

---

## Available Flows and Slots
Use the following structured data:
```json
{"flows":[{% for flow in available_flows %}{"name":"{{ flow.name }}","description":"{{ flow.description }}"{% if flow.slots %},"slots":[{% for slot in flow.slots %}{"name":"{{ slot.name }}"{% if slot.description %},"description":"{{ slot.description }}"{% endif %}{% if slot.allowed_values %},"allowed_values":{{ slot.allowed_values }}{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]}
```

---

## Available Actions:
* `start flow flow_name`: Starting a flow. For example, `start flow transfer_money` or `start flow list_contacts`.
* `set slot slot_name slot_value`: Slot setting. For example, `set slot transfer_money_recipient Freddy`. Can be used to correct and change previously set values.
* `cancel flow`: Cancelling the current flow.
* `disambiguate flows flow_name1 flow_name2 ... flow_name_n`: Disambiguate which flow should be started when user input is ambiguous by listing the potential flows as options. For example, `disambiguate flows list_contacts add_contact remove_contact ...` if the user just wrote "contacts".
* `provide info`: Responding to the user's questions by supplying relevant information, such as answering FAQs or explaining services.
* `offtopic reply`: Responding to casual or social user messages that are unrelated to any flows, engaging in friendly conversation and addressing off-topic remarks.
* `hand over`: Handing over to a human, in case the user seems frustrated or explicitly asks to speak to one.

---

## General Tips
* Do not fill slots with abstract values or placeholders.
* For categorical slots try to match the user message with allowed slot values. Use "other" if you cannot match it.
* Set the boolean slots based on the user response. Map positive responses to `True`, and negative to `False`.
* Extract text slot values exactly as provided by the user. Avoid assumptions, format changes, or partial extractions.
* Only use information provided by the user.
* Use clarification in ambiguous cases.
* Multiple flows can be started. If a user wants to digress into a second flow, you do not need to cancel the current flow.
* Do not cancel the flow unless the user explicitly requests it.
* Strictly adhere to the provided action format.
* Focus on the last message and take it one step at a time.
* Use the previous conversation steps only to aid understanding.

---

## Current State
{% if current_flow != None %}Use the following structured data:
```json
{"active_flow":"{{ current_flow }}","current_step":{"requested_slot":"{{ current_slot }}","requested_slot_description":"{{ current_slot_description }}"},"slots":[{% for slot in flow_slots %}{"name":"{{ slot.name }}","value":"{{ slot.value }}","type":"{{ slot.type }}"{% if slot.description %},"description":"{{ slot.description }}"{% endif %}{% if slot.allowed_values %},"allowed_values":"{{ slot.allowed_values }}"{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]}
```{% else %}
You are currently not inside any flow.{% endif %}

---

## Conversation History
{{ current_conversation }}

---

## Task
Create an action list with one action per line in response to the user's last message: """{{ user_message }}""".

Your action list:

The prompt template for the claude-3-5-sonnet-20240620 / anthropic.claude-3-5-sonnet-20240620-v1:0 model is as follows:

## Task Description
Your task is to analyze the current conversation context and generate a list of actions to start new business processes that we call flows, to extract slots, or respond to small talk and knowledge requests.

--

## Available Actions:
* `start flow flow_name`: Starting a flow. For example, `start flow transfer_money` or `start flow list_contacts`.
* `set slot slot_name slot_value`: Slot setting. For example, `set slot transfer_money_recipient Freddy`. Can be used to correct and change previously set values.
* `cancel flow`: Cancelling the current flow.
* `disambiguate flows flow_name1 flow_name2 ... flow_name_n`: Disambiguate which flow should be started when user input is ambiguous by listing the potential flows as options. For example, `disambiguate flows list_contacts add_contact remove_contact ...` if the user just wrote "contacts".
* `provide info`: Responding to the user's questions by supplying relevant information, such as answering FAQs or explaining services.
* `offtopic reply`: Responding to casual or social user messages that are unrelated to any flows, engaging in friendly conversation and addressing off-topic remarks.
* `hand over`: Handing over to a human, in case the user seems frustrated or explicitly asks to speak to one.

--

## General Tips
* Do not fill slots with abstract values or placeholders.
* For categorical slots try to match the user message with allowed slot values. Use "other" if you cannot match it.
* Set the boolean slots based on the user response. Map positive responses to `True`, and negative to `False`.
* Always refer to the slot description to determine what information should be extracted and how it should be formatted.
* For text slots, extract values exactly as provided by the user unless the slot description specifies otherwise. Preserve formatting and avoid rewording, truncation, or making assumptions.
* Only use information provided by the user.
* Use clarification in ambiguous cases.
* Multiple flows can be started. If a user wants to digress into a second flow, you do not need to cancel the current flow.
* Do not cancel the flow unless the user explicitly requests it.
* Strictly adhere to the provided action format.
* Focus on the last message and take it one step at a time.
* Use the previous conversation steps only to aid understanding.

--

## Available Flows and Slots
Use the following structured data:
```json
{"flows":[{% for flow in available_flows %}{"name":"{{ flow.name }}","description":"{{ flow.description }}"{% if flow.slots %},"slots":[{% for slot in flow.slots %}{"name":"{{ slot.name }}"{% if slot.description %},"description":"{{ slot.description }}"{% endif %}{% if slot.allowed_values %},"allowed_values":{{ slot.allowed_values }}{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]}
```

--

## Current State
{% if current_flow != None %}Use the following structured data:
```json
{"active_flow":"{{ current_flow }}","current_step":{"requested_slot":"{{ current_slot }}","requested_slot_description":"{{ current_slot_description }}"},"slots":[{% for slot in flow_slots %}{"name":"{{ slot.name }}","value":"{{ slot.value }}","type":"{{ slot.type }}"{% if slot.description %},"description":"{{ slot.description }}"{% endif %}{% if slot.allowed_values %},"allowed_values":"{{ slot.allowed_values }}"{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]}
```{% else %}
You are currently not inside any flow.{% endif %}

---

## Conversation History
{{ current_conversation }}

---

## Task
Create an action list with one action per line in response to the user's last message: """{{ user_message }}""".

Your action list:

Customization

You can customize the CompactLLMCommandGenerator as much as you wish. General customization options that are available for all LLM-based command generators are listed in the section General Customization.

Customizing the Prompt Template

If you cannot get something to work by editing the flow and slot descriptions (see the section Customizing The Prompt), you can go one level deeper and customize the prompt template used to drive the CompactLLMCommandGenerator. To do this, write your own prompt as a Jinja2 template and provide it to the component as a file:

config.yml
pipeline:
  - name: CompactLLMCommandGenerator
    prompt_template: prompts/command-generator.jinja2

Retrieving Relevant Flows

As your assistant's skill set evolves, the number of functional flows will likely grow into the hundreds. Because of the limits of the LLM's context window, however, it is impractical to present all of these flows to the LLM at once. Instead, only the subset of flows that are relevant to a given conversation is considered. A flow retrieval mechanism identifies and filters the most relevant flows for the command generator, which keeps the prompts within the limits of the LLM's context window.

LLM Context Window Limitation

The LLM-based command generator operates within the predefined context window of the underlying model, which limits the volume of text it can process at one time. This window encompasses all the text the model can "view" and use for decision-making (the prompt) and response generation (the output).

warning

The flow retrieval mechanism only filters the relevant flows, while the reasoning and decision on how to proceed (given the flows identified as relevant) lies with the command generator.

The ability to retrieve relevant flows has a training component attached to it. During training, all defined flows with flow guards potentially evaluating to true are transformed into documents containing flow descriptions and (optionally) slot descriptions and allowed slot values. These documents are then transformed into vectors using the embedding model and stored in a vector store.

When talking to the assistant, that is, during inference, the current conversation context is transformed into a vector and compared against the flows in the vector store. This comparison identifies the flows that are most similar to the current conversation context, which are then included in the prompt of the CompactLLMCommandGenerator and the SearchReadyLLMCommandGenerator.
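The training and inference steps above can be sketched as follows. A toy bag-of-words vector stands in for the embedding model and a dict stands in for the vector store; the flows, descriptions, and top_k value are hypothetical, and the real mechanism uses a proper embedding model and vector store.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for the real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# "Training": turn each flow into a document and store its vector (hypothetical flows).
flow_documents: Dict[str, str] = {
    "transfer_money": "send money to friends and family",
    "check_balance": "check the balance of your bank account",
    "list_contacts": "list the contacts saved in your address book",
}
vector_store = {name: embed(doc) for name, doc in flow_documents.items()}


def retrieve_flows(conversation_context: str, top_k: int = 2) -> List[str]:
    """'Inference': rank stored flows by similarity to the conversation context."""
    query = embed(conversation_context)
    ranked = sorted(vector_store, key=lambda name: cosine(query, vector_store[name]), reverse=True)
    return ranked[:top_k]


print(retrieve_flows("I want to send some money to my friend", top_k=1))  # ['transfer_money']
```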

In addition, the following rules are applied to select or discard certain flows:

Any flow with a flow guard evaluating to False is excluded.
Any flow marked with the always_include_in_prompt property set to true is always included, provided that the flow guard (if defined) evaluates to true.
All flows that are active during the current conversation context are always included.
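The selection rules above can be sketched on top of the retrieved flows. The Flow class, the modelling of a flow guard as a Python predicate over slot values, and select_prompt_flows are all hypothetical. The sketch applies the guard check first, as listed, so a flow whose guard is false is dropped even if it is active; that corner case is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Flow:
    name: str
    # Hypothetical: a flow guard modelled as a predicate over slot values.
    guard: Optional[Callable[[Dict[str, object]], bool]] = None
    always_include_in_prompt: bool = False


def select_prompt_flows(
    flows: List[Flow], retrieved: Set[str], active: Set[str], slots: Dict[str, object]
) -> List[str]:
    selected = []
    for flow in flows:
        # Rule 1: a flow whose guard evaluates to False is excluded.
        if flow.guard is not None and not flow.guard(slots):
            continue
        # Rules 2 and 3: always_include_in_prompt flows and active flows are kept,
        # in addition to the flows returned by similarity retrieval.
        if flow.always_include_in_prompt or flow.name in active or flow.name in retrieved:
            selected.append(flow.name)
    return selected


flows = [
    Flow("transfer_money"),
    Flow("check_balance"),
    Flow("human_handoff", always_include_in_prompt=True),
    Flow("premium_support", guard=lambda s: bool(s.get("is_premium")), always_include_in_prompt=True),
    Flow("list_contacts"),
]
print(select_prompt_flows(flows, retrieved={"transfer_money"}, active={"check_balance"}, slots={"is_premium": False}))
# ['transfer_money', 'check_balance', 'human_handoff']
```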

This feature of retrieving only the relevant flows and including them in the prompt is enabled by default. Read more about configuring the options here.

The performance of flow retrieval depends on the quality of the flow descriptions. Good descriptions not only improve the differentiation among flows covering similar topics but also improve the alignment between the intended user actions and the flows. For tips on how to write good descriptions, you can check out our guidelines.

General Customization

The following customizations are available for the SearchReadyLLMCommandGenerator and the CompactLLMCommandGenerator.

LLM configuration

To specify the OpenAI model to use for the SearchReadyLLMCommandGenerator or the CompactLLMCommandGenerator, set the llm.model_group property in the config.yml file:

config.yml
pipeline:
# - ...
  - name: SearchReadyLLMCommandGenerator
    llm:
      model_group: gpt-4o-openai-model
# - ...

endpoints.yml
model_groups:
  - id: gpt-4o-openai-model
    models:
      - provider: openai
        model: gpt-4o-2024-11-20
        timeout: 7
        temperature: 0.0

The model defaults to gpt-4o-2024-11-20 for the SearchReadyLLMCommandGenerator and the CompactLLMCommandGenerator. The model name should be set to an OpenAI chat model.

Similarly, you can specify the timeout and temperature parameters for the LLM. The timeout defaults to 7 seconds and the temperature defaults to 0.0.
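The model_group id is the link between config.yml and endpoints.yml. The sketch below resolves a component's model group and fills in the documented defaults (gpt-4o-2024-11-20, a 7-second timeout, temperature 0.0). The resolve_llm_config helper is hypothetical: it works on plain dicts instead of parsed YAML and, for simplicity, only looks at the first model in a group.

```python
from typing import Any, Dict, List

# Documented defaults for the SearchReady and Compact command generators.
DEFAULTS: Dict[str, Any] = {"model": "gpt-4o-2024-11-20", "timeout": 7, "temperature": 0.0}


def resolve_llm_config(component: Dict[str, Any], model_groups: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: look up a component's model group and apply defaults."""
    group_id = component.get("llm", {}).get("model_group")
    if group_id is None:
        # No model group configured: fall back to OpenAI with the default model.
        return {"provider": "openai", **DEFAULTS}
    for group in model_groups:
        if group["id"] == group_id:
            # Simplification: only the first model of the group is considered.
            return {**DEFAULTS, **group["models"][0]}
    raise ValueError(f"Unknown model group: {group_id}")


# Mirrors the config.yml / endpoints.yml example above.
component = {"name": "SearchReadyLLMCommandGenerator", "llm": {"model_group": "gpt-4o-openai-model"}}
model_groups = [
    {
        "id": "gpt-4o-openai-model",
        "models": [{"provider": "openai", "model": "gpt-4o-2024-11-20", "timeout": 7, "temperature": 0.0}],
    }
]
print(resolve_llm_config(component, model_groups))
```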

Deprecated model configuration

The llm.model property is deprecated and will be removed in Rasa 4.0.0. If you are using Rasa Pro versions <=3.10.x, refer to the LLM Configuration page.

If you want to use Azure OpenAI Service, configure the necessary parameters as described in the Azure OpenAI Service section.

Using Other LLMs

By default, OpenAI is used as the underlying LLM provider.

To use a different provider, such as Cohere, reference a model group in the config.yml file and define that model group, including its provider, in the endpoints.yml file:

config.yml
pipeline:
# - ...
  - name: SearchReadyLLMCommandGenerator
    llm:
      model_group: cohere-model
# - ...

endpoints.yml
model_groups:
  - id: cohere-model
    models:
      - provider: cohere
        model: ...

For more information, see the LLM setup page on LLMs and embeddings.

Customizing The Prompt

Because LLM-based command generators use in-context learning, customizing the prompt is one of the primary ways to tweak or improve their performance.

In most cases, you can achieve what you need by customizing the description fields in your flows. Every flow has its own description field; optionally, every step in your flow can also have one. If you notice a flow is triggered when it shouldn't, or a slot is not extracted correctly, adding more detail to the description will often solve the issue.

For example, if you have a transfer_money flow with a collect step for the slot amount, you can add a description to extract the value more reliably:

flows.yml
flows:
  transfer_money:
    description: |
      This flow lets users send money to friends
      and family, in US Dollars.
    steps:
      - collect: recipient
      - collect: amount
        description: the amount of money to send. extract only the numerical value, ignoring the currency.

Best Practices for Descriptions

Use the following guidelines to write informative and contextually rich flow descriptions.

Provide information-dense descriptions: Ensure flow descriptions are precise and informative, directly outlining the flow's purpose and scope. Aim for a balance between brevity and the density of information, using imperative language and avoiding unnecessary words to prevent ambiguity. The goal is to convey essential information as clearly as possible.
Use clear and standard language: Avoid unusual phrasing or choice of words. Stick to clear, universally understood language.
Explicitly define context: State the flow's context explicitly to increase the model's situational awareness. The embedding models used for retrieving the relevant flows lack situational awareness: they can't infer context or read between the lines beyond what the flow description states.
Clarify implicit knowledge: Explain any specialized knowledge in descriptions (for example, if a brand name is mentioned, state the brand's domain; if a product name is mentioned, state what the product is). The embedding model used for retrieving the relevant flows is unlikely to produce good embeddings for brands and their products.
(Optional) Add example user utterances: While not strictly required, example user utterances can add more context to flow descriptions and help the embeddings match user inputs more closely. Treat this as a remedy rather than a cure: if user utterances improve performance, they likely provide new information that could be incorporated directly into the flow descriptions.

Customizing flow retrieval

Retrieving only the relevant flows for inclusion in the prompt at inference time is enabled by default. To configure it, modify the settings under the flow_retrieval property. The default configuration uses the text-embedding-3-large embedding model from OpenAI:

config.yml
pipeline:
  - name: SearchReadyLLMCommandGenerator
    ...
    flow_retrieval:
      embeddings:
        model_group: openai_text_embedding
      ...
  ...

endpoints.yml
model_groups:
  - id: openai_text_embedding
    models:
      - provider: openai
        model: text-embedding-3-large
        ...

You can change the embedding provider and model. More on supported embeddings and how to configure them can be found on the LLM setup page.

You can also configure:

turns_to_embed - The number of conversation turns to be transformed into a vector and compared against the flows in the vector store. Setting the value to 1 means that only the latest conversation turn is used. Increasing the number of turns expands the conversation context window.
should_embed_slots - Whether to embed the slot descriptions along with the flow description during training (True / False).
num_flows - The maximum number of flows to be retrieved from the vector store.
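The sketch below shows how the three settings could interact during retrieval. It is a self-contained toy, not Rasa's implementation: a bag-of-words cosine similarity is swapped in for the real embedding model (text-embedding-3-large by default) and vector store. The helper names, the flow dictionaries, and the assumption that slot descriptions are simply appended to the flow description are all hypothetical.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Bag-of-words counts: a toy stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def flow_document(flow: dict, should_embed_slots: bool) -> str:
    """Text indexed for one flow: its description, plus slot descriptions if enabled."""
    parts = [flow["description"]]
    if should_embed_slots:
        parts.extend(flow.get("slot_descriptions", []))
    return " ".join(parts)


def retrieve_flows(
    flows: list[dict],
    conversation: list[str],
    turns_to_embed: int = 1,
    should_embed_slots: bool = True,
    num_flows: int = 20,
) -> list[str]:
    """Return the names of at most num_flows flows, most similar first."""
    # Only the latest `turns_to_embed` turns make up the query.
    query = toy_embed(" ".join(conversation[-turns_to_embed:]))
    scored = [
        (cosine(query, toy_embed(flow_document(flow, should_embed_slots))), flow["name"])
        for flow in flows
    ]
    # Stable sort: ties keep the original flow order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:num_flows]]


flows = [
    {
        "name": "transfer_money",
        "description": "Send money to friends and family.",
        "slot_descriptions": ["the amount of money to send"],
    },
    {"name": "check_balance", "description": "Check the balance of the user's account."},
    {"name": "update_address", "description": "Change the postal address on file."},
]
print(retrieve_flows(flows, ["hi", "I want to send money to my sister"], num_flows=1))
# ['transfer_money']
```

Raising turns_to_embed widens the query: with the conversation ["what is my balance", "hello there"], only turns_to_embed=2 brings "balance" into the query.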

Below is a configuration with default values:

config.yml
pipeline:
  - name: SearchReadyLLMCommandGenerator
    ...
    flow_retrieval:
      turns_to_embed: 1
      should_embed_slots: true
      num_flows: 20
    ...

Number of retrieved flows

The number of flows specified by num_flows does not directly correspond to the actual number of flows included in the prompt. The total number of included flows also depends on the flows marked as always_include_in_prompt and those previously active. For more information, check the Retrieving Relevant Flows section.

You can also disable flow retrieval by setting the flow_retrieval.active field to false:

config.yml
pipeline:
  - name: SearchReadyLLMCommandGenerator
    ...
    flow_retrieval:
      active: false
    ...

warning

Disabling flow retrieval, which limits the prompt to the flows relevant to the current conversation context, restricts the command generator's capacity to handle a large number of flows. Because the prompt size is limited, including too many flows can exceed that limit, leaving the command generator unable to produce effective commands and the assistant unable to respond meaningfully to user requests. A large number of tokens in the prompt also increases cost and latency, which further affects the responsiveness of the system.
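A back-of-the-envelope sketch shows why prompt size grows linearly with the number of flows when retrieval is disabled. The assistant size (200 flows) and the 300 characters per rendered flow are invented numbers for illustration, and `flow_section_chars` is a hypothetical helper that ignores always-included and active flows.

```python
def flow_section_chars(
    flow_texts: list[str], retrieval_active: bool, num_flows: int = 20
) -> int:
    """Rough size, in characters, of the flow section of the prompt.

    Assumes flow_texts is already sorted by relevance and ignores
    always-included and active flows for simplicity.
    """
    # With retrieval disabled, every flow is written into the prompt.
    included = flow_texts[:num_flows] if retrieval_active else flow_texts
    return sum(len(text) for text in included)


# Hypothetical assistant: 200 flows, each rendered as 300 characters.
flow_texts = ["x" * 300] * 200
print(flow_section_chars(flow_texts, retrieval_active=False))  # 60000
print(flow_section_chars(flow_texts, retrieval_active=True))  # 6000
```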

Customizing the maximum length of user input

To restrict the length of user messages, set the user_input.max_characters property (default: 420 characters):

config.yml
pipeline:
  - name: SearchReadyLLMCommandGenerator
    user_input:
      max_characters: 420
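The limit is a plain character count. The hypothetical check below only illustrates the comparison. It assumes that a message of exactly max_characters is still allowed, and it does not model what Rasa does with a message that exceeds the limit.

```python
DEFAULT_MAX_CHARACTERS = 420  # documented default of user_input.max_characters


def exceeds_max_characters(
    message: str, max_characters: int = DEFAULT_MAX_CHARACTERS
) -> bool:
    """Return True when a user message is longer than the configured limit.

    Assumes a message of exactly max_characters is still within the limit.
    """
    return len(message) > max_characters


print(exceeds_max_characters("I need to transfer some money"))  # False
print(exceeds_max_characters("a" * 421))  # True
```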

Fine-tuning an LLM for Command Generation

Fine-tuning is a process where a pre-trained language model is further trained on a specific dataset to enhance its performance for a particular task or domain. This allows the model to adapt to specific needs and nuances that are not covered by general training data used during the initial pre-training phase.

Using the fine-tuning recipe, you can fine-tune a base language model on your data for the particular task of command generation.

Why Fine-Tuning?

The key motivation behind incorporating fine-tuning into CALM is to address critical issues like high latency, low reliability, and reliance on proprietary LLMs like GPT-4:

Improved Latency and Performance: By fine-tuning and deploying a smaller, more efficient LLM on-premises or in a controlled cloud environment, we can significantly reduce the inference time, thus improving speed and responsiveness.
Enhanced Reliability: By self-deploying a fine-tuned LLM, we can ensure greater control over the model's operation and performance, thus enhancing reliability even during peak usage times.
Cost Efficiency: Fine-tuning a small LLM in-house provides a cost-effective alternative without compromising on performance, thereby offering a better balance between cost and accuracy.
Strategic Benefits: Adopting the fine-tuning recipe empowers you to fully customize and control your language models, enhancing security and compliance.

To understand what happens under the hood, refer to the conceptual details of how the fine-tuning recipe works. Once you are familiar with them, follow the user guide, which walks you through every step of the recipe in practice.

Command reference

Rasa currently supports three Domain-Specific Languages (DSLs) for generating commands:

Version 3 – Optimized to improve the triggering accuracy of KnowledgeAnswerCommand and used by the SearchReadyLLMCommandGenerator.
Version 2 (default) – Optimized for small LLMs and used by the CompactLLMCommandGenerator.
Version 1 (deprecated) – Used by the deprecated components SingleStepLLMCommandGenerator and MultiStepLLMCommandGenerator.

Version 2 is the recommended choice for most use cases, as it is designed to work efficiently with modern compact LLMs.

DSL v3

start flow flow_name: Start a new flow.
cancel flow: Cancel the current flow.
disambiguate flows flow_name1 flow_name2 ...: Ask for clarification.
set slot slot_name slot_value: Set a slot to a given value.
search and reply: Reply to knowledge questions, including domain knowledge, FAQs, and all off-topic or social messages.
repeat message: Repeat the last bot message.

DSL v2

start flow flow_name: Start a new flow.
cancel flow: Cancel the current flow.
disambiguate flows flow_name1 flow_name2 ...: Ask for clarification.
set slot slot_name slot_value: Set a slot to a given value.
provide info: Reply with a knowledge-based free-form answer.
chitchat: Respond in a chitchat style, with either predefined or free-form answers.
human handoff: Hand off the conversation to a human.
repeat message: Repeat the last bot message.

DSL v1

StartFlow(flow_name): Start a new flow.
CancelFlow(): Cancel the current flow.
Clarify(flow_name_1, flow_name_2, ...): Ask for clarification.
SetSlot(slot_name, slot_value): Set a slot to a given value.
SkipQuestion(): Intercept user messages that try to skip the current collect step in the flow.
SearchAndReply(): Reply with a knowledge-based free-form answer.
ChitChat(): Respond in a chitchat style, with either predefined or free-form answers.
HumanHandoff(): Hand off the conversation to a human.
RepeatLastBotMessages(): Repeat the last bot message.
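To show how the DSL v2 command lines relate to the DSL v1 names, here is a small hypothetical parser. It is not Rasa's parser and produces plain tuples rather than Rasa command objects. The v2-to-v1 name mapping follows the matching descriptions in the lists above and is an assumption. Slot values stay strings; converting them to the slot's type is out of scope. The slot name confirm_transfer in the example is invented, echoing the earlier example where a user answers a yes/no question and asks for their balance.

```python
# Argument-free DSL v2 commands mapped to the DSL v1 names with the same description.
SIMPLE_COMMANDS = {
    "cancel flow": "CancelFlow",
    "provide info": "SearchAndReply",
    "chitchat": "ChitChat",
    "human handoff": "HumanHandoff",
    "repeat message": "RepeatLastBotMessages",
}


def parse_dsl_v2(output: str) -> list[tuple[str, ...]]:
    """Parse DSL v2 command lines into (command_name, *arguments) tuples."""
    commands: list[tuple[str, ...]] = []
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line in SIMPLE_COMMANDS:
            commands.append((SIMPLE_COMMANDS[line],))
        elif line.startswith("start flow "):
            commands.append(("StartFlow", line[len("start flow "):].strip()))
        elif line.startswith("disambiguate flows "):
            flow_names = line[len("disambiguate flows "):].split()
            commands.append(("Clarify", *flow_names))
        elif line.startswith("set slot "):
            # First token is the slot name; the rest of the line is the value.
            slot_name, _, slot_value = line[len("set slot "):].partition(" ")
            commands.append(("SetSlot", slot_name, slot_value.strip()))
        else:
            raise ValueError(f"unrecognized DSL v2 command: {line!r}")
    return commands


print(parse_dsl_v2("set slot confirm_transfer True\nstart flow check_balance"))
# [('SetSlot', 'confirm_transfer', 'True'), ('StartFlow', 'check_balance')]
```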
