Version Migration Guide

This page contains information about changes between major versions and how you can migrate from one version to another.

Rasa Pro 3.13 to Rasa Pro 3.14

Dependencies

info

To avoid any conflicts, we strongly recommend using a fresh environment when installing Rasa Pro >= 3.14.0.

The default pip package for rasa-pro now supports Python versions 3.12 and 3.13, and drops support for Python version 3.9.

The default package now excludes the following optional dependency categories:

  1. nlu - All dependencies required to run NLU/coexistence bots, including: transformers, tensorflow (and related packages: tensorflow-text, tensorflow-hub, tensorflow-gcs-filesystem, tensorflow-metal, tf-keras), spacy, sentencepiece, skops, mitie, jieba, sklearn-crfsuite.

  2. channels - All dependencies required to connect to channel connectors, including: fbmessenger, twilio, webexteamssdk, mattermostwrapper, rocketchat_API, aiogram, slack-sdk, cvg-python-sdk. Note: The following channels are NOT included in the channels extra: browser_audio, studio_chat, socketIO, and rest (used by inspector for text and voice).

Optional dependency groups (extras) are available to install the relevant packages. Use pip install rasa-pro[nlu] if you have an agent with NLU components; similarly, use pip install rasa-pro[channels] if you have an agent that makes use of channel connectors.
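For example, the extras can be installed individually or combined in a single install (standard pip extras syntax):

pip install "rasa-pro[nlu]"            # NLU / coexistence components
pip install "rasa-pro[channels]"       # channel connectors
pip install "rasa-pro[nlu,channels]"   # both sets of optional dependencies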

Docker images continue to include the same dependencies as before, except for the additional packages mcp and a2a-sdk, which now form part of the core dependencies.

Important: tensorflow and its related dependencies are only supported for "python_version < '3.12'", so components requiring TensorFlow are not available for Python ≥ 3.12. These components are: DIETClassifier, TEDPolicy, UnexpecTEDIntentPolicy, ResponseSelector, ConveRTFeaturizer, and LanguageModelFeaturizer.

Pattern Continue Interrupted

We modified pattern_continue_interrupted to ask for confirmation before returning to an interrupted flow. Here is an example conversation illustrating the new behavior:

user: I want to transfer money
bot: how much do you want to transfer?
user: wait what's my balance?
bot: you have 4200 in your account
bot: Would you like to go back to transferring money? # asks for confirmation before continuing

Rationale for this change:

  1. Improved UX: Immediately returning to interrupted flows often creates an unnatural conversational experience.
  2. Error Correction: Sometimes the command generator incorrectly identifies a cancellation + start flow as a digression. This change allows users to guide the assistant in correcting these mistakes.
  3. Agent Integration: Sub agents sometimes don't have a reliable way to signal completion. When these agents are wrapped in flows and a digression occurs, we need user input to determine if the agent's task is complete.

Here is the new default pattern_continue_interrupted:

pattern_continue_interrupted:
  description: Conversation repair flow for managing when users switch between different flows
  name: pattern continue interrupted
  steps:
    - noop: true
      next:
        - if: context.multiple_flows_interrupted
          then: collect_interrupted_flow_to_continue
        - else: collect_continue_interrupted_flow_confirmation
    - id: collect_interrupted_flow_to_continue
      collect: interrupted_flow_to_continue
      description: "Fill this slot with the name of the flow the user wants to continue. If the user does not want to continue any of the interrupted flows, fill this slot with 'none'."
      next:
        - if: slots.interrupted_flow_to_continue is not "none"
          then:
            - action: action_continue_interrupted_flow
              next: END
        - else:
            - action: action_cancel_interrupted_flows
              next: END
    - id: collect_continue_interrupted_flow_confirmation
      collect: continue_interrupted_flow_confirmation
      description: "If the user wants to continue the interrupted flow, fill this slot with true. If the user does not want to continue the interrupted flow, fill this slot with false."
      next:
        - if: slots.continue_interrupted_flow_confirmation
          then:
            - action: action_continue_interrupted_flow
              next: END
        - else:
            - action: action_cancel_interrupted_flows
              next: END

Command Generator

Prompt Template

No Action Required: If you don't use sub agents, your prompt templates remain unchanged.

Migration Required: When sub agents are used, Rasa automatically switches to new default prompts that include agent support. If you have customized prompt templates for the CompactLLMCommandGenerator or SearchReadyLLMCommandGenerator, you must update your custom prompts to include the new agent-related commands and functionality.
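If you maintain a customized template, it is typically referenced from the command generator's configuration; a minimal sketch, assuming the prompt_template key and an illustrative file path:

pipeline:
  - name: CompactLLMCommandGenerator
    prompt_template: prompts/command_generator_with_agents.jinja2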

Prompt Templates of the CompactLLMCommandGenerator

The prompt template for the gpt-4o-2024-11-20 model with agent support is as follows:

## Task Description
Your task is to analyze the current conversation context and generate a list of actions to start new business processes that we call flows, to extract slots, or respond to small talk and knowledge requests.

---

## Available Flows and Slots
Use the following structured data:
```json
{"flows":[{% for flow in available_flows %}{"name":"{{ flow.name }}","description":{{ flow.description | to_json_escaped_string }}{% if flow.agent_info %},"sub-agents":[{% for agent in flow.agent_info %}{"name":"{{ agent.name }}","description":{{ agent.description | to_json_escaped_string }}}{% if not loop.last %},{% endif %}{% endfor %}]{% endif %}{% if flow.slots %},"slots":[{% for slot in flow.slots %}{"name":"{{ slot.name }}"{% if slot.description %},"description":{{ slot.description | to_json_escaped_string }}{% endif %}{% if slot.allowed_values %},"allowed_values":{{ slot.allowed_values }}{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]}
```

---

## Available Actions:
* `start flow flow_name`: Starting a flow. For example, `start flow transfer_money` or `start flow list_contacts`.
* `set slot slot_name slot_value`: Slot setting. For example, `set slot transfer_money_recipient Freddy`. Can be used to correct and change previously set values. ONLY use slots that are explicitly defined in the flow's slot list.
* `cancel flow`: Cancelling the current flow.
* `disambiguate flows flow_name1 flow_name2 ... flow_name_n`: Disambiguate which flow should be started when user input is ambiguous by listing the potential flows as options. For example, `disambiguate flows list_contacts add_contact remove_contact ...` if the user just wrote "contacts".
* `provide info`: Responding to the user's questions by supplying relevant information, such as answering FAQs or explaining services.
* `offtopic reply`: Responding to casual or social user messages that are unrelated to any flows, engaging in friendly conversation and addressing off-topic remarks.
* `hand over`: Handing over to a human, in case the user seems frustrated or explicitly asks to speak to one.
* `repeat message`: Repeating the last bot message.{% if active_agent %}
* `continue agent`: Continue the currently active agent {{ active_agent.name }}. This has HIGHEST PRIORITY when an agent is active and the user is responding to agent questions.{% endif %}{% if completed_agents %}
* `restart agent agent_name`: Restart the agent with the given name, in case the user wants to change some answer to a previous question asked by the agent. For example, `restart agent car_research_agent` if the user changed his mind about the car he wants to buy. ONLY use agents that are listed in the `completed_agents` section.{% endif %}

---

## General Tips
* Do not fill slots with abstract values or placeholders.
* For categorical slots try to match the user message with allowed slot values. Use "other" if you cannot match it.
* Set the boolean slots based on the user response. Map positive responses to `True`, and negative to `False`.
* Extract text slot values exactly as provided by the user. Avoid assumptions, format changes, or partial extractions.
* ONLY use `set slot` with slots that are explicitly defined in the current flow's slot list. Do NOT create or assume slots that don't exist.
* Only use information provided by the user.
* Use clarification in ambiguous cases.
* Multiple flows can be started. If a user wants to digress into a second flow, you do not need to cancel the current flow.
* Do not cancel the flow unless the user explicitly requests it.
* Strictly adhere to the provided action format.
* ONLY use the exact actions listed above. Do NOT invent new actions like "respond <message>" or any other variations.
* Focus on the last message and take it one step at a time.
* Use the previous conversation steps only to aid understanding.{% if active_agent %}
* When an agent is active, ALWAYS prioritize `continue agent` over `provide info` or `offtopic reply` unless the user is clearly asking something unrelated to the agent's task.{% endif %}{% if completed_agents %}
* ONLY use `restart agent` with agents that are listed in the `completed_agents` section. Do NOT restart non-existent agents.{% endif %}{% if active_agent or completed_agents %}
* If you're unsure about agent names, refer to the structured data provided in the `Current State` section.{% endif %}

---

## Current State
{% if current_flow != None %}
Use the following structured data:
```json
{"active_flow":{"name":"{{ current_flow }}","current_step":{"requested_slot":"{{ current_slot }}","requested_slot_description":{{ current_slot_description | to_json_escaped_string }}},"slots":[{% for slot in flow_slots %}{"name":"{{ slot.name }}","value":"{{ slot.value }}","type":"{{ slot.type }}"{% if slot.description %},"description":{{ slot.description | to_json_escaped_string }}{% endif %}{% if slot.allowed_values %},"allowed_values":"{{ slot.allowed_values }}"{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]}{% if active_agent %},"active_agent":{"name":"{{ active_agent.name }}","description":{{ active_agent.description | to_json_escaped_string }}}{% endif %}{% if completed_agents %},"completed_agents":[{% for agent in completed_agents %}{"name":"{{ agent.name }}","description":{{ agent.description | to_json_escaped_string }}}{% if not loop.last %},{% endif %}{% endfor %}]{% endif %}}
```{% else %}
You are currently not inside any flow.{% endif %}

---

## Conversation History
{{ current_conversation }}

---

## Task
Create an action list with one action per line in response to the user's last message: """{{ user_message }}""".

Your action list:

Prompt Templates of the SearchReadyLLMCommandGenerator

The prompt template for the gpt-4o-2024-11-20 model with agent support is as follows:

## Task Description
Your task is to analyze the current conversation context and generate a list of actions to start new business processes that we call flows, to extract slots, or respond to off-topic and knowledge requests.

---

## Available Flows and Slots
Use the following structured data:
```json
{"flows":[{% for flow in available_flows %}{"name":"{{ flow.name }}","description":{{ flow.description | to_json_escaped_string }}{% if flow.agent_info %},"sub-agents":[{% for agent in flow.agent_info %}{"name":"{{ agent.name }}","description":{{ agent.description | to_json_escaped_string }}}{% if not loop.last %},{% endif %}{% endfor %}]{% endif %}{% if flow.slots %},"slots":[{% for slot in flow.slots %}{"name":"{{ slot.name }}"{% if slot.description %},"description":{{ slot.description | to_json_escaped_string }}{% endif %}{% if slot.allowed_values %},"allowed_values":{{ slot.allowed_values }}{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]}
```

---

## Available Actions:
* `start flow flow_name`: Start a flow. For example, `start flow transfer_money` or `start flow list_contacts`.
* `set slot slot_name slot_value`: Set a slot for the active flow. For example, `set slot transfer_money_recipient Freddy`. Can be used to correct and change previously set values. ONLY use slots that are explicitly defined in the flow's slot list.
* `disambiguate flows flow_name1 flow_name2 ... flow_name_n`: When a message could refer to multiple flows, list the possible flows as options to clarify. Example: `disambiguate flows list_contacts add_contact remove_contact`.
* `search and reply`: Provide a response from the knowledge base to address the user’s inquiry when no flows fit, including domain knowledge, FAQs, and all off-topic or social messages.
* `cancel flow`: Cancel the current flow if the user requests it.
* `repeat message`: Repeat the last bot message.{% if active_agent %}
* `continue agent`: Continue the currently active agent {{ active_agent.name }}. This has HIGHEST PRIORITY when an agent is active and the user is responding to agent questions.{% endif %}{% if completed_agents %}
* `restart agent agent_name`: Restart the agent with the given name, in case the user wants to change some answer to a previous question asked by the agent. For example, `restart agent car_research_agent` if the user changed his mind about the car he wants to buy. ONLY use agents that are listed in the `completed_agents` section.{% endif %}

---

## General Instructions
### Start Flow
* Only start a flow if the user's message is clear and fully addressed by that flow's description and purpose.
* Pay close attention to exact wording and scope in the flow description — do not assume or “stretch” the intended use of a flow.
### Set Slot
* Do not fill slots with abstract values or placeholders.
* For categorical slots try to match the user message with allowed slot values. Use "other" if you cannot match it.
* Set the boolean slots based on the user response. Map positive responses to `True`, and negative to `False`.
* Extract text slot values exactly as provided by the user. Avoid assumptions, format changes, or partial extractions.
* ONLY use `set slot` with slots that are explicitly defined in the current flow's slot list. Do NOT create or assume slots that don't exist.
### Disambiguate Flows
* Use `disambiguate flows` when the user's message matches multiple flows and you cannot decide which flow is most appropriate.
* If the user message is short and not precise enough to start a flow or `search and reply`, disambiguate.
* If a single flow is a strong/plausible fit, prefer starting that flow directly.
* If a user's message unambiguously and distinctly matches multiple flows, start all relevant flows at once (rather than disambiguating).
### Search and Reply
* Only start `search and reply` if the user intent is clear.
* Flow Priority: If you are unsure between starting a flow or `search and reply`, always prioritize starting a flow.
### Cancel Flow
* Do not cancel any flow unless the user explicitly requests it.
* Multiple flows can be started without cancelling the previous, if the user wants to pursue multiple processes.{% if active_agent or completed_agents %}
### Agents{% if active_agent %}
* When an agent is active, ALWAYS prioritize `continue agent` over `search and reply` unless the user is clearly asking something unrelated to the agent's task.{% endif %}{% if completed_agents %}
* ONLY use `restart agent` with agents that are listed in the `completed_agents` section. Do NOT restart non-existent agents.{% endif %}
* If you're unsure about agent names, refer to the structured data provided in the `Current State` section.
{% endif %}### General Tips
* Only use information provided by the user.
* Strictly adhere to the provided action format.
* ONLY use the exact actions listed above. Do NOT invent new actions like "respond <message>" or any other variations.
* Focus on the last message and take it one step at a time.
* Use the previous conversation steps only to aid understanding.

---

## Decision Rule Table
| Condition | Action |
|---------------------------------------------------------------|--------------------|{% if active_agent %}
| Agent is active and the user is responding to agent questions | continue agent |{% endif %}
| Flow perfectly matches user's message | start flow |
| Multiple flows are equally strong, relevant matches | disambiguate flows |
| User's message is unclear or imprecise | disambiguate flows |
| No flow fits at all, but knowledge base may help | search and reply |

---

## Current State
{% if current_flow != None %}Use the following structured data:
```json
{"active_flow":{"name":"{{ current_flow }}","current_step":{"requested_slot":"{{ current_slot }}","requested_slot_description":{{ current_slot_description | to_json_escaped_string }}},"slots":[{% for slot in flow_slots %}{"name":"{{ slot.name }}","value":"{{ slot.value }}","type":"{{ slot.type }}"{% if slot.description %},"description":{{ slot.description | to_json_escaped_string }}{% endif %}{% if slot.allowed_values %},"allowed_values":"{{ slot.allowed_values }}"{% endif %}}{% if not loop.last %},{% endif %}{% endfor %}]}{% if active_agent %},"active_agent":{"name":"{{ active_agent.name }}","description":{{ active_agent.description | to_json_escaped_string }}}{% endif %}{% if completed_agents %},"completed_agents":[{% for agent in completed_agents %}{"name":"{{ agent.name }}","description":{{ agent.description | to_json_escaped_string }}}{% if not loop.last %},{% endif %}{% endfor %}]{% endif %}}
```{% else %}
You are currently not inside any flow.{% endif %}

---

## Conversation History
{{ current_conversation }}

---

## Task
Create an action list with one action per line in response to the user's last message: """{{ user_message }}""".

Your action list:

Template Rendering

We updated the render_template method to include sub agent information.

If you customized the rendering method, add the new sub agent information to your implementation. The key changes concern the agent-related inputs passed to the template (has_agents, active_agent, and completed_agents), shown in the implementation below.

def render_template(
    self,
    message: Message,
    tracker: DialogueStateTracker,
    startable_flows: FlowsList,
    all_flows: FlowsList,
) -> str:
    """Render the jinja template to create the prompt for the LLM.

    Args:
        message: The current message from the user.
        tracker: The tracker containing the current state of the conversation.
        startable_flows: The flows startable at this point in time by the user.
        all_flows: all flows present in the assistant

    Returns:
        The rendered prompt template.
    """
    # need to make this distinction here because current step of the
    # top_calling_frame would be the call step, but we need the collect step from
    # the called frame. If no call is active calling and called frame are the same.
    top_calling_frame = top_flow_frame(tracker.stack)
    top_called_frame = top_flow_frame(tracker.stack, ignore_call_frames=False)

    top_flow = top_calling_frame.flow(all_flows) if top_calling_frame else None
    current_step = top_called_frame.step(all_flows) if top_called_frame else None

    flow_slots = self.prepare_current_flow_slots_for_template(
        top_flow, current_step, tracker
    )
    current_slot, current_slot_description = self.prepare_current_slot_for_template(
        current_step
    )
    current_slot_type = None
    current_slot_allowed_values = None
    if current_slot:
        current_slot_type = (
            slot.type_name
            if (slot := tracker.slots.get(current_slot)) is not None
            else None
        )
        current_slot_allowed_values = allowed_values_for_slot(
            tracker.slots.get(current_slot)
        )
    has_agents = Configuration.get_instance().available_agents.has_agents()
    current_conversation = tracker_as_readable_transcript(
        tracker, highlight_agent_turns=has_agents
    )
    latest_user_message = sanitize_message_for_prompt(message.get(TEXT))
    current_conversation += f"\nUSER: {latest_user_message}"

    inputs: Dict[str, Any] = {
        "available_flows": self.prepare_flows_for_template(
            startable_flows,
            tracker,
            add_agent_info=has_agents,
        ),
        "current_conversation": current_conversation,
        "flow_slots": flow_slots,
        "current_flow": top_flow.id if top_flow is not None else None,
        "current_slot": current_slot,
        "current_slot_description": current_slot_description,
        "current_slot_type": current_slot_type,
        "current_slot_allowed_values": current_slot_allowed_values,
        "user_message": latest_user_message,
    }
    if has_agents:
        inputs["active_agent"] = (
            get_active_agent_info(tracker, top_flow.id) if top_flow else None
        )
        inputs["completed_agents"] = get_completed_agents_info(tracker)

    return self.compile_template(self.prompt_template).render(**inputs)

LLM Clients

We updated the signature of the completion and acompletion functions in our LLMClient protocol to support LLM calls with tools:

def completion(
    self, messages: Union[List[dict], List[str], str], **kwargs: Any
) -> LLMResponse:

async def acompletion(
    self, messages: Union[List[dict], List[str], str], **kwargs: Any
) -> LLMResponse:

The **kwargs are passed through to the underlying LiteLLM completion functions.

Migration Required: If you have modified any existing LLM clients or implemented custom clients, update your completion and acompletion methods to match the new signature.

Here is a code snippet of the updated implementation in _BaseLiteLLMClient:

def completion(
    self, messages: Union[List[dict], List[str], str], **kwargs: Any
) -> LLMResponse:
    """Synchronously generate completions for given list of messages.

    Args:
        messages: The message can be,
            - a list of preformatted messages. Each message should be a dictionary
              with the following keys:
                - content: The message content.
                - role: The role of the message (e.g. user or system).
            - a list of messages. Each message is a string and will be formatted
              as a user message.
            - a single message as a string which will be formatted as user message.
        **kwargs: Additional parameters to pass to the completion call.

    Returns:
        List of message completions.

    Raises:
        ProviderClientAPIException: If the API request fails.
    """
    ...
    response = litellm.completion(
        messages=formatted_messages, **{**arguments, **kwargs}
    )
    ...

Add tool_calls to LLMResponse

We added a tool_calls field to the LLMResponse class to capture tool calls from LLM responses. The _format_response function was updated to extract tool calls from the LiteLLM response.

Migration Impact: If you have custom code that processes LLMResponse objects, you may need to handle the new tool_calls field.

@dataclass
class LLMResponse:
    id: str
    """A unique identifier for the completion."""

    choices: List[str]
    """The list of completion choices the model generated for the input prompt."""

    created: int
    """The Unix timestamp (in seconds) of when the completion was created."""

    model: Optional[str] = None
    """The model used for completion."""

    usage: Optional[LLMUsage] = None
    """Optional details about the token usage for the API call."""

    additional_info: Optional[Dict] = None
    """Optional dictionary for storing additional information related to the
    completion that may not be covered by other fields."""

    latency: Optional[float] = None
    """Optional field to store the latency of the LLM API call."""

    tool_calls: Optional[List[LLMToolCall]] = None
    """The list of tool calls the model generated for the input prompt."""


class LLMToolCall(BaseModel):
    """A class representing a response from an LLM tool call."""

    id: str
    """The ID of the tool call."""

    tool_name: str
    """The name of the tool that was called."""

    tool_args: Dict[str, Any]
    """The arguments passed to the tool call."""

    type: str = "function"
    """The type of the tool call."""

Command

Migration Impact: The changes mentioned below improve conversation flow handling and agent state management. No action required unless you have custom command implementations.

StartFlow Command

We updated the run_command_on_tracker method to handle StartFlow commands when users are in pattern_continue_interrupted state, ensuring smooth conversation flow by cleaning up the pattern. We also added proper agent state management to handle agent interruptions.

Flow Resumption: Previously, StartFlow commands for flows already on the stack (but not active) were ignored. This behavior was updated to resume the flow instead.

def run_command_on_tracker(
    self,
    tracker: DialogueStateTracker,
    all_flows: FlowsList,
    original_tracker: DialogueStateTracker,
) -> List[Event]:
    """Runs the command on the tracker.

    Args:
        tracker: The tracker to run the command on.
        all_flows: All flows in the assistant.
        original_tracker: The tracker before any command was executed.

    Returns:
        The events to apply to the tracker.
    """
    stack = tracker.stack
    original_stack = original_tracker.stack
    applied_events: List[Event] = []

    if self.flow not in all_flows.flow_ids:
        structlogger.debug(
            "start_flow_command.skip_command.start_invalid_flow_id", command=self
        )
        return []

    original_user_frame = top_user_flow_frame(original_stack)
    original_top_flow = (
        original_user_frame.flow(all_flows) if original_user_frame else None
    )

    # if the original top flow is the same as the flow to start, the flow is
    # already active, do nothing
    if original_top_flow is not None and original_top_flow.id == self.flow:
        # in case continue_interrupted is not active, skip the already active start
        # flow command
        if not is_continue_interrupted_flow_active(stack):
            return []

        # if the continue interrupted flow is active, and the command generator
        # predicted a start flow command for the flow which is on top of the stack,
        # we just need to remove the pattern_continue_interrupted frame(s) from the
        # stack
        stack, flow_completed_events = remove_pattern_continue_interrupted_frames(
            stack
        )
        applied_events.extend(flow_completed_events)
        return applied_events + tracker.create_stack_updated_events(stack)

    # if the flow is already on the stack, resume it
    if (
        self.flow in user_flows_on_the_stack(stack)
        and original_user_frame is not None
    ):
        # if pattern_continue_interrupted is active, we need to remove it
        # from the stack before resuming the flow
        stack, flow_completed_events = remove_pattern_continue_interrupted_frames(
            stack
        )
        applied_events.extend(flow_completed_events)
        applied_events.extend(resume_flow(self.flow, tracker, stack))
        # the current active flow is interrupted
        applied_events.append(
            FlowInterrupted(
                original_user_frame.flow_id, original_user_frame.step_id
            )
        )
        return applied_events

    frame_type = FlowStackFrameType.REGULAR

    # remove the pattern_continue_interrupted frames from the stack
    # if it is currently active but the user digressed from the pattern
    stack, flow_completed_events = remove_pattern_continue_interrupted_frames(stack)
    applied_events.extend(flow_completed_events)

    if original_top_flow:
        # if the original top flow is not the same as the flow to start,
        # interrupt the current active flow
        frame_type = FlowStackFrameType.INTERRUPT

        if original_user_frame is not None:
            applied_events.append(
                FlowInterrupted(
                    original_user_frame.flow_id, original_user_frame.step_id
                )
            )

    # If there is an active agent frame, interrupt it
    active_agent_stack_frame = stack.find_active_agent_frame()
    if active_agent_stack_frame:
        structlogger.debug(
            "start_flow_command.interrupt_agent",
            command=self,
            agent_id=active_agent_stack_frame.agent_id,
            frame_id=active_agent_stack_frame.frame_id,
            flow_id=active_agent_stack_frame.flow_id,
        )
        active_agent_stack_frame.state = AgentState.INTERRUPTED
        applied_events.append(
            AgentInterrupted(
                active_agent_stack_frame.agent_id,
                active_agent_stack_frame.flow_id,
            )
        )

    structlogger.debug("start_flow_command.start_flow", command=self)
    stack.push(UserFlowStackFrame(flow_id=self.flow, frame_type=frame_type))
    return applied_events + tracker.create_stack_updated_events(stack)

Cancel Command

We updated the run_command_on_tracker method to properly handle agent cancellation when flows are canceled, ensuring active agent stack frames are removed and agents are properly canceled.

def run_command_on_tracker(
    self,
    tracker: DialogueStateTracker,
    all_flows: FlowsList,
    original_tracker: DialogueStateTracker,
) -> List[Event]:
    """Runs the command on the tracker.

    Args:
        tracker: The tracker to run the command on.
        all_flows: All flows in the assistant.
        original_tracker: The tracker before any command was executed.

    Returns:
        The events to apply to the tracker.
    """
    stack = tracker.stack
    original_stack = original_tracker.stack

    applied_events: List[Event] = []
    user_frame = top_user_flow_frame(
        original_stack, ignore_call_and_link_frames=False
    )
    current_flow = user_frame.flow(all_flows) if user_frame else None

    if not current_flow:
        structlogger.debug(
            "cancel_command.skip_cancel_flow.no_active_flow", command=self
        )
        return []

    if agent_frame := original_tracker.stack.find_active_agent_stack_frame_for_flow(
        current_flow.id
    ):
        structlogger.debug(
            "cancel_command.remove_agent_stack_frame",
            command=self,
            frame=agent_frame,
        )
        remove_agent_stack_frame(stack, agent_frame.agent_id)
        applied_events.append(
            AgentCancelled(agent_id=agent_frame.agent_id, flow_id=current_flow.id)
        )

    # we pass in the original dialogue stack (before any of the currently
    # predicted commands were applied) to make sure we don't cancel any
    # frames that were added by the currently predicted commands.
    canceled_frames = self.select_canceled_frames(original_stack)

    stack.push(
        CancelPatternFlowStackFrame(
            canceled_name=current_flow.readable_name(
                language=tracker.current_language
            ),
            canceled_frames=canceled_frames,
        )
    )

    if user_frame:
        applied_events.append(FlowCancelled(user_frame.flow_id, user_frame.step_id))

    return applied_events + tracker.create_stack_updated_events(stack)

Clarify Command

We updated the run_command_on_tracker method to properly signal agent interruption when clarification is needed.

def run_command_on_tracker(
    self,
    tracker: DialogueStateTracker,
    all_flows: FlowsList,
    original_tracker: DialogueStateTracker,
) -> List[Event]:
    """Runs the command on the tracker.

    Args:
        tracker: The tracker to run the command on.
        all_flows: All flows in the assistant.
        original_tracker: The tracker before any command was executed.

    Returns:
        The events to apply to the tracker.
    """
    flows = [all_flows.flow_by_id(opt) for opt in self.options]
    clean_options = [flow.id for flow in flows if flow is not None]
    if len(clean_options) != len(self.options):
        structlogger.debug(
            "clarify_command.altered_command.dropped_clarification_options",
            command=self,
            original_options=self.options,
            cleaned_options=clean_options,
        )
    if len(clean_options) == 0:
        structlogger.debug(
            "clarify_command.skip_command.empty_clarification", command=self
        )
        return []

    stack = tracker.stack
    relevant_flows = [all_flows.flow_by_id(opt) for opt in clean_options]

    names = [
        flow.readable_name(language=tracker.current_language)
        for flow in relevant_flows
        if flow is not None
    ]

    applied_events: List[Event] = []

    # if the top stack frame is an agent stack frame, we need to
    # update the state to INTERRUPTED and add an AgentInterrupted event
    if top_stack_frame := stack.top():
        if isinstance(top_stack_frame, AgentStackFrame):
            applied_events.append(
                AgentInterrupted(
                    top_stack_frame.agent_id,
                    top_stack_frame.flow_id,
                )
            )
            top_stack_frame.state = AgentState.INTERRUPTED

    stack.push(ClarifyPatternFlowStackFrame(names=names))
    return applied_events + tracker.create_stack_updated_events(stack)

ChitChat Command

We updated the run_command_on_tracker method to properly signal agent interruption when handling chitchat.

def run_command_on_tracker(
    self,
    tracker: DialogueStateTracker,
    all_flows: FlowsList,
    original_tracker: DialogueStateTracker,
) -> List[Event]:
    """Runs the command on the tracker.

    Args:
        tracker: The tracker to run the command on.
        all_flows: All flows in the assistant.
        original_tracker: The tracker before any command was executed.

    Returns:
        The events to apply to the tracker.
    """
    stack = tracker.stack

    applied_events: List[Event] = []

    # if the top stack frame is an agent stack frame, we need to
    # update the state to INTERRUPTED and add an AgentInterrupted event
    if top_stack_frame := stack.top():
        if isinstance(top_stack_frame, AgentStackFrame):
            applied_events.append(
                AgentInterrupted(
                    top_stack_frame.agent_id,
                    top_stack_frame.flow_id,
                )
            )
            top_stack_frame.state = AgentState.INTERRUPTED

    stack.push(ChitchatPatternFlowStackFrame())
    return applied_events + tracker.create_stack_updated_events(stack)

KnowledgeAnswer Command

We updated the run_command_on_tracker method to properly signal agent interruption when handling knowledge requests.

def run_command_on_tracker(
    self,
    tracker: DialogueStateTracker,
    all_flows: FlowsList,
    original_tracker: DialogueStateTracker,
) -> List[Event]:
    """Runs the command on the tracker.

    Args:
        tracker: The tracker to run the command on.
        all_flows: All flows in the assistant.
        original_tracker: The tracker before any command was executed.

    Returns:
        The events to apply to the tracker.
    """
    stack = tracker.stack

    applied_events: List[Event] = []

    # if the top stack frame is an agent stack frame, we need to
    # update the state to INTERRUPTED and add an AgentInterrupted event
    if top_stack_frame := stack.top():
        if isinstance(top_stack_frame, AgentStackFrame):
            applied_events.append(
                AgentInterrupted(
                    top_stack_frame.agent_id,
                    top_stack_frame.flow_id,
                )
            )
            top_stack_frame.state = AgentState.INTERRUPTED

    stack.push(SearchPatternFlowStackFrame())
    return applied_events + tracker.create_stack_updated_events(stack)

Tracing for Jaeger

We updated the port configuration for Jaeger tracing collection.

Migration Required: Update your Jaeger configuration to use the new port settings.

Updated Docker Command:

docker run --rm --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 5778:5778 \
  -p 9411:9411 \
  cr.jaegertracing.io/jaegertracing/jaeger:2.10.0

Updated Configuration:

Update your endpoints.yml file with the new port configuration:

tracing:
  type: jaeger
  host: 0.0.0.0
  port: 4317
  service_name: rasa
  sync_export: ~

The Jaeger UI is now accessible at http://localhost:16686/search.

Rasa Studio v1.12.x → v1.13.x

What's New

We've made important improvements to Rasa Studio's database migrations:

  • No more superuser required: In earlier versions, certain database migrations required a user with superuser privileges. This is no longer necessary. All migrations can now be completed using a standard database user.

Before You Upgrade

If you're upgrading from a version before v1.13.x, please follow the steps below.

Step-by-Step Upgrade Instructions

  1. Upgrade to v1.12.7 First

    This ensures that all necessary database migrations are applied before moving to the 1.13.x version.

  2. Mark Migrations as Complete

    After upgrading to v1.12.7, run the following SQL command on your Studio database:

    insert into public._prisma_migrations (
    id,
    checksum,
    finished_at,
    migration_name,
    started_at,
    applied_steps_count)
    values (
    '08eb97ec-85fa-4578-921e-091d50c4a816',
    'c0993f05c8c4021b096d2d8c78d7f3977e81388ae36e860387eddb2c3553a65b',
    now(),
    '000000000000_squashed_migrations',
    now(),
    1);

    This tells the system the earlier migrations are already applied, so they won't run again.

  3. Upgrade to v1.13.x or Later

    After completing the steps above, you're ready to upgrade to the latest version of Rasa Studio.

Rasa Pro 3.12 to Rasa Pro 3.13

LLM Judge Model Change in E2E Testing

Starting with Rasa Pro v3.13.x, the default model for the LLM Judge in E2E tests has changed from gpt-4o-mini to gpt-4.1-mini, see Generative Response LLM Judge Configuration. The new model may produce lower scores for the generative_response_is_relevant and generative_response_is_grounded assertions, which can cause previously passing responses to be incorrectly marked as failures (false negatives).

Action Required:

  • Lower the thresholds for generative_response_is_relevant and generative_response_is_grounded in your E2E test configuration to reduce the risk of false negatives.
  • Alternatively, if you prefer not to lower the thresholds, configure the LLM Judge to use a more performant model (note: this may increase costs). For details on configuring the LLM Judge, see the E2E testing documentation.
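As a sketch, thresholds can be lowered per assertion in your E2E test cases; the test case name, user message, and threshold values below are illustrative:

test_cases:
  - test_case: user_asks_about_fees
    steps:
      - user: "What fees do you charge for transfers?"
        assertions:
          - generative_response_is_relevant:
              threshold: 0.75
          - generative_response_is_grounded:
              threshold: 0.75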

Rasa Pro 3.11 to Rasa Pro 3.12

Custom LLM-based Command Generators

To improve slot filling in CALM and allow all types of command generators to issue commands at every conversation turn, we have made the following changes, which you should take into account to benefit from the new CALM slot filling improvements:

  • added a new method _check_commands_overlap to the base class CommandGenerator. This method checks if the commands issued by the current command generator overlap with the commands issued by other command generators. This method returns the final deduplicated commands. This method is called by the predict_commands method of the CommandGenerator children classes.
  • added two new methods _check_start_flow_command_overlap and _filter_slot_commands to the base class CommandGenerator that will raise NotImplementedError if not implemented by the child class. These methods are already implemented by the LLMBasedCommandGenerator and NLUCommandAdapter classes to uphold the prioritization system of the commands.
  • added a new method _get_prior_commands to the base class CommandGenerator. This method returns a list of commands that have been issued by other command generators prior to the one currently running. It is called by the predict_commands method of any command generator that inherits from the CommandGenerator class. These prior commands can either be returned as-is (when there is no tracker or no flows) or merged with the newly issued commands. For example:

    prior_commands = self._get_prior_commands(tracker)
    if tracker is None or flows.is_empty():
        return prior_commands

    # custom command generation logic block

    return self._check_commands_overlap(prior_commands, commands)
  • added a new method _should_skip_llm_call to the LLMBasedCommandGenerator. This method returns True only if minimize_num_calls is set to True and the prior commands already contain either a StartFlow command or a SetSlot command for the slot requested by an active collect flow step. It is called by the predict_commands method of the LLMBasedCommandGenerator children classes. If the method returns True, the LLM call is skipped and the method returns the prior commands.
  • moved the _check_commands_against_slot_mappings static method from the CommandGenerator to the LLMBasedCommandGenerator class. This method is used to check if the issued LLM commands are relevant to the slot mappings. The method is called by the predict_commands method of the LLMBasedCommandGenerator children classes.

Migration from SingleStepLLMCommandGenerators to the CompactLLMCommandGenerators

It is recommended to use the new CompactLLMCommandGenerator with optimized prompts for the gpt-4o-2024-11-20 and claude-sonnet-3.5-20240620 models. Using the CompactLLMCommandGenerator can significantly reduce costs - approximately 10 times, according to our tests.

If you've built a custom command generator that extends SingleStepLLMCommandGenerator, we recommend migrating to the new command generator by inheriting the class from CompactLLMCommandGenerator.

# Old class definition:
from rasa.dialogue_understanding.generators import SingleStepLLMCommandGenerator

class MyCommandGenerator(SingleStepLLMCommandGenerator):
    ...


# New class definition:
from rasa.dialogue_understanding.generators import CompactLLMCommandGenerator

class MyCommandGenerator(CompactLLMCommandGenerator):
    ...

Migration from SingleStepLLMCommandGenerators to the CompactLLMCommandGenerators with the custom commands

If you've built a custom command generator that extends SingleStepLLMCommandGenerator and you've defined new commands or overridden Rasa's default commands, you should:

  • Update the parse_commands method to reflect the changes in the command parsing logic.
  • Update the custom command classes so they are compatible with the latest command interface. For details on updating and implementing custom command classes, please refer to How to customize existing commands section.

In the new implementation, command parsing has been delegated to a dedicated parsing utility method parse_commands, which can be imported from rasa.dialogue_understanding.generator.command_parser. This method handles the parsing of the predicted LLM output into commands more effectively and flexibly, especially when using customized or newly introduced command types.

Here is the new recommended pattern for your command generator's parse_commands method:


# Import the utility method under a different name to prevent confusion with the
# command generator's `parse_commands`
from rasa.dialogue_understanding.generator.command_parser import (
    parse_commands as parse_commands_using_command_parsers,
)


class CustomCommandGenerator(CompactLLMCommandGenerator):
    """Custom implementation of the LLM command generator."""

    ...

    @classmethod
    def parse_commands(
        cls, actions: Optional[str], tracker: DialogueStateTracker, flows: FlowsList
    ) -> List[Command]:
        """Parse the actions returned by the LLM into intents and entities as commands.

        Args:
            actions: The actions returned by the LLM.
            tracker: The tracker containing the current state of the conversation.
            flows: The current list of active flows.

        Returns:
            The parsed commands.
        """
        commands = parse_commands_using_command_parsers(
            actions,
            flows,
            # Register any custom command classes you have created here
            additional_commands=[CustomCommandClass1, CustomCommandClass2, ...],
            # If your custom command classes replace or extend default commands,
            # specify the default commands for removal here
            default_commands_to_remove=[HumanHandoffCommand, ...],
        )

        if not commands:
            structlogger.warning(
                f"{cls.__name__}.parse_commands",
                message="No commands were parsed from the LLM actions.",
                actions=actions,
            )

        return commands

Migration of the custom prompt from SingleStepLLMCommandGenerators to the CompactLLMCommandGenerators

If you've customized the default prompt template previously used with the SingleStepLLMCommandGenerator and are now migrating to the CompactLLMCommandGenerator, you must update this template to use the new prompt commands syntax. This updated command syntax is specifically optimized for the capabilities of the new CompactLLMCommandGenerator.

For more details on the new prompt, refer to the CompactLLMCommandGenerator documentation.

Update to utter_corrected_previous_input default utterance

The text of the default utter_corrected_previous_input utterance has been updated to use a new correction frame context property context.new_slot_values instead of context.corrected_slots.values. The new utterance is:

"Ok, I am updating {{ context.corrected_slots.keys()|join(', ') }} to {{ context.new_slot_values | join(', ') }} respectively."

LLM Judge Config Format Change in E2E Testing

The custom configuration of the LLM Judge used by E2E testing with assertions has been updated to use the llm_judge key which follows the same structure as other generative components in Rasa. This can either use model groups configuration or the individual model configuration option. The llm_judge key can be used in the conftest.yml file as shown below:

llm_judge:
  llm:
    provider: "openai"
    model: "gpt-4-0613"
  embeddings:
    provider: "openai"
    model: "text-embedding-ada-002"

action property in custom slot mapping replaced with run_action_every_turn

With the deprecation of the custom slot mapping in favor of the new controlled mapping type, the action property associated with the custom slot mapping has been replaced with the run_action_every_turn property. If you prefer not to run these custom actions at every turn, it is recommended that you remove the action property from your slot mappings.
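A before/after sketch, assuming a hypothetical account_balance slot filled by a custom action action_check_balance (the controlled mapping type is the replacement referred to above):

# Rasa Pro 3.11 and earlier
slots:
  account_balance:
    type: float
    mappings:
      - type: custom
        action: action_check_balance

# Rasa Pro 3.12, only if the action should still run at every turn
slots:
  account_balance:
    type: float
    mappings:
      - type: controlled
        run_action_every_turn: action_check_balance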

Rasa Pro 3.9 to Rasa Pro 3.10

LLM/Embedding Configuration

The LLM and embedding configurations have been updated to use the provider key instead of the type key. These changes apply to all providers, with some examples provided for reference.

Cohere

llm:
  provider: "cohere" # instead of "type: cohere"
  model: "command-r"

Vertex AI

llm:
  provider: "vertex_ai" # instead of "type: vertexai"
  model: "gemini-pro"

Hugging Face Hub

llm:
  provider: "huggingface" # instead of "type: huggingface_hub"
  model: "HuggingFaceH4/zephyr-7b-beta" # instead of "repo_id: HuggingFaceH4/zephyr-7b-beta"

llama.cpp

Support for loading models directly has been removed. You need to deploy the model to a server and use the server URL to load the model. For instance, a llama.cpp server can be run using the following command: ./llama-server -m your_model.gguf --port 8080.

For more information on the llama.cpp server, refer to the llama.cpp documentation. The assistant can be configured as:

llm:
  provider: "self-hosted" # instead of "type: llamacpp"
  api_base: "http://localhost:8000/v1" # instead of "model_path: "/path/to/model.bin""
  model: "ggml-org/Meta-Llama-3.1-8B-Instruct-Q4_0-GGUF"

vLLM

The model can be deployed and served through vLLM==0.6.0. For instance, a vLLM server can be run using the following command: vllm serve your_model.

For more information on the vLLM server, refer to the vLLM documentation. The assistant can be configured as:

llm:
  provider: "self-hosted" # instead of "type: vllm_openai"
  api_base: "http://localhost:8000/v1"
  model: "NousResearch/Meta-Llama-3-8B-Instruct" # the name of the model you have deployed
note

CALM exclusively utilizes the chat completions endpoint of the model server, so it's essential that the model's tokenizer includes a chat template. Models lacking a chat template will not be compatible with CALM anymore.

Backward compatibility has been maintained for OpenAI and Azure configurations. For all other providers, ensure the use of the provider key and review the configuration against the documentation.

Disabling the cache

For Rasa Pro versions <= 3.9.x, the correct way to disable the cache was:

llm:
  model: ...
  cache: false

From Rasa Pro 3.10.0 onwards, this has changed since we rely on LiteLLM to manage caching. To avoid errors, change your configuration to:

llm:
  model: ...
  cache:
    no-cache: true

Custom Components using an LLM

As of Rasa Pro 3.10, the backend for sending LLM and Embedding API requests has undergone a significant change. The previous LangChain version 0.0.329 has been replaced with LiteLLM.

This shift can potentially break custom implementations of components that configure and send API requests to chat completion and embedding endpoints; the affected components are those relying on llm_factory, embedder_factory, or invoke_llm, as described in the subsections below.

If your project contains custom components based on any of the affected components, you will need to verify and possibly refactor your code to ensure compatibility with LiteLLM.

Changes to llm_factory

The llm_factory is used across all components that configure and send API requests to an LLM. Previously, the llm_factory relied on LangChain's mapping to instantiate LangChain clients.

Rasa Pro 3.10 onwards, the llm_factory returns clients that conform to the new LLMClient protocol. This impacts any custom component that was previously relying on LangChain types.

If you have overridden components, such as a command generator, you will need to update your code to handle the new return type of LLMClient. This includes adjusting method calls and ensuring compatibility with the new protocol.

The following method calls will need to be adjusted if you have overridden them:

  • SingleStepLLMCommandGenerator.invoke_llm
  • MultiStepLLMCommandGenerator.invoke_llm
  • ContextualResponseRephraser.rephrase
  • EnterpriseSearchPolicy.predict_action_probabilities
  • IntentlessPolicy.generate_answer
  • LLMBasedRouter.predict_commands

Here's an example of how to update your code:


from rasa.shared.utils.llm import llm_factory
from rasa.shared.providers.llm.llm_client import LLMClient
from rasa.shared.providers.llm.llm_response import LLMResponse

# get the llm client via factory
llm: LLMClient = llm_factory(config, default_config)

# get the llm response synchronously
sync_response: LLMResponse = llm.completion(prompt) # or llm.completion([prompt_1, prompt_2,..., prompt_n])
sync_completion: str = sync_response.choices[0]

# get the llm response asynchronously
async_response: LLMResponse = await llm.acompletion(prompt) # or llm.acompletion([prompt_1, prompt_2,..., prompt_n])
async_completion: str = async_response.choices[0]

Changes to embedder_factory

The embedder_factory is used across all components that configure and send API requests to an embedding model. Previously, the embedder_factory returned LangChain's embedding clients of Embeddings type.

Rasa Pro 3.10 onwards, the embedder_factory returns clients that conform to the new EmbeddingClient protocol. This change is part of the move to LiteLLM, and it impacts any custom components that were previously relying on LangChain types.

If you have overridden components that rely on instantiating clients with embedder_factory you will need to update your code to handle the new return type of EmbeddingClient. This includes adjusting method calls and ensuring compatibility with the new protocol.

The following method calls will need to be adjusted if you have overridden them:

  • FlowRetrieval.load
  • FlowRetrieval.populate
  • EnterpriseSearchPolicy.load
  • EnterpriseSearchPolicy.train
  • IntentlessPolicy.load
  • Or if you have overridden the IntentlessPolicy.embedder attribute.

Here's an example of how to update your code:


from rasa.shared.utils.llm import embedder_factory
from rasa.shared.providers.embedding.embedding_client import EmbeddingClient
from rasa.shared.providers.embedding.embedding_response import EmbeddingResponse

# get the embedding client via factory
embedder: EmbeddingClient = embedder_factory(config, default_config)

# get the embedding response synchronously
sync_response: EmbeddingResponse = embedder.embed([doc_1, doc_2])
vectors: List[List[float]] = sync_response.data

# get the embedding response asynchronously
async_response: EmbeddingResponse = await embedder.aembed([doc_1, doc_2])
vectors: List[List[float]] = async_response.data

Changes to invoke_llm

The previous implementation of invoke_llm method in SingleStepLLMCommandGenerator, MultiStepLLMCommandGenerator, and the deprecated LLMCommandGenerator used llm_factory to instantiate LangChain clients. Since the factory now returns clients that conform to the new LLMClient protocol, any custom overrides of the invoke_llm method will need to be updated to accommodate the new return type.

Below you can find the updated invoke_llm method in Rasa Pro 3.10:


async def invoke_llm(self, prompt: Text) -> Optional[Text]:
    """Use LLM to generate a response.

    Args:
        prompt: The prompt to send to the LLM.

    Returns:
        The generated text.

    Raises:
        ProviderClientAPIException if an error during API call.
    """
    llm = llm_factory(self.config.get(LLM_CONFIG_KEY), DEFAULT_LLM_CONFIG)
    try:
        llm_response = await llm.acompletion(prompt)
        return llm_response.choices[0]
    except Exception as e:
        structlogger.error("llm_based_command_generator.llm.error", error=e)
        raise ProviderClientAPIException(
            message="LLM call exception", original_exception=e
        )

Changes to SingleStepLLMCommandGenerator.predict_commands

For SingleStepLLMCommandGenerator, the predict_commands method now includes a call to self._update_message_parse_data_for_fine_tuning(message, commands, flow_prompt). This function is essential for enabling the fine-tuning recipe.

If you have overridden the predict_commands method, you need to manually add this call to ensure proper functionality:


async def predict_commands(
    self,
    message: Message,
    flows: FlowsList,
    tracker: Optional[DialogueStateTracker] = None,
    **kwargs: Any,
) -> List[Command]:

    ...
    action_list = await self.invoke_llm(flow_prompt)
    commands = self.parse_commands(action_list, tracker, flows)

    self._update_message_parse_data_for_fine_tuning(message, commands, flow_prompt)

    return commands

Changes to the default configuration dictionary

The default configurations for the LLM- and embedding-based components have been updated.

If you have custom implementations based on the default configurations of any of these components, ensure that your configuration dictionary aligns with the updates shown below, as the defaults have changed.

Default LLM configuration keys have been updated from:

DEFAULT_LLM_CONFIG = {
    "_type": "openai",
    "model_name": ...,
    "request_timeout": ...,
    "temperature": ...,
    "max_tokens": ...,
}

to:

DEFAULT_LLM_CONFIG = {
    "provider": "openai",
    "model": ...,
    "temperature": ...,
    "max_tokens": ...,
    "timeout": ...,
}

Similarly, default embedding configuration keys have been updated from:

DEFAULT_EMBEDDINGS_CONFIG = {
    "_type": "openai",
    "model": ...,
}

to:

DEFAULT_EMBEDDINGS_CONFIG = {
    "provider": "openai",
    "model": ...,
}

Be sure to update your custom configurations to reflect these changes in order to ensure continued functionality.

Dropped support for Python 3.8

Dropped support for Python 3.8 ahead of Python 3.8 End of Life in October 2024.

In Rasa Pro versions 3.10.0, 3.9.11 and 3.8.13, we needed to pin the TensorFlow library version to 2.13.0rc1 in order to remove critical vulnerabilities; this resulted in poor user experience when installing these versions of Rasa Pro with uv pip. Removing support for Python 3.8 will make it possible to upgrade to a stabler version of TensorFlow.

Rasa Pro 3.8 to Rasa Pro 3.9

LLMCommandGenerator

Starting from Rasa Pro 3.9 the former LLMCommandGenerator is replaced by SingleStepLLMCommandGenerator. The LLMCommandGenerator is now deprecated and will be removed in version 4.0.0.

The SingleStepLLMCommandGenerator differs from the LLMCommandGenerator in how it handles failures of the invoke_llm method. Specifically, if the invoke_llm method call fails in SingleStepLLMCommandGenerator, it raises a ProviderClientAPIException. In contrast, the LLMCommandGenerator simply returns None when the method call fails.
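A minimal config.yml sketch of the rename:

pipeline:
  # - name: LLMCommandGenerator   # deprecated, removed in 4.0.0
  - name: SingleStepLLMCommandGenerator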

Slot Mappings

If you had been using the custom slot mapping type for slots set by the prediction of the LLM-based command generator, you need to update your assistant's slot configuration to use the new from_llm slot mapping type. This also applies if you have written custom slot validation actions (following the validate_<slot_name> convention) for slots set by the LLM-based command generator.
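A sketch of the updated slot configuration, assuming a hypothetical destination slot filled by the LLM-based command generator:

slots:
  destination:
    type: text
    mappings:
      - type: from_llm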

For slots that are set only via a custom action e.g. slots set by external sources only, you must add the action name to the slot mapping:

slots:
  slot_name:
    type: text
    mappings:
      - type: custom
        action: custom_action_name

Rasa Pro 3.8.0 to Rasa Pro 3.8.1

Poetry Installation

Starting from Rasa Pro 3.8.1 in the 3.8.x minor series, we have upgraded the version of Poetry used for managing dependencies in the Rasa Pro Python package to 1.8.2. To install the latest micro versions of Rasa Pro in your project, you must first upgrade Poetry to version 1.8.2:

poetry self update 1.8.2

Rasa Pro 3.7 to 3.8

info

Starting from 3.8.0, Rasa and Rasa Plus have been merged into a single artifact, named Rasa Pro.

Installation

Following the merge we renamed the resulting python package and Docker image to rasa-pro.

Python package

The Rasa Pro Python package, for 3.8.0 and onward, is located at:

https://europe-west3-python.pkg.dev/rasa-releases/rasa-pro-python

The name of the package is rasa-pro.

Example of how to install the package:

pip install --extra-index-url=https://europe-west3-python.pkg.dev/rasa-releases/rasa-pro-python/simple rasa-pro==3.8.0

While the Python package name has changed, the import process remains the same:

import rasa.core

from rasa import train

For more information on how to install Rasa Pro, please refer to the Python installation guide.

Helm Chart / Docker Image

The Rasa Pro Docker image, for 3.8.0 and onward, is located at:

europe-west3-docker.pkg.dev/rasa-releases/rasa-pro/rasa-pro

Example of how to pull the image:

docker pull europe-west3-docker.pkg.dev/rasa-releases/rasa-pro/rasa-pro:3.8.0

For more information on how to install Rasa Pro Docker image, please refer to the Docker installation guide.

Component Yaml Configuration Changes

Follow the instructions below to update the configuration of Rasa Pro components in the 3.8 version:

  • The concurrent Redis lock store and the rephrasing NLG endpoint can be specified in endpoints.yml using their type names:

    lock_store:
      type: concurrent_redis
    nlg:
      type: rephrase

  • Audiocodes and Vier CVG channels can be specified in credentials.yml directly using their channel name:

    audiocodes:
      token: "sample_token"

    vier_cvg:
      ...

  • EnterpriseSearchPolicy and IntentlessPolicy can be specified by name in the policies section of config.yml:

    policies:
      - name: EnterpriseSearchPolicy
      - name: IntentlessPolicy

Changes to default behaviour

info

With Rasa Pro 3.8, we introduced a couple of changes that rectify the default behaviour of certain components. We believe these changes align better with the principles of CALM. If you are migrating an assistant built with Rasa Pro 3.7, please check whether these changes affect your assistant.

Prompt Rendering

Rasa Pro 3.8 introduces a new feature flow-retrieval which ensures that only the flows that are relevant to the conversation context are included in the prompt sent to the LLM in the LLMCommandGenerator. This helps the assistant scale to a higher number of flows and also reduces the LLM costs.

This feature is enabled by default and we recommend using it if the assistant has more than 40 flows. By default, the feature uses embedding models from OpenAI; if you are using a different provider (e.g. Azure), please ensure that:

  1. An embedding model is configured with the provider.
  2. The LLMCommandGenerator has been configured correctly to connect to the embedding provider. For example, see the section on the configuration required to connect to the Azure OpenAI service.

If you wish to disable the feature, you can configure the LLMCommandGenerator as follows:

config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    ...
    flow_retrieval:
      active: false
    ...

Processing Chitchat

The default behaviour in Rasa Pro 3.7 for handling chitchat utterances was to rely on free-form generative responses. This can lead to the assistant sending unwanted responses or responding to out-of-scope user utterances. The new default behaviour in Rasa Pro 3.8 is to rely on the IntentlessPolicy to respond to chitchat utterances using pre-defined responses only.

If you were relying on free-form generative responses to handle chitchat in Rasa Pro 3.7, you will now see a warning message when you train the same assistant with Rasa Pro 3.8: "pattern_chitchat has an action step with action_trigger_chitchat, but IntentlessPolicy is not configured". This appears because the default definition of pattern_chitchat has been modified in Rasa Pro 3.8 to:

pattern_chitchat:
  description: handle interactions with the user that are not task-oriented
  name: pattern chitchat
  steps:
    - action: action_trigger_chitchat

For the assistant to be able to handle chitchat utterances, you have two options:

  1. If you are happy with free-form generative responses for such user utterances, you can override pattern_chitchat to:

     pattern_chitchat:
       description: handle interactions with the user that are not task-oriented
       name: pattern chitchat
       steps:
         - action: utter_free_chitchat_response

  2. If you want to switch to using pre-defined responses, you should first add the IntentlessPolicy to the policies section of the config:

     policies:
       - name: IntentlessPolicy

Next, you should add response templates for the pre-defined responses you want the assistant to consider when responding to a chitchat user utterance.
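
For example, you could add responses like the following to your domain; the response names and texts below are only illustrative, not prescribed names:

responses:
  utter_chitchat_greet:
    - text: "Hey there! How can I help you today?"
  utter_chitchat_how_are_you:
    - text: "I'm doing great, thanks for asking!"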

Handling of categorical slots

Rasa Pro versions <= 3.7.8 used to store the value of a categorical slot in the same casing as it was either specified in the user message or predicted by the LLM in a SetSlot command. This wasn't necessarily the same as the casing used in the corresponding possible value defined for that slot in the domain. For example, if the categorical slot was defined to have [A, B, C] as its possible values and the prediction was to set it to a, then the slot would be set to a. This led to problems downstream when the slot was used in other primitives, i.e. flows or custom actions.

Rasa Pro 3.7.9 fixes this by always storing the slot value in the same casing as defined in the domain. So, in the above example, the slot would now be stored as A instead of a. This ensures that business logic for slot comparisons, e.g. if conditions in flows, can be written using the same casing as defined in the domain.

If you are migrating from Rasa Pro versions <= 3.7.8, please double-check your flows and custom actions to make sure none of them break because of this change.
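
For instance, given a domain definition like the sketch below (the slot name, values, and mapping are hypothetical), a predicted value of premium is stored as Premium from 3.7.9 onward, so comparisons such as tracker.get_slot("account_type") == "Premium" in custom actions, or the equivalent conditions in flows, should use the domain casing:

slots:
  account_type:
    type: categorical
    values:
      - Standard
      - Premium
    mappings:
      - type: custom   # mapping type depends on your setup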

Update default signature of LLM calls

In Rasa Pro >= 3.8 we switched from synchronous to asynchronous LLM calls. We updated all components that use an LLM, e.g.:

  • LLMCommandGenerator
  • ContextualResponseRephraser
  • EnterpriseSearchPolicy
  • IntentlessPolicy

This can potentially break assistants migrating to 3.8 that have sub-classed one of these components in their own custom components.

For example, the method predict_commands in the LLMCommandGenerator is now async and needs to await the methods _generate_action_list_using_llm and flow_retrieval.filter_flows, as these methods are also async. For more information on asyncio, please check the official Python documentation.
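
If you maintain such a subclass, the override must itself become a coroutine. The following is a minimal sketch, assuming a hypothetical subclass of LLMCommandGenerator; the import path and argument names are assumptions and may differ in your installed version.

# NOTE: import path and method arguments are assumptions; check your installed version.
from rasa.dialogue_understanding.generator.llm_command_generator import LLMCommandGenerator


class MyAsyncCommandGenerator(LLMCommandGenerator):
    # In Rasa Pro >= 3.8, predict_commands is async, so the override must be `async def` ...
    async def predict_commands(self, message, flows, tracker=None, **kwargs):
        # ... and calls to the now-async parent implementation (or helpers such as
        # _generate_action_list_using_llm) must be awaited.
        return await super().predict_commands(message, flows, tracker=tracker, **kwargs)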

Dependency Upgrades

We've updated our core dependencies to enhance functionality and performance across our platform.

Spacy 3.7.x

Upgraded from >=3.6 to >=3.7.

We have transitioned to using Spacy version 3.7.x to benefit from the latest enhancements in natural language processing. If you're using any spacy models with your assistant, please update them to Spacy 3.7.x compatible models.
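
For example, if your pipeline uses a pretrained English model, you can re-download a compatible version after upgrading (the model name below is just an example):

python -m spacy download en_core_web_md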

Pydantic 2.x

Upgraded from >=1.10.9,<1.10.10 to ^2.0.

Along with the Spacy upgrade, we have moved to Pydantic version 2.x, which necessitates updates to Pydantic models. For assistance with updating your models, please refer to the Pydantic Migration Guide. This ensures compatibility with the latest improvements in data validation and settings management.
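
As a brief illustration of the kind of changes involved (the model below is hypothetical and not part of Rasa's API), Pydantic v2 renames several core methods and decorators:

from pydantic import BaseModel, field_validator


class ChannelConfig(BaseModel):
    url: str
    token: str = ""

    # Pydantic v1's @validator becomes @field_validator in v2.
    @field_validator("url")
    @classmethod
    def url_must_not_be_empty(cls, value: str) -> str:
        if not value:
            raise ValueError("url must not be empty")
        return value


config = ChannelConfig(url="https://example.com")
# v1: config.dict() / ChannelConfig.parse_obj(data)
# v2: config.model_dump() / ChannelConfig.model_validate(data)
print(config.model_dump())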

Rasa Pro 3.7.9 to Rasa Pro 3.7.10

Poetry Installation

Starting from Rasa Pro 3.7.10 in the 3.7.x minor series, we have upgraded the version of Poetry used for managing dependencies in the Rasa Pro Python package to 1.8.2. To install Rasa Pro in your project, you must first upgrade Poetry to version 1.8.2:

poetry self update 1.8.2

Rasa Pro 3.7.8 to Rasa Pro 3.7.9

Changes to default behaviour

Handling of categorical slots

Rasa Pro versions <= 3.7.8 used to store the value of a categorical slot in the same casing as it was either specified in the user message or predicted by the LLM in a SetSlot command. This wasn't necessarily the same as the casing used in the corresponding possible value defined for that slot in the domain. For example, if the categorical slot was defined to have [A, B, C] as its possible values and the prediction was to set it to a, then the slot would be set to a. This led to problems downstream when the slot was used in other primitives, i.e. flows or custom actions.

Rasa Pro 3.7.9 fixes this by always storing the slot value in the same casing as defined in the domain. So, in the above example, the slot would now be stored as A instead of a. This ensures that business logic for slot comparisons, e.g. if conditions in flows, can be written using the same casing as defined in the domain.

If you are migrating from Rasa Pro versions <= 3.7.8, please double-check your flows and custom actions to make sure none of them break because of this change.

Rasa 3.6 to Rasa Pro 3.7

Installation

info

Starting from Rasa 3.7.0, Rasa has moved to a new package registry and Docker registry. You will need to update your package registry to install Rasa 3.7.0 and later versions. If you are a Rasa customer, please reach out to your Rasa account manager or support to obtain a license.

Python package

The Rasa Python package for 3.7.0 has been moved to the following package registry:

https://europe-west3-python.pkg.dev/rasa-releases/rasa-plus-py

The name of the package is rasa.

Example of how to install the package:

pip install --extra-index-url=https://europe-west3-python.pkg.dev/rasa-releases/rasa-plus-py/simple rasa==3.7.0

For more information on how to install Rasa Pro, please refer to the Python installation guide.

Helm Chart / Docker Image

The Rasa Docker image for 3.7.0 is located at:

europe-west3-docker.pkg.dev/rasa-releases/rasa-docker/rasa

Example of how to pull the image:

docker pull europe-west3-docker.pkg.dev/rasa-releases/rasa-docker/rasa:3.7.0

For more information on how to install Rasa Pro Docker image, please refer to the Docker installation guide.

Migrating from older versions

For migrating from Rasa Open Source versions, please refer to the migration guide.