# Version Migration Guide

This page describes the changes between major versions and how to migrate from one version to the next.
## Rasa Studio v1.12.x → v1.13.x

### What's New
We've made important improvements to Rasa Studio's database migrations:

- **No more superuser required:** In earlier versions, certain database migrations required a user with superuser privileges. This is no longer necessary. All migrations can now be completed using a standard database user.
### Before You Upgrade

If you're upgrading from a version earlier than v1.13.x, please follow the steps below.
### Step-by-Step Upgrade Instructions

1. **Upgrade to v1.12.7 first.** This ensures that all necessary database migrations are applied before moving to a 1.13.x version.

2. **Mark migrations as complete.** After upgrading to v1.12.7, run the following SQL command on your Studio database:

   ```sql
   insert into public._prisma_migrations (
       id,
       checksum,
       finished_at,
       migration_name,
       started_at,
       applied_steps_count)
   values (
       '08eb97ec-85fa-4578-921e-091d50c4a816',
       'c0993f05c8c4021b096d2d8c78d7f3977e81388ae36e860387eddb2c3553a65b',
       now(),
       '000000000000_squashed_migrations',
       now(),
       1);
   ```

   This tells the system that the earlier migrations are already applied, so they won't run again.

3. **Upgrade to v1.13.x or later.** After completing the steps above, you're ready to upgrade to the latest version of Rasa Studio.
## Rasa Pro 3.12 to Rasa Pro 3.13

### LLM Judge Model Change in E2E Testing

Starting with Rasa Pro v3.13.x, the default model for the LLM Judge in E2E tests has changed from `gpt-4o-mini` to `gpt-4.1-mini`; see Generative Response LLM Judge Configuration.

The new model may produce lower scores for the `generative_response_is_relevant` and `generative_response_is_grounded` assertions, which can cause previously passing responses to be incorrectly marked as failures (false negatives).
**Action Required:**

- Lower the thresholds for `generative_response_is_relevant` and `generative_response_is_grounded` in your E2E test configuration to reduce the risk of false negatives.
- Alternatively, if you prefer not to lower the thresholds, configure the LLM Judge to use a more performant model (note: this may increase costs). For details on configuring the LLM Judge, see the E2E testing documentation.
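For example, a lowered threshold can be set directly on the assertions in a test case. This is a sketch based on the assertion format described in the E2E testing documentation; the test case name, user message, and threshold values are illustrative, not recommendations:

```yaml
test_cases:
  - test_case: check_generative_response
    steps:
      - user: "What are your opening hours?"
        assertions:
          - generative_response_is_relevant:
              threshold: 0.75
          - generative_response_is_grounded:
              threshold: 0.75
```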
## Rasa Pro 3.11 to Rasa Pro 3.12

### Custom LLM-based Command Generators

To improve slot filling in CALM and allow all types of command generators to issue commands at every conversation turn, we have made the following changes. Take these into account to benefit from the new CALM slot filling improvements:
- Added a new method `_check_commands_overlap` to the base class `CommandGenerator`. This method checks whether the commands issued by the current command generator overlap with the commands issued by other command generators, and returns the final deduplicated commands. It is called by the `predict_commands` method of the `CommandGenerator` child classes.
- Added two new methods `_check_start_flow_command_overlap` and `_filter_slot_commands` to the base class `CommandGenerator` that raise `NotImplementedError` if not implemented by the child class. These methods are already implemented by the `LLMBasedCommandGenerator` and `NLUCommandAdapter` classes to uphold the prioritization system of the commands.
- Added a new method `_get_prior_commands` to the base class `CommandGenerator`. This method returns a list of commands that have been issued by other command generators prior to the one currently running. It is called by the `predict_commands` method of any command generator that inherits from the `CommandGenerator` class. These prior commands can either be returned as-is (in the case of an empty tracker or empty flows) or merged with the newly issued commands. For example:

  ```python
  prior_commands = self._get_prior_commands(tracker)
  if tracker is None or flows.is_empty():
      return prior_commands
  # custom command generation logic block
  return self._check_commands_overlap(prior_commands, commands)
  ```

- Added a new method `_should_skip_llm_call` to the `LLMBasedCommandGenerator`. This method returns `True` only if `minimize_num_calls` is set to `True` and the prior commands contain either a `StartFlow` command or a `SetSlot` command for the slot that is requested by an active `collect` flow step. It is called by the `predict_commands` method of the `LLMBasedCommandGenerator` child classes. If the method returns `True`, the LLM call is skipped and the method returns the prior commands.
- Moved the `_check_commands_against_slot_mappings` static method from the `CommandGenerator` to the `LLMBasedCommandGenerator` class. This method checks whether the issued LLM commands are relevant to the slot mappings. It is called by the `predict_commands` method of the `LLMBasedCommandGenerator` child classes.
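To illustrate the idea behind the overlap check, here is a minimal, self-contained sketch. The `Command` stand-in and the simple deduplication rule below are simplifications for illustration only; Rasa's actual `_check_commands_overlap` applies additional prioritization rules (e.g. for `StartFlow` and `SetSlot` commands):

```python
from dataclasses import dataclass
from typing import List, Optional


# Illustrative stand-in for Rasa's command objects.
@dataclass(frozen=True)
class Command:
    name: str
    argument: Optional[str] = None


def check_commands_overlap(
    prior_commands: List[Command], new_commands: List[Command]
) -> List[Command]:
    """Keep all prior commands and append only those new commands
    that do not duplicate a prior one (simplified deduplication)."""
    seen = set(prior_commands)
    merged = list(prior_commands)
    for command in new_commands:
        if command not in seen:
            merged.append(command)
            seen.add(command)
    return merged


prior = [Command("StartFlow", "transfer_money")]
new = [Command("StartFlow", "transfer_money"), Command("SetSlot", "amount=50")]
# The duplicate StartFlow is dropped; the new SetSlot is kept.
merged = check_commands_overlap(prior, new)
```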
### Migration from the SingleStepLLMCommandGenerator to the CompactLLMCommandGenerator

We recommend using the new `CompactLLMCommandGenerator` with optimized prompts for the `gpt-4o-2024-11-20` and `claude-sonnet-3.5-20240620` models. Using the `CompactLLMCommandGenerator` can significantly reduce costs, by approximately a factor of 10 according to our tests.

If you've built a custom command generator that extends `SingleStepLLMCommandGenerator`, we recommend migrating by inheriting from `CompactLLMCommandGenerator` instead:
```python
# Old class definition:
from rasa.dialogue_understanding.generators import SingleStepLLMCommandGenerator

class MyCommandGenerator(SingleStepLLMCommandGenerator):
    ...

# New class definition:
from rasa.dialogue_understanding.generators import CompactLLMCommandGenerator

class MyCommandGenerator(CompactLLMCommandGenerator):
    ...
```
### Migration from the SingleStepLLMCommandGenerator to the CompactLLMCommandGenerator with custom commands

If you've built a custom command generator that extends `SingleStepLLMCommandGenerator` and you've defined new commands or overridden Rasa's default commands, you should:

- Update the `parse_commands` method to reflect the changes in the command parsing logic.
- Update the custom command classes so they are compatible with the latest command interface. For details on updating and implementing custom command classes, please refer to the "How to customize existing commands" section.
In the new implementation, command parsing has been delegated to a dedicated parsing utility method, `parse_commands`, which can be imported from `rasa.dialogue_understanding.generator.command_parser`. This method handles the parsing of the predicted LLM output into commands more effectively and flexibly, especially when using customized or newly introduced command types.

Here is the new recommended pattern for your command generator's `parse_commands` method:
```python
# Import the utility method under a different name to prevent confusion with the
# command generator's `parse_commands`
from rasa.dialogue_understanding.generator.command_parser import (
    parse_commands as parse_commands_using_command_parsers,
)


class CustomCommandGenerator(CompactLLMCommandGenerator):
    """Custom implementation of the LLM command generator."""

    ...

    @classmethod
    def parse_commands(
        cls, actions: Optional[str], tracker: DialogueStateTracker, flows: FlowsList
    ) -> List[Command]:
        """Parse the actions returned by the LLM into intents and entities as commands.

        Args:
            actions: The actions returned by the LLM.
            tracker: The tracker containing the current state of the conversation.
            flows: The current list of active flows.

        Returns:
            The parsed commands.
        """
        commands = parse_commands_using_command_parsers(
            actions,
            flows,
            # Register any custom command classes you have created here
            additional_commands=[CustomCommandClass1, CustomCommandClass2, ...],
            # If your custom command classes replace or extend default commands,
            # specify the default commands for removal here
            default_commands_to_remove=[HumanHandoffCommand, ...],
        )
        if not commands:
            structlogger.warning(
                f"{cls.__name__}.parse_commands",
                message="No commands were parsed from the LLM actions.",
                actions=actions,
            )
        return commands
```
### Migration of a custom prompt from the SingleStepLLMCommandGenerator to the CompactLLMCommandGenerator

If you've customized the default prompt template previously used with the `SingleStepLLMCommandGenerator` and are now migrating to the `CompactLLMCommandGenerator`, you must update this template to use the new prompt command syntax. This updated command syntax is specifically optimized for the capabilities of the new `CompactLLMCommandGenerator`.

For more details on the new prompt, refer to the documentation.
### Update to the `utter_corrected_previous_input` default utterance

The text of the default `utter_corrected_previous_input` utterance has been updated to use a new correction frame context property, `context.new_slot_values`, instead of `context.corrected_slots.values`. The new utterance is:

```
"Ok, I am updating {{ context.corrected_slots.keys()|join(', ') }} to {{ context.new_slot_values | join(', ') }} respectively."
```
### LLM Judge Config Format Change in E2E Testing

The custom configuration of the LLM Judge used by E2E testing with assertions has been updated to use the `llm_judge` key, which follows the same structure as other generative components in Rasa. It can use either the model groups configuration or the individual model configuration option. The `llm_judge` key can be used in the `conftest.yml` file as shown below:
```yaml
llm_judge:
  llm:
    provider: "openai"
    model: "gpt-4-0613"
  embeddings:
    provider: "openai"
    model: "text-embedding-ada-002"
```
### `action` property in custom slot mappings replaced with `run_action_every_turn`

With the deprecation of the custom slot mapping in favor of the new controlled mapping type, the `action` property associated with the custom slot mapping has been replaced with the `run_action_every_turn` property.

For this reason, if you prefer not to run these custom actions at every turn, we recommend removing the `action` property from your slot mappings.
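As an illustration, a mapping that previously ran a custom action via the `action` property could be migrated as sketched below. The slot name and action name are hypothetical, and the exact keys of the controlled mapping type should be checked against the slot mapping documentation:

```yaml
# Before: custom mapping with an action
slots:
  account_type:
    type: text
    mappings:
      - type: custom
        action: action_fill_account_type

# After: controlled mapping; include run_action_every_turn only if
# you actually want the action to run at every conversation turn
slots:
  account_type:
    type: text
    mappings:
      - type: controlled
        run_action_every_turn: action_fill_account_type
```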
## Rasa Pro 3.9 to Rasa Pro 3.10

### LLM/Embedding Configuration

The LLM and embedding configurations have been updated to use the `provider` key instead of the `type` key. These changes apply to all providers; some examples are provided below for reference.
#### Cohere

```yaml
llm:
  provider: "cohere" # instead of "type: cohere"
  model: "command-r"
```

#### Vertex AI

```yaml
llm:
  provider: "vertex_ai" # instead of "type: vertexai"
  model: "gemini-pro"
```

#### Hugging Face Hub

```yaml
llm:
  provider: "huggingface" # instead of "type: huggingface_hub"
  model: "HuggingFaceH4/zephyr-7b-beta" # instead of "repo_id: HuggingFaceH4/zephyr-7b-beta"
```
#### llama.cpp

Support for loading models directly has been removed. You need to deploy the model to a server and use the server URL to load the model. For instance, a llama.cpp server can be run using the following command:

```shell
./llama-server -m your_model.gguf --port 8080
```

For more information on the llama.cpp server, refer to the llama.cpp documentation. The assistant can be configured as:

```yaml
llm:
  provider: "self-hosted" # instead of "type: llamacpp"
  api_base: "http://localhost:8000/v1" # instead of "model_path: "/path/to/model.bin""
  model: "ggml-org/Meta-Llama-3.1-8B-Instruct-Q4_0-GGUF"
```
#### vLLM

The model can be deployed and served through `vLLM==0.6.0`. For instance, a vLLM server can be run using the following command:

```shell
vllm serve your_model
```

For more information on the vLLM server, refer to the vLLM documentation. The assistant can be configured as:

```yaml
llm:
  provider: "self-hosted" # instead of "type: vllm_openai"
  api_base: "http://localhost:8000/v1"
  model: "NousResearch/Meta-Llama-3-8B-Instruct" # the name of the model you have deployed
```

CALM exclusively uses the chat completions endpoint of the model server, so it is essential that the model's tokenizer includes a chat template. Models lacking a chat template are no longer compatible with CALM.
Backward compatibility has been maintained for OpenAI and Azure configurations. For all other providers, ensure the use of the `provider` key and review the configuration against the documentation.
### Disabling the cache

For Rasa Pro versions <= `3.9.x`, the correct way to disable the cache was:

```yaml
llm:
  model: ...
  cache: false
```

From Rasa Pro `3.10.0` onwards, this has changed, since we now rely on LiteLLM to manage caching. To avoid errors, change your configuration to:

```yaml
llm:
  model: ...
  cache:
    no-cache: true
```
### Custom Components using an LLM

As of Rasa Pro 3.10, the backend for sending LLM and embedding API requests has undergone a significant change. The previous LangChain version `0.0.329` has been replaced with LiteLLM. This shift can potentially break custom implementations of components that configure and send API requests to chat completion and embedding endpoints. Specifically, the following components are impacted:

- `SingleStepLLMCommandGenerator`
- `MultiStepLLMCommandGenerator`
- `ContextualResponseRephraser`
- `EnterpriseSearchPolicy`
- `IntentlessPolicy`
- `FlowRetrieval`
- `LLMBasedRouter`

If your project contains custom components based on any of the affected components listed above, you will need to verify and possibly refactor your code to ensure compatibility with LiteLLM.
### Changes to llm_factory

The `llm_factory` is used across all components that configure and send API requests to an LLM. Previously, the `llm_factory` relied on LangChain's mapping to instantiate LangChain clients. From Rasa Pro 3.10 onwards, the `llm_factory` returns clients that conform to the new `LLMClient` protocol. This impacts any custom component that was previously relying on LangChain types.

If you have overridden components, such as a command generator, you will need to update your code to handle the new `LLMClient` return type. This includes adjusting method calls and ensuring compatibility with the new protocol.

The following method calls will need to be adjusted if you have overridden them:

- `SingleStepLLMCommandGenerator.invoke_llm`
- `MultiStepLLMCommandGenerator.invoke_llm`
- `ContextualResponseRephraser.rephrase`
- `EnterpriseSearchPolicy.predict_action_probabilities`
- `IntentlessPolicy.generate_answer`
- `LLMBasedRouter.predict_commands`

Here's an example of how to update your code:
**Rasa 3.9 - LangChain:**

```python
from rasa.shared.utils.llm import llm_factory

# get the llm client via factory
llm = llm_factory(config, default_config)

# get the llm response synchronously
sync_completion: str = llm.predict(prompt)

# get the llm response asynchronously
async_completion: str = await llm.apredict(prompt)
```

**Rasa 3.10 - LiteLLM:**

```python
from rasa.shared.utils.llm import llm_factory
from rasa.shared.providers.llm.llm_client import LLMClient
from rasa.shared.providers.llm.llm_response import LLMResponse

# get the llm client via factory
llm: LLMClient = llm_factory(config, default_config)

# get the llm response synchronously
sync_response: LLMResponse = llm.completion(prompt)  # or llm.completion([prompt_1, prompt_2, ..., prompt_n])
sync_completion: str = sync_response.choices[0]

# get the llm response asynchronously
async_response: LLMResponse = await llm.acompletion(prompt)  # or llm.acompletion([prompt_1, prompt_2, ..., prompt_n])
async_completion: str = async_response.choices[0]
```
### Changes to embedder_factory

The `embedder_factory` is used across all components that configure and send API requests to an embedding model. Previously, the `embedder_factory` returned LangChain's embedding clients of the `Embeddings` type. From Rasa Pro 3.10 onwards, the `embedder_factory` returns clients that conform to the new `EmbeddingClient` protocol. This change is part of the move to LiteLLM, and it impacts any custom components that were previously relying on LangChain types.

If you have overridden components that rely on instantiating clients with the `embedder_factory`, you will need to update your code to handle the new `EmbeddingClient` return type. This includes adjusting method calls and ensuring compatibility with the new protocol.

The following method calls will need to be adjusted if you have overridden them:

- `FlowRetrieval.load`
- `FlowRetrieval.populate`
- `EnterpriseSearchPolicy.load`
- `EnterpriseSearchPolicy.train`
- `IntentlessPolicy.load`
- Or, if you have overridden the `IntentlessPolicy.embedder` attribute.

Here's an example of how to update your code:
**Rasa 3.9 - LangChain:**

```python
from rasa.shared.utils.llm import embedder_factory

# get the embedding client via factory
embedder = embedder_factory(config, default_config)

# get the embedding response synchronously
vectors: List[List[float]] = embedder.embed_documents([doc_1, doc_2])

# get the embedding response asynchronously
vectors: List[List[float]] = await embedder.aembed_documents([doc_1, doc_2])
```

**Rasa 3.10 - LiteLLM:**

```python
from rasa.shared.utils.llm import embedder_factory
from rasa.shared.providers.embedding.embedding_client import EmbeddingClient
from rasa.shared.providers.embedding.embedding_response import EmbeddingResponse

# get the embedding client via factory
embedder: EmbeddingClient = embedder_factory(config, default_config)

# get the embedding response synchronously
sync_response: EmbeddingResponse = embedder.embed([doc_1, doc_2])
vectors: List[List[float]] = sync_response.data

# get the embedding response asynchronously
async_response: EmbeddingResponse = await embedder.aembed([doc_1, doc_2])
vectors: List[List[float]] = async_response.data
```
### Changes to invoke_llm

The previous implementation of the `invoke_llm` method in `SingleStepLLMCommandGenerator`, `MultiStepLLMCommandGenerator`, and the deprecated `LLMCommandGenerator` used `llm_factory` to instantiate LangChain clients. Since the factory now returns clients that conform to the new `LLMClient` protocol, any custom overrides of the `invoke_llm` method need to be updated to accommodate the new return type.

Below you can find the `invoke_llm` method from Rasa Pro 3.9 and its updated version for Rasa Pro 3.10:

**Rasa 3.9:**
```python
async def invoke_llm(self, prompt: Text) -> Optional[Text]:
    """Use LLM to generate a response.

    Args:
        prompt: The prompt to send to the LLM.

    Returns:
        The generated text.

    Raises:
        ProviderClientAPIException if an error occurs during the API call.
    """
    llm = llm_factory(self.config.get(LLM_CONFIG_KEY), DEFAULT_LLM_CONFIG)
    try:
        return await llm.apredict(prompt)
    except Exception as e:
        structlogger.error("llm_based_command_generator.llm.error", error=e)
        raise ProviderClientAPIException(
            message="LLM call exception", original_exception=e
        )
```

**Rasa 3.10:**
```python
async def invoke_llm(self, prompt: Text) -> Optional[Text]:
    """Use LLM to generate a response.

    Args:
        prompt: The prompt to send to the LLM.

    Returns:
        The generated text.

    Raises:
        ProviderClientAPIException if an error occurs during the API call.
    """
    llm = llm_factory(self.config.get(LLM_CONFIG_KEY), DEFAULT_LLM_CONFIG)
    try:
        llm_response = await llm.acompletion(prompt)
        return llm_response.choices[0]
    except Exception as e:
        structlogger.error("llm_based_command_generator.llm.error", error=e)
        raise ProviderClientAPIException(
            message="LLM call exception", original_exception=e
        )
```
### Changes to SingleStepLLMCommandGenerator.predict_commands

For the `SingleStepLLMCommandGenerator`, the `predict_commands` method now includes a call to `self._update_message_parse_data_for_fine_tuning(message, commands, flow_prompt)`. This call is essential for enabling the fine-tuning recipe. If you have overridden the `predict_commands` method, you need to add this call manually to ensure proper functionality:

```python
async def predict_commands(
    self,
    message: Message,
    flows: FlowsList,
    tracker: Optional[DialogueStateTracker] = None,
    **kwargs: Any,
) -> List[Command]:
    ...
    action_list = await self.invoke_llm(flow_prompt)
    commands = self.parse_commands(action_list, tracker, flows)
    self._update_message_parse_data_for_fine_tuning(message, commands, flow_prompt)
    return commands
```
### Changes to the default configuration dictionary

The default configurations for the following components have been updated:

- `SingleStepLLMCommandGenerator`
- `MultiStepLLMCommandGenerator`
- `ContextualResponseRephraser`
- `EnterpriseSearchPolicy`
- `IntentlessPolicy`
- `FlowRetrieval`
- `LLMBasedRouter`

If you have custom implementations based on the default configurations of any of these components, make sure your configuration dictionary reflects the changes shown below, as the defaults have changed.

Default LLM configuration keys have been updated from:

```python
DEFAULT_LLM_CONFIG = {
    "_type": "openai",
    "model_name": ...,
    "request_timeout": ...,
    "temperature": ...,
    "max_tokens": ...,
}
```

to:

```python
DEFAULT_LLM_CONFIG = {
    "provider": "openai",
    "model": ...,
    "temperature": ...,
    "max_tokens": ...,
    "timeout": ...,
}
```

Similarly, default embedding configuration keys have been updated from:

```python
DEFAULT_EMBEDDINGS_CONFIG = {
    "_type": "openai",
    "model": ...,
}
```

to:

```python
DEFAULT_EMBEDDINGS_CONFIG = {
    "provider": "openai",
    "model": ...,
}
```

Be sure to update your custom configurations to reflect these changes to ensure continued functionality.
### Dropped support for Python 3.8

Support for Python 3.8 has been dropped ahead of its end of life in October 2024. In Rasa Pro versions `3.10.0`, `3.9.11`, and `3.8.13`, we needed to pin the TensorFlow version to `2.13.0rc1` in order to remove critical vulnerabilities; this resulted in a poor user experience when installing these versions of Rasa Pro with `uv pip`. Removing support for Python 3.8 makes it possible to upgrade to a more stable version of TensorFlow.
## Rasa Pro 3.8 to Rasa Pro 3.9

### LLMCommandGenerator

Starting with Rasa Pro 3.9, the former `LLMCommandGenerator` is replaced by the `SingleStepLLMCommandGenerator`. The `LLMCommandGenerator` is now deprecated and will be removed in version `4.0.0`.

The `SingleStepLLMCommandGenerator` differs from the `LLMCommandGenerator` in how it handles failures of the `invoke_llm` method. Specifically, if the `invoke_llm` method call fails in the `SingleStepLLMCommandGenerator`, it raises a `ProviderClientAPIException`. In contrast, the `LLMCommandGenerator` simply returns `None` when the method call fails.
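The behavioral difference can be sketched with illustrative stand-ins. The exception class and the failing client below are simplifications for the example, not Rasa's actual implementation:

```python
import asyncio
from typing import Optional


class ProviderClientAPIException(Exception):
    """Illustrative stand-in for Rasa's ProviderClientAPIException."""


async def failing_llm_call(prompt: str) -> str:
    # Simulate an API failure on every call.
    raise RuntimeError("API error")


async def invoke_llm_3_8(prompt: str) -> Optional[str]:
    """LLMCommandGenerator behavior (Rasa Pro 3.8): return None on failure."""
    try:
        return await failing_llm_call(prompt)
    except Exception:
        return None


async def invoke_llm_3_9(prompt: str) -> Optional[str]:
    """SingleStepLLMCommandGenerator behavior (Rasa Pro 3.9): raise on failure."""
    try:
        return await failing_llm_call(prompt)
    except Exception as e:
        raise ProviderClientAPIException("LLM call exception") from e


# The 3.8 behavior silently swallows the failure and returns None.
result = asyncio.run(invoke_llm_3_8("hi"))

# The 3.9 behavior surfaces the failure as an exception the caller must handle.
raised = False
try:
    asyncio.run(invoke_llm_3_9("hi"))
except ProviderClientAPIException:
    raised = True
```

If your error handling around a custom command generator relied on checking for `None`, it needs to catch `ProviderClientAPIException` instead after migrating.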