Version Migration Guide
This page contains information about changes between major versions and how you can migrate from one version to another.
Rasa Pro 3.9 to Rasa Pro 3.10
LLM/Embedding Configuration
The LLM and embedding configurations have been updated to use the provider
key instead of the type
key.
These changes apply to all providers, with some examples provided for reference.
Cohere
llm:
  provider: "cohere" # instead of "type: cohere"
  model: "command-r"
Vertex AI
llm:
  provider: "vertex_ai" # instead of "type: vertexai"
  model: "gemini-pro"
Hugging Face Hub
llm:
  provider: "huggingface" # instead of "type: huggingface_hub"
  model: "HuggingFaceH4/zephyr-7b-beta" # instead of "repo_id: HuggingFaceH4/zephyr-7b-beta"
llama.cpp
Support for loading models directly has been removed. You need to deploy the model to a server and use the server URL to load the model.
For instance, a llama.cpp server can be run using the following command:
./llama-server -m your_model.gguf --port 8080
For more information on the llama.cpp server, refer to the llama.cpp documentation. The assistant can be configured as:
llm:
  provider: "self-hosted" # instead of "type: llamacpp"
  api_base: "http://localhost:8080/v1" # instead of "model_path: "/path/to/model.bin""
  model: "ggml-org/Meta-Llama-3.1-8B-Instruct-Q4_0-GGUF"
vLLM
The model can be deployed and served through vLLM==0.6.0.
For instance, a vLLM server can be run using the following command:
vllm serve your_model
For more information on the vLLM server, refer to the vLLM documentation. The assistant can be configured as:
llm:
  provider: "self-hosted" # instead of "type: vllm_openai"
  api_base: "http://localhost:8000/v1"
  model: "NousResearch/Meta-Llama-3-8B-Instruct" # the name of the model you have deployed
CALM exclusively uses the chat completions endpoint of the model server, so it is essential that the model's tokenizer includes a chat template. Models lacking a chat template are no longer compatible with CALM.
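If you are unsure whether a self-hosted model ships a chat template, you can check its tokenizer before deploying. The snippet below is a minimal sketch that assumes the Hugging Face transformers library is installed and reuses an example model name from this page:
# Minimal sketch: verify that a model's tokenizer defines a chat template.
# Requires the transformers package; the model name is only an example.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NousResearch/Meta-Llama-3-8B-Instruct")
if tokenizer.chat_template is None:
    print("No chat template found: this model cannot serve chat completion requests for CALM.")
else:
    print("Chat template found: the model can serve chat completion requests.")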
Backward compatibility has been maintained for OpenAI and Azure configurations. For all other providers, ensure the provider key is used and review the configuration against the documentation.
Disabling the cache
For Rasa Pro versions <= 3.9.x, the correct way to disable the cache was:
llm:
  model: ...
  cache: false
From Rasa Pro 3.10.0 onwards, this has changed since we rely on LiteLLM to manage caching. To avoid errors, change your configuration to:
llm:
  model: ...
  cache:
    no-cache: true
Custom Components using an LLM
As of Rasa Pro 3.10, the backend for sending LLM and Embedding API requests has undergone a significant change. The previous LangChain version 0.0.329 has been replaced with LiteLLM.
This shift can potentially break custom implementations of components that configure and send API requests to chat completion and embedding endpoints. Specifically, the following components are impacted:
- SingleStepLLMCommandGenerator
- MultiStepLLMCommandGenerator
- ContextualResponseRephraser
- EnterpriseSearchPolicy
- IntentlessPolicy
- FlowRetrieval
- LLMBasedRouter
If your project contains custom components based on any of the affected components listed above, you will need to verify and possibly refactor your code to ensure compatibility with LiteLLM.
Changes to llm_factory
The llm_factory is used across all components that configure and send API requests to an LLM. Previously, the llm_factory relied on LangChain's mapping to instantiate LangChain clients.
From Rasa Pro 3.10 onwards, the llm_factory returns clients that conform to the new LLMClient protocol. This impacts any custom component that was previously relying on LangChain types.
If you have overridden components, such as a command generator, you will need to update your code to handle the new LLMClient return type. This includes adjusting method calls and ensuring compatibility with the new protocol.
The following method calls will need to be adjusted if you have overridden them:
- SingleStepLLMCommandGenerator.invoke_llm
- MultiStepLLMCommandGenerator.invoke_llm
- ContextualResponseRephraser.rephrase
- EnterpriseSearchPolicy.predict_action_probabilities
- IntentlessPolicy.generate_answer
- LLMBasedRouter.predict_commands
Here’s an example of how to update your code:
- Rasa 3.9 - LangChain
- Rasa 3.10 - LiteLLM
from rasa.shared.utils.llm import llm_factory
# get the llm client via factory
llm = llm_factory(config, default_config)
# get the llm response synchronously
sync_completion: str = llm.predict(prompt)
# get the llm response asynchronously
async_completion: str = await llm.apredict(prompt)
from rasa.shared.utils.llm import llm_factory
from rasa.shared.providers.llm.llm_client import LLMClient
from rasa.shared.providers.llm.llm_response import LLMResponse
# get the llm client via factory
llm: LLMClient = llm_factory(config, default_config)
# get the llm response synchronously
sync_response: LLMResponse = llm.completion(prompt) # or llm.completion([prompt_1, prompt_2,..., prompt_n])
sync_completion: str = sync_response.choices[0]
# get the llm response asynchronously
async_response: LLMResponse = await llm.acompletion(prompt) # or llm.acompletion([prompt_1, prompt_2,..., prompt_n])
async_completion: str = async_response.choices[0]
Changes to embedder_factory
The embedder_factory is used across all components that configure and send API requests to an embedding model.
Previously, the embedder_factory returned LangChain embedding clients of the Embeddings type.
From Rasa Pro 3.10 onwards, the embedder_factory returns clients that conform to the new EmbeddingClient protocol. This change is part of the move to LiteLLM, and it impacts any custom components that were previously relying on LangChain types.
If you have overridden components that rely on instantiating clients with embedder_factory, you will need to update your code to handle the new EmbeddingClient return type. This includes adjusting method calls and ensuring compatibility with the new protocol.
The following method calls will need to be adjusted if you have overridden them:
- FlowRetrieval.load
- FlowRetrieval.populate
- EnterpriseSearchPolicy.load
- EnterpriseSearchPolicy.train
- IntentlessPolicy.load
- Or if you have overridden the IntentlessPolicy.embedder attribute.
Here’s an example of how to update your code:
- Rasa 3.9 - LangChain
- Rasa 3.10 - LiteLLM
from rasa.shared.utils.llm import embedder_factory
# get the embedding client via factory
embedder = embedder_factory(config, default_config)
# get the embedding response synchronously
vectors: List[List[float]] = embedder.embed_documents([doc_1, doc_2])
# get the embedding response asynchronously
vectors: List[List[float]] = await embedder.aembed_documents([doc_1, doc_2])
from rasa.shared.utils.llm import embedder_factory
from rasa.shared.providers.embedding.embedding_client import EmbeddingClient
from rasa.shared.providers.embedding.embedding_response import EmbeddingResponse
# get the embedding client via factory
embedder: EmbeddingClient = embedder_factory(config, default_config)
# get the embedding response synchronously
sync_response: EmbeddingResponse = embedder.embed([doc_1, doc_2])
vectors: List[List[float]] = sync_response.data
# get the embedding response asynchronously
async_response: EmbeddingResponse = await embedder.aembed([doc_1, doc_2])
vectors: List[List[float]] = async_response.data
Changes to invoke_llm
The previous implementation of the invoke_llm method in SingleStepLLMCommandGenerator, MultiStepLLMCommandGenerator, and the deprecated LLMCommandGenerator used llm_factory to instantiate LangChain clients. Since the factory now returns clients that conform to the new LLMClient protocol, any custom overrides of the invoke_llm method will need to be updated to accommodate the new return type.
Below you can find the invoke_llm method from Rasa Pro 3.9 and its updated version in Rasa Pro 3.10:
- Rasa 3.9
- Rasa 3.10
async def invoke_llm(self, prompt: Text) -> Optional[Text]:
    """Use LLM to generate a response.

    Args:
        prompt: The prompt to send to the LLM.

    Returns:
        The generated text.

    Raises:
        ProviderClientAPIException if an error during API call.
    """
    llm = llm_factory(self.config.get(LLM_CONFIG_KEY), DEFAULT_LLM_CONFIG)
    try:
        return await llm.apredict(prompt)
    except Exception as e:
        structlogger.error("llm_based_command_generator.llm.error", error=e)
        raise ProviderClientAPIException(
            message="LLM call exception", original_exception=e
        )
async def invoke_llm(self, prompt: Text) -> Optional[Text]:
    """Use LLM to generate a response.

    Args:
        prompt: The prompt to send to the LLM.

    Returns:
        The generated text.

    Raises:
        ProviderClientAPIException if an error during API call.
    """
    llm = llm_factory(self.config.get(LLM_CONFIG_KEY), DEFAULT_LLM_CONFIG)
    try:
        llm_response = await llm.acompletion(prompt)
        return llm_response.choices[0]
    except Exception as e:
        structlogger.error("llm_based_command_generator.llm.error", error=e)
        raise ProviderClientAPIException(
            message="LLM call exception", original_exception=e
        )
Changes to SingleStepLLMCommandGenerator.predict_commands
For SingleStepLLMCommandGenerator, the predict_commands method now includes a call to self._update_message_parse_data_for_fine_tuning(message, commands, flow_prompt). This call is essential for enabling the fine-tuning recipe.
If you have overridden the predict_commands method, you need to add this call manually to ensure proper functionality:
async def predict_commands(
    self,
    message: Message,
    flows: FlowsList,
    tracker: Optional[DialogueStateTracker] = None,
    **kwargs: Any,
) -> List[Command]:
    ...
    action_list = await self.invoke_llm(flow_prompt)
    commands = self.parse_commands(action_list, tracker, flows)
    self._update_message_parse_data_for_fine_tuning(message, commands, flow_prompt)
    return commands
Changes to the default configuration dictionary
The default configurations for the following components have been updated:
- SingleStepLLMCommandGenerator
- MultiStepLLMCommandGenerator
- ContextualResponseRephraser
- EnterpriseSearchPolicy
- IntentlessPolicy
- FlowRetrieval
- LLMBasedRouter
If you have custom implementations based on the default configurations for any of these components, ensure that your configuration dictionary aligns with the updates shown in the tables below, as the defaults have changed.
Default LLM configuration keys have been updated from:
DEFAULT_LLM_CONFIG = {
"_type": "openai",
"model_name": ...,
"request_timeout": ...,
"temperature": ...,
"max_tokens": ...,
}
to:
DEFAULT_LLM_CONFIG = {
"provider": "openai",
"model": ...,
"temperature": ...,
"max_tokens": ...,
"timeout": ...,
}
Similarly, default embedding configuration keys have been updated from:
DEFAULT_EMBEDDINGS_CONFIG = {
"_type": "openai",
"model": ...,
}
to:
DEFAULT_EMBEDDINGS_CONFIG = {
"provider": "openai",
"model": ...,
}
Be sure to update your custom configurations to reflect these changes in order to ensure continued functionality.
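As an illustration, a custom component that overrides get_default_config should now return the new key names. The component name, import path, and values below are assumptions made for the sake of the example, not the definitive Rasa Pro API:
# Illustrative sketch only: the import path and config values are assumptions.
# Adjust them to your project and Rasa Pro version.
from typing import Any, Dict

from rasa.dialogue_understanding.generator import SingleStepLLMCommandGenerator


class MyCommandGenerator(SingleStepLLMCommandGenerator):
    @staticmethod
    def get_default_config() -> Dict[str, Any]:
        config = SingleStepLLMCommandGenerator.get_default_config()
        # Use the 3.10 key names (provider/model/timeout) instead of the old
        # _type/model_name/request_timeout keys.
        config["llm"] = {
            "provider": "openai",
            "model": "gpt-4",
            "temperature": 0.0,
            "timeout": 7,
        }
        return config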
Dropped support for Python 3.8
Support for Python 3.8 has been dropped ahead of Python 3.8's end of life in October 2024.
In Rasa Pro versions 3.10.0, 3.9.11, and 3.8.13, we needed to pin the TensorFlow library version to 2.13.0rc1 in order to remove critical vulnerabilities; this resulted in a poor user experience when installing these versions of Rasa Pro with uv pip.
Removing support for Python 3.8 makes it possible to upgrade to a more stable version of TensorFlow.
Rasa Pro 3.8 to Rasa Pro 3.9
LLMCommandGenerator
Starting from Rasa Pro 3.9, the former LLMCommandGenerator is replaced by SingleStepLLMCommandGenerator. The LLMCommandGenerator is now deprecated and will be removed in version 4.0.0.
The SingleStepLLMCommandGenerator differs from the LLMCommandGenerator in how it handles failures of the invoke_llm method. Specifically, if the invoke_llm call fails in SingleStepLLMCommandGenerator, it raises a ProviderClientAPIException. In contrast, the LLMCommandGenerator simply returns None when the call fails.
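If your custom code relied on the old behaviour of receiving None on failure, you now have to catch the exception yourself. The sketch below shows one way to preserve the old behaviour in a custom subclass; the import paths are assumptions and may differ in your Rasa Pro version:
# Sketch only: preserves the old "return None on failure" behaviour.
# The import paths below are assumptions; adjust them to your Rasa Pro version.
from typing import Optional, Text

from rasa.dialogue_understanding.generator import SingleStepLLMCommandGenerator
from rasa.shared.exceptions import ProviderClientAPIException


class TolerantCommandGenerator(SingleStepLLMCommandGenerator):
    async def invoke_llm(self, prompt: Text) -> Optional[Text]:
        try:
            return await super().invoke_llm(prompt)
        except ProviderClientAPIException:
            # LLMCommandGenerator used to return None here instead of raising.
            return None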
Slot Mappings
If you had been using the custom slot mapping type for slots set by the prediction of the LLM-based command generator, you need to update your assistant's slot configuration to use the new from_llm slot mapping type.
Note that even if you have written custom slot validation actions (following the validate_<slot_name> convention) for slots set by the LLM-based command generator, you still need to update your assistant's slot configuration to use the new from_llm slot mapping type.
For slots that are set only via a custom action, e.g. slots set by external sources only, you must add the action name to the slot mapping:
slots:
  slot_name:
    type: text
    mappings:
      - type: custom
        action: custom_action_name
Rasa Pro 3.8.0 to Rasa Pro 3.8.1
Poetry Installation
Starting from Rasa Pro 3.8.1 in the 3.8.x minor series, we have upgraded the version of Poetry used for managing dependencies in the Rasa Pro Python package to 1.8.2.
To install the latest micro versions of Rasa Pro in your project, you must first upgrade Poetry to version 1.8.2:
poetry self update 1.8.2
Rasa Pro 3.7 to 3.8
Starting from 3.8.0, Rasa and Rasa Plus have been merged into a single artifact, named Rasa Pro.
Installation
Following the merge, we renamed the resulting Python package and Docker image to rasa-pro.
.
Python package
The Rasa Pro Python package, for 3.8.0 and onwards, is located at:
https://europe-west3-python.pkg.dev/rasa-releases/rasa-pro-python
The name of the package is rasa-pro.
Example of how to install the package:
pip install --extra-index-url=https://europe-west3-python.pkg.dev/rasa-releases/rasa-pro-python/simple rasa-pro==3.8.0
While the Python package name has changed, the import process remains the same:
import rasa.core
from rasa import train
For more information on how to install Rasa Pro, please refer to the Python installation guide.
Helm Chart / Docker Image
The Rasa Pro Docker image, for 3.8.0 and onwards, is located at:
europe-west3-docker.pkg.dev/rasa-releases/rasa-pro/rasa-pro
Example of how to pull the image:
docker pull europe-west3-docker.pkg.dev/rasa-releases/rasa-pro/rasa-pro:3.8.0
For more information on how to install Rasa Pro Docker image, please refer to the Docker installation guide.
Component YAML Configuration Changes
Follow the instructions below to update the configuration of Rasa Pro components in version 3.8:
ConcurrentRedisLockStore
- update endpoints.yml to type: concurrent_redis:
lock_store:
  type: concurrent_redis
ContextualResponseRephraser
- update endpoints.yml to either type: rephrase or type: rasa.core.ContextualResponseRephraser:
nlg:
  type: rephrase
- Audiocodes and Vier CVG channels can be specified in credentials.yml directly using their channel name:
audiocodes:
  token: "sample_token"
vier_cvg:
  ...
EnterpriseSearchPolicy and IntentlessPolicy
- update config.yml to only use the policy class name:
policies:
- name: EnterpriseSearchPolicy
- name: IntentlessPolicy
Changes to default behaviour
With Rasa Pro 3.8, we introduced a couple of changes that rectify the default behaviour of certain components. We believe these changes align better with the principles of CALM. If you are migrating an assistant built with Rasa Pro 3.7, please check whether these changes affect your assistant.
Prompt Rendering
Rasa Pro 3.8 introduces a new feature, flow retrieval, which ensures that only the flows relevant to the conversation context are included in the prompt sent to the LLM in the LLMCommandGenerator. This helps the assistant scale to a higher number of flows and also reduces LLM costs.
This feature is enabled by default, and we recommend using it if the assistant has more than 40 flows. By default, the feature uses embedding models from OpenAI, but if you are using a different provider (e.g. Azure), please ensure:
- An embedding model is configured with the provider.
- LLMCommandGenerator has been configured correctly to connect to the embedding provider. For example, see the section on the configuration required to connect to the Azure OpenAI service.
If you wish to disable the feature, you can configure the LLMCommandGenerator as:
pipeline:
- name: SingleStepLLMCommandGenerator
  ...
  flow_retrieval:
    active: false
  ...
Processing Chitchat
The default behaviour in Rasa Pro 3.7 for handling chitchat utterances was to rely on free-form generative responses. This can lead to the assistant sending unwanted responses or responding to out-of-scope user utterances. The new default behaviour in Rasa Pro 3.8 is to rely on the IntentlessPolicy to respond to chitchat utterances using pre-defined responses only.
If you were relying on free-form generative responses to handle chitchat in Rasa Pro 3.7, you will now see a warning message when you train the same assistant with Rasa Pro 3.8: "pattern_chitchat has an action step with action_trigger_chitchat, but IntentlessPolicy is not configured". This appears because the default definition of pattern_chitchat has been modified in Rasa Pro 3.8 to:
pattern_chitchat:
  description: handle interactions with the user that are not task-oriented
  name: pattern chitchat
  steps:
    - action: action_trigger_chitchat
For the assistant to be able to handle chitchat utterances, you have two options:
- If you are happy with free-form generative responses for such user utterances, then you can override pattern_chitchat to:
pattern_chitchat:
  description: handle interactions with the user that are not task-oriented
  name: pattern chitchat
  steps:
    - action: utter_free_chitchat_response
- If you want to switch to using pre-defined responses, you should first add IntentlessPolicy to the policies section of the config:
policies:
  - name: IntentlessPolicy
Next, you should add response templates for the pre-defined responses you want the assistant to consider when responding to a chitchat user utterance.
Handling of categorical slots
Rasa Pro versions <= 3.7.8 used to store the value of a categorical slot in the same casing as it was either specified in the user message or predicted by the LLM in a SetSlot command. This was not necessarily the same as the casing used in the corresponding possible value defined for that slot in the domain. For example, if the categorical slot was defined to have [A, B, C] as the possible values and the prediction was to set it to a, then the slot would be set to a. This led to problems downstream when that slot had to be used in other primitives, i.e. flows or custom actions.
Rasa Pro 3.7.9 fixes this by always storing the slot value in the same casing as defined in the domain. So, in the above example, the slot would now be stored as A instead of a. This ensures that users write business logic for slot comparisons, e.g. if conditions in flows, using the same casing as they defined in the domain.
If you are migrating from Rasa Pro versions <= 3.7.8, please double-check your flows and custom actions to make sure none of them break because of this change.
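For example, a custom action that previously compared against the lowercase value should now compare against the casing defined in the domain. The action and slot names below are illustrative only:
# Illustrative custom action: slot values now keep the domain's casing,
# so compare against "A" rather than "a". Names are examples only.
from typing import Any, Dict, List, Text

from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher


class ActionCheckCategory(Action):
    def name(self) -> Text:
        return "action_check_category"

    def run(
        self,
        dispatcher: CollectingDispatcher,
        tracker: Tracker,
        domain: Dict[Text, Any],
    ) -> List[Dict[Text, Any]]:
        category = tracker.get_slot("category")
        # Before 3.7.9 this could be "a"; from 3.7.9 it matches the domain value "A".
        if category == "A":
            dispatcher.utter_message(text="Category A selected.")
        return []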
Update default signature of LLM calls
In Rasa Pro >= 3.8, we switched from synchronous to asynchronous LLM calls.
We updated all components that use an LLM, e.g.:
- LLMCommandGenerator
- ContextualResponseRephraser
- EnterpriseSearchPolicy
- IntentlessPolicy
This can potentially break assistants migrating to 3.8 that have sub-classed one of these components in their own custom components.
For example, the method predict_commands in the LLMCommandGenerator is now async and needs to await the methods _generate_action_list_using_llm and flow_retrieval.filter_flows, as these methods are also async; see the sketch below.
For more information on asyncio, please check its documentation.
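As a sketch of the required change, an overridden predict_commands that used to call these helpers synchronously must now be declared async and await them. The argument lists are elided and the exact signatures are simplified assumptions:
# Simplified sketch: argument lists are elided and signatures are assumptions.
# Rasa Pro 3.7 - synchronous override
def predict_commands(self, message, flows, tracker=None, **kwargs):
    filtered_flows = self.flow_retrieval.filter_flows(...)
    action_list = self._generate_action_list_using_llm(...)
    return self.parse_commands(action_list, tracker, flows)

# Rasa Pro 3.8 - asynchronous override: the method is async and awaits the helpers
async def predict_commands(self, message, flows, tracker=None, **kwargs):
    filtered_flows = await self.flow_retrieval.filter_flows(...)
    action_list = await self._generate_action_list_using_llm(...)
    return self.parse_commands(action_list, tracker, flows)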
Dependency Upgrades
We've updated our core dependencies to enhance functionality and performance across our platform.
Spacy 3.7.x
Upgraded from >=3.6 to >=3.7.
We have transitioned to spaCy 3.7.x to benefit from the latest enhancements in natural language processing. If you're using any spaCy models with your assistant, please update them to models compatible with spaCy 3.7.x.
Pydantic 2.x
Upgraded from >=1.10.9,<1.10.10 to ^2.0.
Along with the Spacy upgrade, we have moved to Pydantic version 2.x, which necessitates updates to Pydantic models. For assistance with updating your models, please refer to the Pydantic Migration Guide. This ensures compatibility with the latest improvements in data validation and settings management.
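As an illustration of the kind of change involved, a model written against Pydantic 1.x typically needs renamed decorators and methods under 2.x. The model below is an example only and is not taken from Rasa Pro:
# Example only: typical Pydantic 1.x -> 2.x changes in a custom model.
from pydantic import BaseModel, field_validator  # Pydantic 2.x


class SlotValue(BaseModel):
    name: str
    value: str

    # Pydantic 1.x used @validator("name"); Pydantic 2.x uses @field_validator("name").
    @field_validator("name")
    @classmethod
    def name_must_not_be_empty(cls, v: str) -> str:
        if not v:
            raise ValueError("name must not be empty")
        return v


# Pydantic 1.x: SlotValue(...).dict() / .json()
# Pydantic 2.x: SlotValue(...).model_dump() / .model_dump_json()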
Rasa Pro 3.7.9 to Rasa Pro 3.7.10
Poetry Installation
Starting from Rasa Pro 3.7.10 in the 3.7.x minor series, we have upgraded the version of Poetry used for managing dependencies in the Rasa Pro Python package to 1.8.2.
To install Rasa Pro in your project, you must first upgrade Poetry to version 1.8.2:
poetry self update 1.8.2
Rasa Pro 3.7.8 to Rasa Pro 3.7.9
Changes to default behaviour
Handling of categorical slots
Rasa Pro versions <= 3.7.8 used to store the value of a categorical slot in the same casing as it was either specified in the user message or predicted by the LLM in a SetSlot command. This was not necessarily the same as the casing used in the corresponding possible value defined for that slot in the domain. For example, if the categorical slot was defined to have [A, B, C] as the possible values and the prediction was to set it to a, then the slot would be set to a. This led to problems downstream when that slot had to be used in other primitives, i.e. flows or custom actions.
Rasa Pro 3.7.9 fixes this by always storing the slot value in the same casing as defined in the domain. So, in the above example, the slot would now be stored as A instead of a. This ensures that users write business logic for slot comparisons, e.g. if conditions in flows, using the same casing as they defined in the domain.
If you are migrating from Rasa Pro versions <= 3.7.8, please double-check your flows and custom actions to make sure none of them break because of this change.
Rasa 3.6 to Rasa Pro 3.7
Installation
Starting from Rasa 3.7.0, Rasa has moved to a new package registry and Docker registry. You will need to update your package registry to install Rasa 3.7.0 and later versions. If you are a Rasa customer, please reach out to your Rasa account manager or support to obtain a license.
Python package
The Rasa Python package for 3.7.0 has been moved to a new package registry:
https://europe-west3-python.pkg.dev/rasa-releases/rasa-plus-py
The name of the package is rasa.
Example of how to install the package:
pip install --extra-index-url=https://europe-west3-python.pkg.dev/rasa-releases/rasa-plus-py/simple rasa==3.7.0
For more information on how to install Rasa Pro, please refer to the Python installation guide.
Helm Chart / Docker Image
The Rasa Docker image for 3.7.0 is located at:
europe-west3-docker.pkg.dev/rasa-releases/rasa-docker/rasa
Example of how to pull the image:
docker pull europe-west3-docker.pkg.dev/rasa-releases/rasa-docker/rasa:3.7.0
For more information on how to install Rasa Pro Docker image, please refer to the Docker installation guide.
Migrating from older versions
For migrating from Rasa Open Source versions, please refer to the migration guide.