Version Migration Guide

This page contains information about changes between major versions and how you can migrate from one version to another.

Rasa Pro 3.9 to Rasa Pro 3.10

LLM/Embedding Configuration

The LLM and embedding configurations have been updated to use the provider key instead of the type key. These changes apply to all providers, with some examples provided for reference.

Cohere

llm:
  provider: "cohere" # instead of "type: cohere"
  model: "command-r"

Vertex AI

llm:
  provider: "vertex_ai" # instead of "type: vertexai"
  model: "gemini-pro"

Hugging Face Hub

llm:
  provider: "huggingface" # instead of "type: huggingface_hub"
  model: "HuggingFaceH4/zephyr-7b-beta" # instead of "repo_id: HuggingFaceH4/zephyr-7b-beta"

llama.cpp

Support for loading models directly has been removed. You need to deploy the model to a server and use the server URL to load the model. For instance, a llama.cpp server can be started with the following command:

./llama-server -m your_model.gguf --port 8080

For more information on the llama.cpp server, refer to the llama.cpp documentation. The assistant can then be configured as:

llm:
  provider: "self-hosted" # instead of "type: llamacpp"
  api_base: "http://localhost:8000/v1" # instead of "model_path: "/path/to/model.bin""
  model: "ggml-org/Meta-Llama-3.1-8B-Instruct-Q4_0-GGUF"

vLLM

The model can be deployed and served through vLLM==0.6.0. For instance, a vLLM server can be started with the following command:

vllm serve your_model

For more information on the vLLM server, refer to the vLLM documentation. The assistant can then be configured as:

llm:
  provider: "self-hosted" # instead of "type: vllm_openai"
  api_base: "http://localhost:8000/v1"
  model: "NousResearch/Meta-Llama-3-8B-Instruct" # the name of the model you have deployed
note

CALM exclusively uses the chat completions endpoint of the model server, so it is essential that the model's tokenizer includes a chat template. Models lacking a chat template are no longer compatible with CALM.
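You can check whether a model ships a chat template before deploying it. A minimal sketch using the transformers library (assumes transformers is installed; the model name is only an example):

from transformers import AutoTokenizer

# A tokenizer without a chat template has chat_template set to None.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
print(tokenizer.chat_template is not None)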

Backward compatibility has been maintained for OpenAI and Azure configurations. For all other providers, ensure the use of the provider key and review the configuration against the documentation.
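For reference, an OpenAI configuration expressed with the new key would look like the following sketch (the model name is illustrative):

llm:
  provider: "openai"
  model: "gpt-4"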

Disabling the cache

For Rasa Pro versions <= 3.9.x, the correct way to disable the cache was:

llm:
  model: ...
  cache: false

From Rasa Pro 3.10.0 onwards, this has changed because caching is now managed by LiteLLM. To avoid errors, change your configuration to:

llm:
  model: ...
  cache:
    no-cache: true
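In context, the llm block sits under the component that uses it. A minimal config.yml sketch (the component and model choice are illustrative):

config.yml
pipeline:
- name: SingleStepLLMCommandGenerator
  llm:
    provider: "openai"
    model: "gpt-4"
    cache:
      no-cache: true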

Custom Components using an LLM

As of Rasa Pro 3.10, the backend for sending LLM and Embedding API requests has undergone a significant change. The previous LangChain version 0.0.329 has been replaced with LiteLLM.

This shift can potentially break custom implementations of components that configure and send API requests to chat completion and embedding endpoints; the affected components and methods are listed in the sections below.

If your project contains custom components based on any of the affected components, you will need to verify and possibly refactor your code to ensure compatibility with LiteLLM.

Changes to llm_factory

The llm_factory is used across all components that configure and send API requests to an LLM. Previously, the llm_factory relied on LangChain's mapping to instantiate LangChain clients.

Rasa Pro 3.10 onwards, the llm_factory returns clients that conform to the new LLMClient protocol. This impacts any custom component that was previously relying on LangChain types.

If you have overridden components, such as a command generator, you will need to update your code to handle the new LLMClient return type. This includes adjusting method calls and ensuring compatibility with the new protocol.

The following method calls will need to be adjusted if you have overridden them:

  • SingleStepLLMCommandGenerator.invoke_llm
  • MultiStepLLMCommandGenerator.invoke_llm
  • ContextualResponseRephraser.rephrase
  • EnterpriseSearchPolicy.predict_action_probabilities
  • IntentlessPolicy.generate_answer
  • LLMBasedRouter.predict_commands

Here’s an example of how to update your code:


from rasa.shared.utils.llm import llm_factory
from rasa.shared.providers.llm.llm_client import LLMClient
from rasa.shared.providers.llm.llm_response import LLMResponse

# get the llm client via factory
llm: LLMClient = llm_factory(config, default_config)

# get the llm response synchronously
sync_response: LLMResponse = llm.completion(prompt) # or llm.completion([prompt_1, prompt_2,..., prompt_n])
sync_completion: str = sync_response.choices[0]

# get the llm response asynchronously
async_response: LLMResponse = await llm.acompletion(prompt) # or llm.acompletion([prompt_1, prompt_2,..., prompt_n])
async_completion: str = async_response.choices[0]

Changes to embedder_factory

The embedder_factory is used across all components that configure and send API requests to an embedding model. Previously, the embedder_factory returned LangChain's embedding clients of Embeddings type.

Rasa Pro 3.10 onwards, the embedder_factory returns clients that conform to the new EmbeddingClient protocol. This change is part of the move to LiteLLM, and it impacts any custom components that were previously relying on LangChain types.

If you have overridden components that rely on instantiating clients with embedder_factory you will need to update your code to handle the new return type of EmbeddingClient. This includes adjusting method calls and ensuring compatibility with the new protocol.

The following method calls will need to be adjusted if you have overridden them:

  • FlowRetrieval.load
  • FlowRetrieval.populate
  • EnterpriseSearchPolicy.load
  • EnterpriseSearchPolicy.train
  • IntentlessPolicy.load
  • Or if you have overridden the IntentlessPolicy.embedder attribute.

Here’s an example of how to update your code:


from typing import List

from rasa.shared.utils.llm import embedder_factory
from rasa.shared.providers.embedding.embedding_client import EmbeddingClient
from rasa.shared.providers.embedding.embedding_response import EmbeddingResponse

# get the embedding client via factory
embedder: EmbeddingClient = embedder_factory(config, default_config)

# get the embedding response synchronously
sync_response: EmbeddingResponse = embedder.embed([doc_1, doc_2])
vectors: List[List[float]] = sync_response.data

# get the embedding response asynchronously
async_response: EmbeddingResponse = await embedder.aembed([doc_1, doc_2])
vectors: List[List[float]] = async_response.data

Changes to invoke_llm

The previous implementation of the invoke_llm method in SingleStepLLMCommandGenerator, MultiStepLLMCommandGenerator, and the deprecated LLMCommandGenerator used llm_factory to instantiate LangChain clients. Since the factory now returns clients that conform to the new LLMClient protocol, any custom overrides of the invoke_llm method will need to be updated to accommodate the new return type.

Below you can find the updated version of the invoke_llm method as of Rasa Pro 3.10:


async def invoke_llm(self, prompt: Text) -> Optional[Text]:
    """Use LLM to generate a response.

    Args:
        prompt: The prompt to send to the LLM.

    Returns:
        The generated text.

    Raises:
        ProviderClientAPIException if an error occurs during the API call.
    """
    llm = llm_factory(self.config.get(LLM_CONFIG_KEY), DEFAULT_LLM_CONFIG)
    try:
        llm_response = await llm.acompletion(prompt)
        return llm_response.choices[0]
    except Exception as e:
        structlogger.error("llm_based_command_generator.llm.error", error=e)
        raise ProviderClientAPIException(
            message="LLM call exception", original_exception=e
        )

Changes to SingleStepLLMCommandGenerator.predict_commands

For SingleStepLLMCommandGenerator, the predict_commands method now includes a call to self._update_message_parse_data_for_fine_tuning(message, commands, flow_prompt). This function is essential for enabling the fine-tuning recipe.

If you have overridden the predict_commands method, you need to manually add this call to ensure proper functionality:


async def predict_commands(
    self,
    message: Message,
    flows: FlowsList,
    tracker: Optional[DialogueStateTracker] = None,
    **kwargs: Any,
) -> List[Command]:

    ...
    action_list = await self.invoke_llm(flow_prompt)
    commands = self.parse_commands(action_list, tracker, flows)

    self._update_message_parse_data_for_fine_tuning(message, commands, flow_prompt)

    return commands

Changes to the default configuration dictionary

The default configurations of several components have been updated.

If you have custom implementations based on the default configurations of these components, ensure that your configuration dictionary aligns with the updates shown below, as the defaults have changed.

Default LLM configuration keys have been updated from:

DEFAULT_LLM_CONFIG = {
    "_type": "openai",
    "model_name": ...,
    "request_timeout": ...,
    "temperature": ...,
    "max_tokens": ...,
}

to:

DEFAULT_LLM_CONFIG = {
    "provider": "openai",
    "model": ...,
    "temperature": ...,
    "max_tokens": ...,
    "timeout": ...,
}

Similarly, default embedding configuration keys have been updated from:

DEFAULT_EMBEDDINGS_CONFIG = {
    "_type": "openai",
    "model": ...,
}

to:

DEFAULT_EMBEDDINGS_CONFIG = {
    "provider": "openai",
    "model": ...,
}

Be sure to update your custom configurations to reflect these changes in order to ensure continued functionality.

Dropped support for Python 3.8

Dropped support for Python 3.8 ahead of Python 3.8 End of Life in October 2024.

In Rasa Pro versions 3.10.0, 3.9.11 and 3.8.13, we needed to pin the TensorFlow library to version 2.13.0rc1 in order to remove critical vulnerabilities; this resulted in a poor user experience when installing these versions of Rasa Pro with uv pip. Removing support for Python 3.8 makes it possible to upgrade to a more stable version of TensorFlow.

Rasa Pro 3.8 to Rasa Pro 3.9

LLMCommandGenerator

Starting from Rasa Pro 3.9, the former LLMCommandGenerator is replaced by the SingleStepLLMCommandGenerator. The LLMCommandGenerator is now deprecated and will be removed in version 4.0.0.

The SingleStepLLMCommandGenerator differs from the LLMCommandGenerator in how it handles failures of the invoke_llm method. Specifically, if the invoke_llm method call fails in SingleStepLLMCommandGenerator, it raises a ProviderClientAPIException. In contrast, the LLMCommandGenerator simply returns None when the method call fails.
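If your custom code relied on the old return-None-on-failure behaviour, the sketch below shows one way to adapt it. This is a minimal illustration only: the import path of ProviderClientAPIException is an assumption, and it assumes invoke_llm is awaitable as in the examples above.

from typing import Optional, Text

# Assumed import path for the exception; adjust it to your Rasa Pro version.
from rasa.shared.exceptions import ProviderClientAPIException


async def generate_action_list(generator, prompt: Text) -> Optional[Text]:
    """Call invoke_llm, falling back to None to mimic the old behaviour."""
    try:
        return await generator.invoke_llm(prompt)
    except ProviderClientAPIException:
        # LLMCommandGenerator used to return None when the LLM call failed;
        # reproduce that here if downstream code expects it.
        return None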

Slot Mappings

If you had been using a custom slot mapping type for slots set by the LLM-based command generator's predictions, you need to update your assistant's slot configuration to use the new from_llm slot mapping type. This also applies if you have written custom slot validation actions (following the validate_<slot_name> convention) for such slots.
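For example, a slot filled from the command generator's predictions would be configured as follows (the slot name is illustrative):

slots:
  destination:
    type: text
    mappings:
    - type: from_llm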

For slots that are set only via a custom action, e.g. slots set by external sources only, you must add the action name to the slot mapping:

slots:
  slot_name:
    type: text
    mappings:
    - type: custom
      action: custom_action_name

Rasa Pro 3.8.0 to Rasa Pro 3.8.1

Poetry Installation

Starting from Rasa Pro 3.8.1 in the 3.8.x minor series, we have upgraded the version of Poetry used for managing dependencies in the Rasa Pro Python package to 1.8.2. To install the latest micro versions of Rasa Pro in your project, you must first upgrade Poetry to version 1.8.2:

poetry self update 1.8.2

Rasa Pro 3.7 to 3.8

info

Starting from 3.8.0, Rasa and Rasa Plus have been merged into a single artifact, named Rasa Pro.

Installation

Following the merge, we renamed the resulting Python package and Docker image to rasa-pro.

Python package

The Rasa Pro Python package, for 3.8.0 and onward, is located at:

https://europe-west3-python.pkg.dev/rasa-releases/rasa-pro-python

The name of the package is rasa-pro.

Example of how to install the package:

pip install --extra-index-url=https://europe-west3-python.pkg.dev/rasa-releases/rasa-pro-python/simple rasa-pro==3.8.0

While the Python package name has changed, the import process remains the same:

import rasa.core

from rasa import train

For more information on how to install Rasa Pro, please refer to the Python installation guide.

Helm Chart / Docker Image

The Rasa Pro Docker image, for 3.8.0 and onward, is located at:

europe-west3-docker.pkg.dev/rasa-releases/rasa-pro/rasa-pro

Example of how to pull the image:

docker pull europe-west3-docker.pkg.dev/rasa-releases/rasa-pro/rasa-pro:3.8.0

For more information on how to install Rasa Pro Docker image, please refer to the Docker installation guide.

Component Yaml Configuration Changes

Follow the instructions below to update the configuration of Rasa Pro components in version 3.8:

  • The concurrent Redis lock store and the rephrasing NLG endpoint are configured via their type:

lock_store:
  type: concurrent_redis

nlg:
  type: rephrase

  • Audiocodes and Vier CVG channels can be specified in credentials.yml directly using their channel name:

audiocodes:
  token: "sample_token"

vier_cvg:
  ...

  • EnterpriseSearchPolicy and IntentlessPolicy are specified by name in the policies section:

policies:
- name: EnterpriseSearchPolicy
- name: IntentlessPolicy

Changes to default behaviour

info

With Rasa Pro 3.8, we introduced a couple of changes that rectify the default behaviour of certain components. We believe these changes align better with the principles of CALM. If you are migrating an assistant built with Rasa Pro 3.7, please check whether these changes affect your assistant.

Prompt Rendering

Rasa Pro 3.8 introduces a new feature, flow retrieval, which ensures that only the flows relevant to the conversation context are included in the prompt sent to the LLM by the LLMCommandGenerator. This helps the assistant scale to a higher number of flows and also reduces LLM costs.

This feature is enabled by default, and we recommend using it if the assistant has more than 40 flows. By default, the feature uses embedding models from OpenAI, but if you are using a different provider (e.g. Azure), please ensure that:

  1. An embedding model is configured with the provider.
  2. LLMCommandGenerator has been configured correctly to connect to the embedding provider; for example, see the section on the configuration required to connect to the Azure OpenAI service, or the sketch below.
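A minimal sketch of pointing flow retrieval at an Azure OpenAI embedding deployment is shown below. The key names follow the 3.8-style embeddings configuration and the deployment, model, and endpoint values are illustrative; check the Azure OpenAI section of the documentation for the exact keys required by your setup.

config.yml
pipeline:
- name: LLMCommandGenerator
  ...
  flow_retrieval:
    embeddings:
      type: azure
      deployment: "my-embedding-deployment"
      model: "text-embedding-ada-002"
      api_base: "https://my-azure-endpoint.openai.azure.com"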

If you wish to disable the feature, you can configure the LLMCommandGenerator as:

config.yml
pipeline:
- name: SingleStepLLMCommandGenerator
  ...
  flow_retrieval:
    active: false
  ...

Processing Chitchat

The default behaviour in Rasa Pro 3.7 was to handle chitchat utterances with free-form generative responses. This can lead to the assistant sending unwanted responses or responding to out-of-scope user utterances. The new default behaviour in Rasa Pro 3.8 is to rely on the IntentlessPolicy to respond to chitchat utterances using pre-defined responses only.

If you were relying on free-form generative responses to handle chitchat in Rasa Pro 3.7, you will now see a warning message when you train the same assistant with Rasa Pro 3.8: "pattern_chitchat has an action step with action_trigger_chitchat, but IntentlessPolicy is not configured". This appears because the default definition of pattern_chitchat has been modified in Rasa Pro 3.8 to:

pattern_chitchat:
  description: handle interactions with the user that are not task-oriented
  name: pattern chitchat
  steps:
  - action: action_trigger_chitchat

For the assistant to be able to handle chitchat utterances, you have two options:

  1. If you are happy with free-form generative responses for such user utterances, then you can override pattern_chitchat to:

     pattern_chitchat:
       description: handle interactions with the user that are not task-oriented
       name: pattern chitchat
       steps:
       - action: utter_free_chitchat_response

  2. If you want to switch to using pre-defined responses, you should first add IntentlessPolicy to the policies section of the config:

     policies:
     - name: IntentlessPolicy

Next, you should add response templates for the pre-defined responses you want the assistant to consider when responding to a chitchat user utterance.
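For instance, a couple of pre-defined chitchat responses could be added to the domain like this (response names and texts are illustrative):

responses:
  utter_chitchat_greet_back:
  - text: "Hello again! How can I help you today?"
  utter_chitchat_how_are_you:
  - text: "I'm doing great, thanks for asking!"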

Handling of categorical slots

Rasa Pro versions <= 3.7.8 used to store the value of a categorical slot in the same casing in which it was either specified in the user message or predicted by the LLM in a SetSlot command. This was not necessarily the same as the casing used in the corresponding possible value defined for that slot in the domain. For example, if the categorical slot was defined with [A, B, C] as the possible values and the prediction was to set it to a, then the slot would be set to a. This led to problems downstream when that slot had to be used in other primitives, i.e. flows or custom actions.

Rasa Pro 3.7.9 fixes this by always storing the slot value in the same casing as defined in the domain. So, in the above example, the slot would now be stored as A instead of a. This ensures that business logic for slot comparisons, e.g. if conditions in flows, can be written using the same casing as defined in the domain.

If you are migrating from Rasa Pro versions <= 3.7.8, please double-check your flows and custom actions to make sure none of them break because of this change.
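As a concrete illustration, with the categorical slot below the stored value now always matches the casing in the values list (the slot name and values are illustrative):

slots:
  payment_method:
    type: categorical
    values:
    - Credit Card
    - PayPal
    # A predicted value of "credit card" is now stored as "Credit Card",
    # matching the casing defined here.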

Update default signature of LLM calls

In Rasa Pro >= 3.8, we switched from synchronous to asynchronous LLM calls. We updated all components that use an LLM, e.g.:

  • LLMCommandGenerator
  • ContextualResponseRephraser
  • EnterpriseSearchPolicy
  • IntentlessPolicy

This can potentially break assistants migrating to 3.8 that have sub-classed one of these components in their own custom components.

For example, the method predict_commands in the LLMCommandGenerator is now async and needs to await the methods _generate_action_list_using_llm and flow_retrieval.filter_flows, as these methods are also async. For more information on asyncio, please check its documentation.
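A minimal sketch of what an updated custom subclass might look like is shown below. The import paths and the exact method signature are assumptions; check them against your Rasa Pro version before use.

from typing import List, Optional

# Assumed import paths; adjust them to your Rasa Pro version.
from rasa.dialogue_understanding.commands import Command
from rasa.dialogue_understanding.generator import LLMCommandGenerator
from rasa.shared.core.flows import FlowsList
from rasa.shared.core.trackers import DialogueStateTracker
from rasa.shared.nlu.training_data.message import Message


class MyCommandGenerator(LLMCommandGenerator):
    # predict_commands is now a coroutine, so the override must be declared
    # async and must await the parent implementation (or the async helpers
    # it calls, such as _generate_action_list_using_llm).
    async def predict_commands(
        self,
        message: Message,
        flows: FlowsList,
        tracker: Optional[DialogueStateTracker] = None,
    ) -> List[Command]:
        # Custom pre- or post-processing can go around the awaited call.
        return await super().predict_commands(message, flows, tracker)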

Dependency Upgrades

We've updated our core dependencies to enhance functionality and performance across our platform.

Spacy 3.7.x

Upgraded from >=3.6 to >=3.7.

We have transitioned to using Spacy version 3.7.x to benefit from the latest enhancements in natural language processing. If you're using any spacy models with your assistant, please update them to Spacy 3.7.x compatible models.
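For example, re-downloading a model after the upgrade pulls a version compatible with the installed spaCy release (the model name is illustrative):

python -m spacy download en_core_web_md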

Pydantic 2.x

Upgraded from >=1.10.9,<1.10.10 to ^2.0.

Along with the Spacy upgrade, we have moved to Pydantic version 2.x, which necessitates updates to Pydantic models. For assistance with updating your models, please refer to the Pydantic Migration Guide. This ensures compatibility with the latest improvements in data validation and settings management.
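As a small illustration of the kind of change involved, a Pydantic v1 model using @validator and .dict() moves to the v2 equivalents. A minimal sketch (your own models will differ):

from pydantic import BaseModel, field_validator


class SlotValue(BaseModel):
    name: str
    value: str

    # Pydantic v2: @validator becomes @field_validator.
    @field_validator("name")
    @classmethod
    def name_not_empty(cls, v: str) -> str:
        if not v:
            raise ValueError("name must not be empty")
        return v


payload = SlotValue(name="city", value="Berlin")
# Pydantic v2: .dict() is deprecated in favour of .model_dump().
print(payload.model_dump())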

Rasa Pro 3.7.9 to Rasa Pro 3.7.10

Poetry Installation

Starting from Rasa Pro 3.7.10 in the 3.7.x minor series, we have upgraded the version of Poetry used for managing dependencies in the Rasa Pro Python package to 1.8.2. To install Rasa Pro in your project, you must first upgrade Poetry to version 1.8.2:

poetry self update 1.8.2

Rasa Pro 3.7.8 to Rasa Pro 3.7.9

Changes to default behaviour

Handling of categorical slots

Rasa Pro versions <= 3.7.8 used to store the value of a categorical slot in the same casing in which it was either specified in the user message or predicted by the LLM in a SetSlot command. This was not necessarily the same as the casing used in the corresponding possible value defined for that slot in the domain. For example, if the categorical slot was defined with [A, B, C] as the possible values and the prediction was to set it to a, then the slot would be set to a. This led to problems downstream when that slot had to be used in other primitives, i.e. flows or custom actions.

Rasa Pro 3.7.9 fixes this by always storing the slot value in the same casing as defined in the domain. So, in the above example, the slot would now be stored as A instead of a. This ensures that business logic for slot comparisons, e.g. if conditions in flows, can be written using the same casing as defined in the domain.

If you are migrating from Rasa Pro versions <= 3.7.8, please double-check your flows and custom actions to make sure none of them break because of this change.

Rasa 3.6 to Rasa Pro 3.7

Installation

info

Starting from Rasa 3.7.0, Rasa has moved to a new package registry and Docker registry. You will need to update your package registry to install Rasa 3.7.0 and later versions. If you are a Rasa customer, please reach out to your Rasa account manager or support to obtain a license.

Python package

The Rasa Python package for 3.7.0 has been moved to a new Python package registry:

https://europe-west3-python.pkg.dev/rasa-releases/rasa-plus-py

The name of the package is rasa.

Example of how to install the package:

pip install --extra-index-url=https://europe-west3-python.pkg.dev/rasa-releases/rasa-plus-py/simple rasa==3.7.0

For more information on how to install Rasa Pro, please refer to the Python installation guide.

Helm Chart / Docker Image

The Rasa Docker image for 3.7.0 is located at:

europe-west3-docker.pkg.dev/rasa-releases/rasa-docker/rasa

Example of how to pull the image:

docker pull europe-west3-docker.pkg.dev/rasa-releases/rasa-docker/rasa:3.7.0

For more information on how to install Rasa Pro Docker image, please refer to the Docker installation guide.

Migrating from older versions

For migrating from Rasa Open Source versions, please refer to the migration guide.