LLM Configuration for Rasa Pro >= 3.11
Learn how to set up and configure Large Language Models (LLMs) from providers such as OpenAI, Azure, and more. This guide outlines the necessary steps and the latest configurations to seamlessly integrate LLMs into your workflow, tailored to your specific use case.
LLM Configuration for Rasa Pro 3.10 and below
For Rasa Pro versions 3.10 and below, refer to the LLM Configuration for <=3.10 page.
Overview
This page applies to the following components which use LLMs:
- SingleStepLLMCommandGenerator
- MultiStepLLMCommandGenerator
- EnterpriseSearchPolicy
- IntentlessPolicy
- ContextualResponseRephraser
- LLMBasedRouter
All the above components can be configured to change:
- the LLM provider
- the model(s) to be used
Starting with Rasa Pro 3.10, CALM uses LiteLLM under the hood to integrate with different LLM providers. Hence, all of LiteLLM's integrated providers are supported with CALM as well. The settings required for the most frequently used providers are described explicitly in the sections below.
Declaring LLM deployments
LLM deployments are always declared in groups consisting of one or more deployments. The sections below explain how to declare these groups.
Model Groups
Model groups allow you to define multiple models under a single ID which can be accessed by any component.
Model groups are defined in the endpoints.yml file under the model_groups key, separating model definitions from individual component configurations. Each entry under model_groups has the following structure:
- The id key uniquely identifies the model group.
- The models key lists all model deployments in that group.
- Each model in the list includes a configuration, explained in the following sections.
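A minimal sketch of an endpoints.yml entry with this structure follows. The group ID openai_llm and the model gpt-4-0613 are illustrative choices, not required values.

```yaml
# endpoints.yml
model_groups:
  - id: openai_llm          # unique ID that components reference
    models:                 # one or more deployments in this group
      - provider: openai
        model: gpt-4-0613
```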
Defining a single model group
Required Parameters
Each model in a model group requires the following parameters:
- provider - Unique identifier of the LLM provider to be used.
- model - The model identifier as listed in the LLM provider's documentation, for example gpt-4-0613.
Optional Parameters
Each model also accepts optional inference-time parameters, such as temperature, which can be useful in extracting the best performance out of the model being used. Refer to the official LiteLLM documentation for a list of supported parameters.
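As a sketch, optional parameters sit next to provider and model in the model entry. The values are illustrative; temperature is the parameter named in this section, while timeout is assumed here to be passed through to LiteLLM as a request timeout in seconds.

```yaml
# endpoints.yml
model_groups:
  - id: openai_llm
    models:
      - provider: openai
        model: gpt-4-0613
        temperature: 0.0   # optional inference-time parameter
        timeout: 7         # optional; assumed to be a timeout in seconds
```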
When configuring a particular provider, there are a few provider-specific settings, which are explained in each provider's sub-section below.
important
If you switch to a different LLM provider, all default parameters of the old provider are overridden by the default parameters of the new provider.
For example, if a provider sets temperature=0.7 as its default value and you switch to a different LLM provider, this default is ignored, and it is up to you to set the temperature for the new provider.
Referencing environment variables in the model configuration
To reference environment variables in the model configuration, use the ${} syntax.
For example, writing api_key: ${MY_OPENAI_API_KEY} in an OpenAI model configuration makes the api_key parameter read its value from the environment variable MY_OPENAI_API_KEY.
Similarly, an Azure model configuration can take its deployment, api_base, api_version, and api_key parameters from the environment variables AZURE_DEPLOYMENT_NAME, AZURE_API_BASE, AZURE_API_VERSION, and MY_AZURE_API_KEY, respectively.
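The OpenAI and Azure cases from this section could be written in endpoints.yml as follows. This is a sketch: the group IDs are hypothetical, while the environment variable names are the ones used in this section.

```yaml
# endpoints.yml
model_groups:
  - id: openai_llm                         # hypothetical group ID
    models:
      - provider: openai
        model: gpt-4-0613
        api_key: ${MY_OPENAI_API_KEY}      # resolved from the environment
  - id: azure_llm                          # hypothetical group ID
    models:
      - provider: azure
        deployment: ${AZURE_DEPLOYMENT_NAME}
        api_base: ${AZURE_API_BASE}
        api_version: ${AZURE_API_VERSION}
        api_key: ${MY_AZURE_API_KEY}
```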
The variables are set in the environment using the export command on Unix-based systems and the setx command on Windows.
Not setting these variables results in an error when the assistant is started.
LLM API health check
The model configuration and the connection to the LLM provider can be validated by setting the LLM_API_HEALTH_CHECK environment variable to true.
By default, the variable is set to false. When it is set to true, all defined LLM deployments are checked for availability by making a test API request.
Using a model group in a component
Components using an LLM can be configured to use any of the declared model groups. To use a model group, specify its ID with the model_group key under the component's llm key.
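For instance, a SingleStepLLMCommandGenerator in config.yml could point at a model group as sketched below; openai_llm is a hypothetical ID that must match a group declared in endpoints.yml.

```yaml
# config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: openai_llm   # ID of a group declared in endpoints.yml
```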
Defining multiple model groups
The sections above illustrate how to define a model group consisting of a single model deployment. To handle a larger volume of conversations, it is recommended to include multiple model deployments within a model group. To do so, add additional deployments to the models list as explained on the Multi-LLM routing page.
Using different model groups in different components
Multiple components can rely on the same model group, and a single component can use multiple models defined in a model group via the LLM router.
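As a sketch of giving components different groups, the command generator and EnterpriseSearchPolicy below each reference their own group. Both IDs are hypothetical and must be declared under model_groups in endpoints.yml; the FlowPolicy entry is included only to show a typical policies list.

```yaml
# config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: openai_llm   # hypothetical group for command generation
policies:
  - name: FlowPolicy
  - name: EnterpriseSearchPolicy
    llm:
      model_group: azure_llm    # hypothetical group for answer generation
```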
Chat completion models
Default Provider
CALM is LLM agnostic and can be configured with different LLMs, but OpenAI is the default model provider. The majority of our experiments have been with models available on OpenAI or the Azure OpenAI Service. The performance of your assistant may vary when using other LLMs, but it can be improved by tuning flow and collect step descriptions.
OpenAI
API Token
The API token authenticates your requests to the OpenAI API.
To configure the API token, follow these steps:
- If you haven't already, sign up for an account on the OpenAI platform.
- Navigate to the OpenAI Key Management page and click the "Create New Secret Key" button to obtain <your-api-key>.
The API key can be set in the model configuration or through an environment variable.
To set the API key in the config, use the api_key parameter in the model configuration.
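A minimal sketch of an OpenAI model entry that reads its key from the environment. MY_OPENAI_API_KEY is the variable name used earlier on this page; any name works as long as it is set in the environment.

```yaml
# endpoints.yml
model_groups:
  - id: openai_llm
    models:
      - provider: openai
        model: gpt-4-0613
        api_key: ${MY_OPENAI_API_KEY}   # must reference an env variable
```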
info
The api_key parameter can be set in the model configuration for each model in the model_groups section of the endpoints.yml file.
For security reasons, the value of api_key must reference an environment variable using the ${} syntax. This approach ensures sensitive information is stored securely.
Directly assigning the API key in the configuration file is not allowed, as it could expose the key to unauthorized access.
To set the API key as an environment variable, run export OPENAI_API_KEY=<your-api-key> on Linux/macOS or setx OPENAI_API_KEY <your-api-key> on Windows.
Replace <your-api-key> with the actual API key you obtained from the OpenAI platform.
Configuration
There are no additional OpenAI-specific parameters to configure. However, there could be model-specific parameters like temperature that you might want to modify. Names for such parameters can be found in OpenAI's API documentation, and they are defined in the model configuration of the model group.
Refer to LiteLLM's documentation for the list of models supported from the OpenAI platform.
Model deprecations
OpenAI regularly publishes a deprecation schedule for its models. This schedule can be accessed in the documentation published by OpenAI.
Azure OpenAI Service
API Token
The API token authenticates your requests to the Azure OpenAI Service.
Set the API token as an environment variable by running export AZURE_API_KEY=<your-api-key> on Linux/macOS or setx AZURE_API_KEY <your-api-key> on Windows.
Replace <your-api-key> with the actual API key you obtained from the Azure OpenAI Service platform.
Configuration
To access models provided by Azure OpenAI Service, a few additional parameters need to be configured:
- provider - Set to azure.
- api_base - The URL for your Azure OpenAI instance, for example https://my-azure.openai.azure.com/.
- api_version - The API version to use for this operation. It follows the YYYY-MM-DD format, and the value should be enclosed in single or double quotes.
- deployment - Name of the deployment on Azure.
Model-specific parameters like temperature can be defined as well. Refer to the Azure OpenAI Service API documentation for information on available parameter names.
A complete configuration of the SingleStepLLMCommandGenerator using Azure OpenAI Service declares a model group with these parameters in endpoints.yml and references that group from config.yml.
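One way such a configuration could look, as a sketch: both files are shown in one block separated by ---, the deployment name, instance URL, and API version are placeholders to replace with your own values, and the timeout line is optional (see Debugging below).

```yaml
# config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: azure_llm
---
# endpoints.yml
model_groups:
  - id: azure_llm
    models:
      - provider: azure
        deployment: my-gpt-4-deployment          # placeholder deployment name
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"        # placeholder, quoted
        api_key: ${AZURE_API_KEY}
        timeout: 10                              # optional; raise on timeouts
```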
A more comprehensive example using the Azure OpenAI Service in more CALM components is available in the Example Configurations section at the end of this page.
Model deprecations
Azure regularly publishes a deprecation schedule for its models that are part of the Azure OpenAI Service. This schedule can be accessed in the documentation published by Azure.
Debugging
If you encounter timeout errors, set the timeout parameter to a larger value. The exact value depends on how your Azure instance is configured.
Amazon Bedrock
Requirements:
- Make sure you have rasa-pro>=3.11.x installed.
- Install boto3>=1.28.57.
- Set AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION_NAME as environment variables, or set them in the environment and reference them in the model configuration.
- (Optional) If your organisation mandates the use of temporary credentials, you might also have to set AWS_SESSION_TOKEN, either as an environment variable or in the model configuration.
Once the above steps are complete, edit config.yml to use an appropriate model_group from the endpoints.yml file. The AWS credentials can either be picked up from the environment or referenced explicitly in the model configuration.
Set provider to bedrock and model to the name of the model you want to use.
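A sketch of a Bedrock model group follows. The model ID is only an example, and the aws_* keys are assumed to be LiteLLM's Bedrock credential parameters; include them only when you want the credentials referenced in the configuration rather than picked up from the environment.

```yaml
# endpoints.yml
model_groups:
  - id: bedrock_llm
    models:
      - provider: bedrock
        model: anthropic.claude-instant-v1     # example Bedrock model ID
        # Optional: reference credentials explicitly (assumed key names)
        aws_access_key_id: ${AWS_ACCESS_KEY_ID}
        aws_secret_access_key: ${AWS_SECRET_ACCESS_KEY}
        aws_region_name: ${AWS_REGION_NAME}
```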
Model-specific parameters like temperature can be defined as well. Refer to LiteLLM's documentation for information on available parameter names and supported models.
Gemini - Google AI Studio
Requirements:
- Make sure you have rasa-pro>=3.11.x installed.
- Install the Python package google-generativeai.
- Get an API key at https://aistudio.google.com/.
- Set the API key in the environment variable GEMINI_API_KEY or set it in the model configuration.
Once the above steps are complete, edit config.yml to use an appropriate model_group from endpoints.yml, and set provider to gemini in that model group.
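A minimal sketch of a Gemini model group; gemini-pro is an example model name, and the api_key line can be dropped if GEMINI_API_KEY is already set in the environment.

```yaml
# endpoints.yml
model_groups:
  - id: gemini_llm
    models:
      - provider: gemini
        model: gemini-pro            # example model name
        api_key: ${GEMINI_API_KEY}   # optional if the env variable is set
```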
Refer to LiteLLM's documentation to know which additional parameters and models are supported.
HuggingFace Inference Endpoints
Requirements:
- Make sure you have rasa-pro>=3.11.x installed.
- Set an API key in the environment variable HUGGINGFACE_API_KEY or set it in the model configuration.
- Edit config.yml to use an appropriate model_group from endpoints.yml, set provider to huggingface, and set api_base to the base URL of the deployed endpoint.
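A sketch of a HuggingFace Inference Endpoints model group, assuming a model name is given alongside api_base. Both the model and the endpoint URL are placeholders for your own deployment.

```yaml
# endpoints.yml
model_groups:
  - id: huggingface_llm
    models:
      - provider: huggingface
        model: meta-llama/CodeLlama-7b-Instruct-hf          # example model
        api_base: "https://my-endpoint.huggingface.cloud"   # placeholder URL
```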
Self Hosted Model Server
CALM's components can also be configured to work with an open source LLM hosted on an open source model server such as vLLM (recommended), Ollama, or the Llama.cpp web server. The only requirement is that the model server adheres to the OpenAI API format.
Once your model server is running, configure the CALM assistant's config.yml file to use a model_group from the endpoints.yml file.
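A sketch of that setup, with both files in one block separated by ---. The group ID, model name, and server URL are placeholders; provider: self-hosted is the identifier this page uses for OpenAI-compatible servers.

```yaml
# config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: self_hosted_llm
---
# endpoints.yml
model_groups:
  - id: self_hosted_llm
    models:
      - provider: self-hosted
        model: meta-llama/CodeLlama-7b-Instruct-hf   # name the server exposes
        api_base: "https://my-endpoint/v1"           # placeholder server URL
```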
vLLM
Important to note:
- The recommended version of vllm is 0.6.0.
- By default, CALM uses the chat completions endpoint of the model server, so the model's tokenizer must include a chat template. For models lacking a chat template, set the use_chat_completions_endpoint parameter to false in the model_groups configuration:

config.yml
```yaml
- name: SingleStepLLMCommandGenerator
  llm:
    model_group: self_hosted_llm # Model group ID
```

endpoints.yml
```yaml
model_groups:
  - id: self_hosted_llm
    models:
      - provider: self-hosted
        model: meta-llama/CodeLlama-7b-Instruct-hf
        api_base: "https://my-endpoint/v1"
        use_chat_completions_endpoint: false
```

- model should contain the name of the model supplied to the vllm startup command. For example, if your model server is started with vllm serve meta-llama/CodeLlama-7b-Instruct-hf, model should be set to meta-llama/CodeLlama-7b-Instruct-hf.
- api_base should contain the full exposed URL of the model server with v1 attached as a suffix.
- If required, set an API key in the environment variable HOSTED_VLLM_API_KEY or set it in the model configuration.
Ollama
Once the Ollama model server is running, edit the config.yml file to use a model_group from the endpoints.yml file.
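A sketch of an Ollama model group, assuming the LiteLLM provider identifier ollama and a server on Ollama's default local port 11434. The model name is an example and must be a model you have pulled into Ollama.

```yaml
# endpoints.yml
model_groups:
  - id: ollama_llm
    models:
      - provider: ollama
        model: llama3.1                      # example model pulled into Ollama
        api_base: "http://localhost:11434"   # Ollama's default local address
```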
Other Providers
info
If you want to try one of these providers, it is recommended to install Rasa Pro version 3.11 or later.
In addition to the providers mentioned above, we have also tested support for the following providers:
Platform | provider | API-KEY variable |
---|---|---|
Anthropic | anthropic | ANTHROPIC_API_KEY |
Cohere | cohere | COHERE_API_KEY |
Mistral | mistral | MISTRAL_API_KEY |
Together AI | together_ai | TOGETHERAI_API_KEY |
Groq | groq | GROQ_API_KEY |
For each of these providers, set an environment variable named by the value in the API-KEY variable column to the API key of that platform (or set the key in the model configuration), and set the provider parameter of the model in the model group to the value in the provider column.
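Taking Anthropic from the table as an example, a model group could look like the sketch below. The model ID is an example only; check the provider's documentation for current model names.

```yaml
# endpoints.yml
model_groups:
  - id: anthropic_llm
    models:
      - provider: anthropic                 # value from the provider column
        model: claude-3-5-sonnet-20240620   # example model ID
        api_key: ${ANTHROPIC_API_KEY}       # or rely on the env variable
```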
Embedding models
To configure components that use an embedding model, reference a model_group from endpoints.yml under the component's embeddings key.
The embeddings property needs the model_group key to be configured in the config.yml file, and the corresponding model group must be defined in the endpoints.yml file.
The models key under model_groups in endpoints.yml should contain the following parameters:
- model - The model identifier as listed in the provider's documentation, for example text-embedding-ada-002.
- provider - Unique identifier of the provider to be used for invoking the specified model, for example openai.
When configuring a particular provider, there are a few provider-specific settings, which are explained in each provider's sub-section below.
OpenAI
OpenAI is the default embedding model provider. To use it, make sure you have configured an API token as you would for a chat completion model from the OpenAI platform.
Configuration
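A minimal sketch of an OpenAI embedding group, using the example model named earlier in this section; the group ID is hypothetical.

```yaml
# endpoints.yml
model_groups:
  - id: openai_embeddings
    models:
      - provider: openai
        model: text-embedding-ada-002
```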
Azure OpenAI Service
Make sure you have configured an API token as you would for a chat completion model from Azure OpenAI Service.
Configuration
Configuring an embedding model from Azure OpenAI Service requires the same set of parameters as configuring a chat completion model from Azure OpenAI Service.
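A sketch of an Azure embedding group with those parameters. The deployment name, instance URL, and API version are placeholders for your own Azure resources.

```yaml
# endpoints.yml
model_groups:
  - id: azure_embeddings
    models:
      - provider: azure
        deployment: my-embedding-deployment      # placeholder deployment name
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"        # placeholder, quoted
        api_key: ${AZURE_API_KEY}
```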
info
From Rasa Pro 3.11, deployments from multiple Azure subscriptions can be used in the model configurations.
Amazon Bedrock
Configuring an embedding model from Amazon Bedrock has the same prerequisites as a chat completion model. Make sure you have addressed these before proceeding further.
Configuration
Refer to LiteLLM's documentation for the list of supported embedding models from Amazon Bedrock.
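A sketch of a Bedrock embedding group; amazon.titan-embed-text-v1 is used only as an example model ID, and credentials are assumed to come from the environment as described for chat completion models.

```yaml
# endpoints.yml
model_groups:
  - id: bedrock_embeddings
    models:
      - provider: bedrock
        model: amazon.titan-embed-text-v1   # example Bedrock embedding model
```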
In-Memory
CALM also provides an option to load lightweight embedding models in memory, without needing them to be exposed over an API. It uses the sentence-transformers library under the hood to load and run inference on them.
Configuration
- model - Either any embedding model repository available on the Hugging Face Hub or a path to a local model.
- model_kwargs - Load-time arguments passed to the sentence-transformers library.
- encode_kwargs - Inference-time arguments passed to the sentence-transformers library.
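A sketch of an in-memory embedding group. The provider identifier huggingface_local is an assumption not stated on this page; device is a load-time argument of the SentenceTransformer constructor, and normalize_embeddings is an argument of SentenceTransformer.encode.

```yaml
# endpoints.yml
model_groups:
  - id: local_embeddings
    models:
      - provider: huggingface_local            # assumed provider identifier
        model: sentence-transformers/all-MiniLM-L6-v2
        model_kwargs:                          # load-time arguments
          device: cpu
        encode_kwargs:                         # inference-time arguments
          normalize_embeddings: true
```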
Other Providers
In addition to the providers mentioned above, we have also tested support for the following providers:
Platform | provider | API-KEY variable |
---|---|---|
Cohere | cohere | COHERE_API_KEY |
Mistral | mistral | MISTRAL_API_KEY |
Voyage AI | voyage | VOYAGE_API_KEY |
For each of these providers, set an environment variable named by the value in the API-KEY variable column to the API key of that platform (or set the key in the model configuration), and set the provider parameter of the embedding model in the model group to the value in the provider column.
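Taking Cohere from the table as an example, an embedding group could look like the sketch below; embed-english-v3.0 is an example model name.

```yaml
# endpoints.yml
model_groups:
  - id: cohere_embeddings
    models:
      - provider: cohere                # value from the provider column
        model: embed-english-v3.0       # example Cohere embedding model
        api_key: ${COHERE_API_KEY}      # or rely on the env variable
```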
Configuring self-signed SSL certificates
In environments where a proxy performs TLS interception, Rasa may need to be configured to trust the certificates used by your proxy. By default, certificates are loaded from the OS certificate store. However, if your setup involves custom self-signed certificates, you can specify these by setting the RASA_CA_BUNDLE environment variable.
This variable points to the path of the certificate file that Rasa should use to validate SSL connections, for example export RASA_CA_BUNDLE="path/to/your/certificate.pem".
info
The REQUESTS_CA_BUNDLE environment variable is deprecated and will no longer be supported in future versions. Use RASA_CA_BUNDLE instead to ensure compatibility.
Configuring Proxy URLs
In environments where LLM requests need to be routed through a proxy, Rasa relies on LiteLLM to handle proxy configurations. LiteLLM supports configuring proxy URLs through the HTTP_PROXY and HTTPS_PROXY environment variables.
To ensure that all LLM requests are routed through the proxy, set both variables, for example export HTTP_PROXY="http://proxy.example.com:8080" and export HTTPS_PROXY="http://proxy.example.com:8080".
Another way to configure the proxy is to set the api_base parameter in the model configuration to the proxy URL.
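A sketch of that approach; the proxy URL is a placeholder, and the proxy is assumed to forward OpenAI-format requests to the provider.

```yaml
# endpoints.yml
model_groups:
  - id: openai_via_proxy
    models:
      - provider: openai
        model: gpt-4-0613
        api_base: "http://proxy.example.com:8080"   # placeholder proxy URL
```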
Recommended Models
The table below documents the versions of each model we recommend for use with various Rasa components. As new models are published, Rasa will test these and where appropriate add them as a recommended model.
Component | Providing platform | Recommended models |
---|---|---|
SingleStepLLMCommandGenerator , EnterpriseSearchPolicy , IntentlessPolicy | OpenAI, Azure | gpt-4-0613 |
ContextualResponseRephraser | OpenAI, Azure | gpt-4-0613 , gpt-3.5-turbo-0125 |
MultiStepLLMCommandGenerator | OpenAI, Azure | gpt-4-turbo-2024-04-09 , gpt-3.5-turbo-0125 , gpt-3.5-turbo-1106 , gpt-4o-2024-08-06 |
FAQ
Does OpenAI use my data to train their models?
No. OpenAI does not use your data to train their models. From their website:
Data submitted through the OpenAI API is not used to train OpenAI models or improve OpenAI's service offering.
Example Configurations
Azure
A comprehensive example which includes:
- llm and embeddings configuration for components in config.yml:
  - IntentlessPolicy
  - EnterpriseSearchPolicy
  - SingleStepLLMCommandGenerator
  - flow_retrieval in 3.8.x
- llm configuration for rephrase in endpoints.yml (ContextualResponseRephraser)
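A sketch of such a setup follows, with both files in one block separated by ---. All group IDs, deployment names, the instance URL, and the API version are placeholders. The rephraser is assumed to be enabled through an nlg entry of type rephrase in endpoints.yml, and flow_retrieval is assumed to accept an embeddings key.

```yaml
# config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: azure_gpt4
    flow_retrieval:
      embeddings:
        model_group: azure_embeddings
policies:
  - name: FlowPolicy
  - name: IntentlessPolicy
    llm:
      model_group: azure_gpt4
    embeddings:
      model_group: azure_embeddings
  - name: EnterpriseSearchPolicy
    llm:
      model_group: azure_gpt4
    embeddings:
      model_group: azure_embeddings
---
# endpoints.yml
nlg:
  type: rephrase                # assumed switch for ContextualResponseRephraser
  llm:
    model_group: azure_gpt35
model_groups:
  - id: azure_gpt4
    models:
      - provider: azure
        deployment: my-gpt-4-deployment           # placeholder
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"         # placeholder
        api_key: ${AZURE_API_KEY}
  - id: azure_gpt35
    models:
      - provider: azure
        deployment: my-gpt-35-deployment          # placeholder
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        api_key: ${AZURE_API_KEY}
  - id: azure_embeddings
    models:
      - provider: azure
        deployment: my-embedding-deployment       # placeholder
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        api_key: ${AZURE_API_KEY}
```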