LLM Configuration for Rasa Pro >= 3.11

Learn how to set up and configure Large Language Models (LLMs) from providers such as OpenAI, Azure, and more. This guide outlines the necessary steps and the latest configurations to seamlessly integrate LLMs into your workflow, tailored to your specific use case.

LLM Configuration for Rasa Pro 3.10 and below

For Rasa Pro versions 3.10 and below, refer to the LLM Configuration for <=3.10 page.

Overview

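This page applies to components that use LLMs, such as the SingleStepLLMCommandGenerator, MultiStepLLMCommandGenerator, LLMBasedRouter, EnterpriseSearchPolicy, IntentlessPolicy, and ContextualResponseRephraser, all of which are referenced on this page.

Each of these components can be configured to change: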

  • the LLM provider
  • the model(s) to be used

Starting with Rasa Pro 3.10, CALM uses LiteLLM under the hood to integrate with different LLM providers. Hence, all providers integrated with LiteLLM are supported by CALM as well. The settings required for the most frequently used providers are described explicitly in the sections below.

Declaring LLM deployments

LLM deployments are always declared in groups of one or more deployments. The sections below explain how to declare these groups.

Model Groups

Model groups allow you to define multiple models under a single ID which can be accessed by any component. Model groups are defined in the endpoints.yml file under the model_groups key, separating model definitions from individual component configurations. For example:

endpoints.yml
model_groups:
  - id: openai-direct # Unique identifier for the model group
    models:
      ...

  • The id key uniquely identifies the model group.
  • The models key lists all model deployments in that group.
  • Each model in the list includes a configuration, explained in the following sections.

Defining a single model group

endpoints.yml
model_groups:
  - id: openai-direct # Unique identifier for the model group
    models:
      - provider: openai
        model: gpt-4-0613

Required Parameters

Each model in a model group requires the following parameters:

  1. provider - Unique identifier of the LLM provider to be used.
  2. model - The model identifier as listed in the LLM provider's documentation, e.g. gpt-4-0613.

Optional Parameters

Each model in a model group also accepts optional inference-time parameters, such as temperature, which can help you get the best performance out of the model. Refer to the official LiteLLM documentation for the list of supported parameters.
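As a minimal sketch of an optional parameter, the model group below sets temperature next to the required keys. The value 0.0 is an assumption chosen only for illustration, not a recommendation from this page; check the LiteLLM documentation for the parameters your provider accepts.

```yaml
# endpoints.yml (sketch)
model_groups:
  - id: openai-direct
    models:
      - provider: openai
        model: gpt-4-0613
        temperature: 0.0 # assumed value, for illustration only
```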

When configuring a particular provider, there are a few provider-specific settings, which are explained in that provider's sub-section below.

important

If you switch to a different LLM provider, all default parameters of the old provider are overridden by the default parameters of the new provider.

For example, if a provider sets temperature=0.7 as the default value and you switch to a different LLM provider, this default is ignored and it is up to you to set the temperature for the new provider.

Referencing environment variables in the model configuration

To reference environment variables in the model configuration, you can use the ${} syntax. For example:

endpoints.yml
model_groups:
  - id: openai-direct
    models:
      - provider: openai
        model: gpt-4-0613
        api_key: ${MY_OPENAI_API_KEY}

In the above example, the api_key parameter references the environment variable MY_OPENAI_API_KEY.

endpoints.yml
model_groups:
  - id: my_azure_deployment
    models:
      - provider: azure
        deployment: ${AZURE_DEPLOYMENT_NAME}
        api_base: ${AZURE_API_BASE}
        api_version: ${AZURE_API_VERSION}
        api_key: ${MY_AZURE_API_KEY}
        timeout: 7

In the above example, the deployment, api_base, api_version, and api_key parameters reference the environment variables AZURE_DEPLOYMENT_NAME, AZURE_API_BASE, AZURE_API_VERSION, and MY_AZURE_API_KEY respectively. The variables are set in the environment using the export command in Unix-based systems and the setx command in Windows systems. Not setting these variables will result in an error when the assistant is started.

LLM API health check

The model config and the connection to the LLM provider can be validated by setting the LLM_API_HEALTH_CHECK environment variable to true.

export LLM_API_HEALTH_CHECK=true

By default, the variable is set to false. When it is set to true, every declared LLM deployment is checked for availability by making a test API request.

Using a model group in a component

Components using an LLM can be configured to use any of the declared model groups in the component's configuration. To use a model group, you can specify the model_group key under the llm key. For example:

config.yml
recipe: default.v1
language: en
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: openai-direct

Defining multiple model groups

endpoints.yml
model_groups:
  - id: openai-gpt-4
    models:
      - provider: openai
        model: gpt-4-0613
  - id: openai-gpt-35-turbo
    models:
      - provider: openai
        model: gpt-3.5-turbo

The examples above illustrate how to define model groups that each consist of a single model deployment. To handle a larger volume of conversations, it is recommended to include multiple model deployments within a model group. To do so, add further deployments to the models list, as explained on the Multi-LLM routing page.
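A rough sketch of a model group with two deployments in its models list is shown below. The group ID and the second environment variable are hypothetical names made up for illustration, and any router settings described on the Multi-LLM routing page are deliberately left out.

```yaml
# endpoints.yml (sketch)
model_groups:
  - id: openai-gpt-4-pool # hypothetical group ID
    models:
      - provider: openai
        model: gpt-4-0613
        api_key: ${MY_OPENAI_API_KEY}
      - provider: openai
        model: gpt-4-0613
        api_key: ${MY_SECOND_OPENAI_API_KEY} # hypothetical variable name
```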

Using different model groups in different components

config.yml
recipe: default.v1
pipeline:
  - name: LLMBasedRouter
    calm_entry:
      sticky: ...
    nlu_entry:
      sticky: ...
      non_sticky: ...
    llm:
      model_group: openai-gpt-35-turbo
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: openai-gpt-4

Multiple components can rely on the same model group and a single component can use multiple models defined in a model group, via the LLM router.
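To illustrate the first point, the sketch below lets two components share one model group. It reuses the openai-gpt-4 group declared earlier; pairing it with these two particular components is an assumption made for illustration.

```yaml
# config.yml (sketch)
recipe: default.v1
language: en
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: openai-gpt-4
policies:
  - name: EnterpriseSearchPolicy
    llm:
      model_group: openai-gpt-4 # same group as the command generator
```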

Chat completion models

Default Provider

CALM is LLM agnostic and can be configured with different LLMs, but OpenAI is the default model provider. The majority of our experiments have been with models available on OpenAI or Azure OpenAI Service. The performance of your assistant may vary when using other LLMs, but improvements can be made by tuning flow and collect step descriptions.

OpenAI

API Token

The API token authenticates your requests to the OpenAI API.

To configure the API token, follow these steps:

  1. If you haven't already, sign up for an account on the OpenAI platform.

  2. Navigate to the OpenAI Key Management page, and click on the "Create New Secret Key" button to initiate the process of obtaining <your-api-key>.

  3. The API key can be set in the model configuration or through an environment variable.

To set the API key in the config, you can use the api_key parameter in the model configuration:

model_groups:
  - id: openai-direct
    models:
      - provider: openai
        model: gpt-4-0613
        api_key: ${MY_OPENAI_API_KEY}

info

The api_key parameter can be set in the model configuration for each model in the model_groups section of the endpoints.yml file. For security reasons, the value of the api_key must reference an environment variable, as demonstrated above. This approach ensures sensitive information is securely stored. Directly assigning the API key in the configuration file is not allowed, as it could potentially expose the key to unauthorized access.

To set the API key as an environment variable, you can use the following command in a terminal or command prompt:

export OPENAI_API_KEY=<your-api-key>

Replace <your-api-key> with the actual API key you obtained from the OpenAI platform.

Configuration

There are no additional OpenAI-specific parameters to configure. However, you may want to modify model-specific parameters such as temperature. The names of such parameters can be found in OpenAI's API documentation and are defined under the llm key of the component's configuration. Refer to LiteLLM's documentation for the list of supported OpenAI models.

Model deprecations

OpenAI regularly publishes a deprecation schedule for its models. This schedule can be accessed in the documentation published by OpenAI.

Azure OpenAI Service

API Token

The API token authenticates your requests to the Azure OpenAI Service.

Set the API token as an environment variable. You can use the following command in a terminal or command prompt:

export AZURE_API_KEY=<your-api-key>

Replace <your-api-key> with the actual API key you obtained from the Azure OpenAI Service platform.

Configuration

To access models provided by Azure OpenAI Service, there are a few additional parameters that need to be configured:

  • provider - Set to azure.
  • api_base - The URL for your Azure OpenAI instance. An example might look like this: https://my-azure.openai.azure.com/.
  • api_version - The API version to use for this operation. This follows the YYYY-MM-DD format and the value should be enclosed in single or double quotes.
  • deployment - Name of the deployment on Azure.

Model-specific parameters like temperature can be defined as well. Refer to the Azure OpenAI Service API documentation for information on available parameter names.

A complete example configuration of the SingleStepLLMCommandGenerator using Azure OpenAI Service would look like this:

config.yml
- name: SingleStepLLMCommandGenerator
  llm:
    model_group: my_azure_deployment

endpoints.yml
model_groups:
  - id: my_azure_deployment
    models:
      - provider: azure
        deployment: rasa-gpt-4
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        api_key: ${MY_AZURE_API_KEY}
        timeout: 7

export MY_AZURE_API_KEY=<your-api-key>

A more comprehensive example using the Azure OpenAI service in more CALM components is available here.

Model deprecations

Azure regularly publishes a deprecation schedule for the models offered through Azure OpenAI Service. This schedule can be accessed in the documentation published by Azure.

Debugging

If you encounter timeout errors, set the timeout parameter to a larger value. The exact value depends on how your Azure instance is configured.
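As a sketch, the timeout can be raised directly on the affected deployment. The value 30 is an arbitrary assumption; the examples on this page use 7.

```yaml
# endpoints.yml (sketch)
model_groups:
  - id: my_azure_deployment
    models:
      - provider: azure
        deployment: rasa-gpt-4
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        api_key: ${MY_AZURE_API_KEY}
        timeout: 30 # assumed value; tune it to your Azure instance
```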

Amazon Bedrock

Requirements:

  1. Make sure you have rasa-pro>=3.11.x installed.
  2. Install boto3>=1.28.57.
  3. Set AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION_NAME as environment variables, or store the values in environment variables of your choice and reference them in the model configuration.
  4. (Optional) If your organisation mandates temporary credentials for security, also set AWS_SESSION_TOKEN, either as an environment variable or in the model configuration.

Once the above steps are complete, edit config.yml to use an appropriate model_group from the endpoints.yml file:

config.yml
- name: SingleStepLLMCommandGenerator
  llm:
    model_group: amazon_bedrock # Model group ID

endpoints.yml
model_groups:
  - id: amazon_bedrock
    models:
      - provider: bedrock
        model: anthropic.claude-instant-v1

Set provider to bedrock and model to the model name you want to use.

Model specific parameters like temperature can be defined as well. Refer to LiteLLM's documentation for information on available parameter names and supported models.
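If you prefer to reference the AWS credentials in the model configuration rather than rely on the default environment variable names, a sketch could look like the following. The parameter names are taken from the Amazon Bedrock embedding example later on this page; the MY_-prefixed environment variable names and the region are placeholders.

```yaml
# endpoints.yml (sketch)
model_groups:
  - id: amazon_bedrock
    models:
      - provider: bedrock
        model: anthropic.claude-instant-v1
        aws_access_key_id: ${MY_AWS_ACCESS_KEY_ID}
        aws_secret_access_key: ${MY_AWS_SECRET_ACCESS_KEY}
        aws_region_name: eu-central-1 # placeholder region
        # aws_session_token: ${MY_AWS_SESSION_TOKEN} # only needed for temporary credentials
```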

Gemini - Google AI Studio

Requirements:

  1. Make sure you have rasa-pro>=3.11.x installed.
  2. Install the Python package google-generativeai.
  3. Get an API key at https://aistudio.google.com/.
  4. Set the API key in the environment variable GEMINI_API_KEY or set it in the model configuration.

Once the above steps are complete, edit config.yml to use an appropriate model_group from endpoints.yml and set provider to gemini:

config.yml
- name: SingleStepLLMCommandGenerator
  llm:
    model_group: gemini_llm # Model group ID

endpoints.yml
model_groups:
  - id: gemini_llm
    models:
      - provider: gemini
        model: gemini-pro
        # api_key: ${MY_GEMINI_API_KEY} # Optional, if you want to set the API key in the model configuration.

Refer to LiteLLM's documentation to know which additional parameters and models are supported.

HuggingFace Inference Endpoints

Requirements:

  1. Make sure you have rasa-pro>=3.11.x installed.
  2. Set the API key in the environment variable HUGGINGFACE_API_KEY or set it in the model configuration.
  3. Edit config.yml to use an appropriate model_group from endpoints.yml, and in that model group set provider to huggingface and api_base to the base URL of the deployed endpoint:

config.yml
- name: SingleStepLLMCommandGenerator
  llm:
    model_group: huggingface_llm # Model group ID

endpoints.yml
model_groups:
  - id: huggingface_llm
    models:
      - provider: huggingface
        model: meta-llama/CodeLlama-7b-Instruct-hf
        api_base: "https://my-endpoint.huggingface.cloud"
        # api_key: ${MY_HUGGINGFACE_API_KEY} # Optional, if you want to set the API key in the model configuration.

Self Hosted Model Server

CALM's components can also be configured to work with an open source LLM that is hosted on an open source model server such as vLLM (recommended), Ollama, or the Llama.cpp web server. The only requirement is that the model server adheres to the OpenAI API format.

Once your model server is running, configure the CALM assistant's config.yml file to use a model_group from the endpoints.yml file:

vLLM

config.yml
- name: SingleStepLLMCommandGenerator
  llm:
    model_group: self_hosted_llm # Model group ID

endpoints.yml
model_groups:
  - id: self_hosted_llm
    models:
      - provider: self-hosted
        model: meta-llama/CodeLlama-7b-Instruct-hf
        api_base: "https://my-endpoint/v1"
        # api_key: ${HOSTED_VLLM_API_KEY} # Optional, if you want to set the API key in the model configuration.

Important to note:

  1. The recommended version of vLLM is 0.6.0.

  2. CALM exclusively uses the chat completions endpoint of the model server, so it is essential that the model's tokenizer includes a chat template. For models that lack a chat template, set the use_chat_completions_endpoint parameter to false in the model_groups configuration:

    config.yml
    - name: SingleStepLLMCommandGenerator
      llm:
        model_group: self_hosted_llm # Model group ID

    endpoints.yml
    model_groups:
      - id: self_hosted_llm
        models:
          - provider: self-hosted
            model: meta-llama/CodeLlama-7b-Instruct-hf
            api_base: "https://my-endpoint/v1"
            use_chat_completions_endpoint: false

  3. model should contain the name of the model supplied to the vLLM startup command. For example, if your model server is started with:

    vllm serve meta-llama/CodeLlama-7b-Instruct-hf

    model should be set to meta-llama/CodeLlama-7b-Instruct-hf.

  4. api_base should contain the full exposed URL of the model server with v1 attached as a suffix to the URL.

  5. If required, set the API key in the environment variable HOSTED_VLLM_API_KEY or set it in the model configuration.

Ollama

Once the Ollama model server is running, edit the config.yml file to use a model_group from the endpoints.yml file:

config.yml
- name: SingleStepLLMCommandGenerator
  llm:
    model_group: ollama_llm # Model group ID

endpoints.yml
model_groups:
  - id: ollama_llm
    models:
      - provider: ollama
        model: llama3.1
        api_base: "https://my-endpoint"
        # api_key: ${OLLAMA_API_KEY} # Optional, if you want to set the API key in the model configuration.

Other Providers

info

If you want to try one of these providers, it is recommended to install Rasa Pro versions >= 3.11.

In addition to the providers mentioned above, we have also tested support for the following providers:

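| Platform    | provider    | API-KEY variable   |
|-------------|-------------|--------------------|
| Anthropic   | anthropic   | ANTHROPIC_API_KEY  |
| Cohere      | cohere      | COHERE_API_KEY     |
| Mistral     | mistral     | MISTRAL_API_KEY    |
| Together AI | together_ai | TOGETHERAI_API_KEY |
| Groq        | groq        | GROQ_API_KEY       |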

For each of these providers, set the environment variable named in the API-KEY variable column to that platform's API key (or set the key in the model configuration), and set the provider parameter under the llm key of the component's config to the value in the provider column.
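For instance, a model group for Anthropic might be sketched as follows, using the model_groups layout from the rest of this page. The group ID and the model name are assumptions made for illustration; consult LiteLLM's documentation for the model identifiers it supports.

```yaml
# endpoints.yml (sketch)
model_groups:
  - id: anthropic_llm # hypothetical group ID
    models:
      - provider: anthropic
        model: claude-3-5-sonnet-20240620 # assumed model name
        # api_key: ${ANTHROPIC_API_KEY} # Optional, if you want to set the API key in the model configuration.
```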

Embedding models

To configure components that use an embedding model, reference the model_group from endpoints.yml under the embeddings key:

config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: gpt_llm
    flow_retrieval:
      embeddings:
        model_group: text_embedding_model

endpoints.yml
model_groups:
  - id: text_embedding_model
    models:
      - provider: openai
        model: text-embedding-ada-002
        # api_key: ${OPENAI_API_KEY} # Optional, if you want to set the API key in the model configuration.

The embeddings property needs the model_group key to be configured in the config.yml file, and the corresponding model group must be defined in the endpoints.yml file.

Each entry under the models key of a model group in endpoints.yml should contain the following parameters:

  1. model - The model identifier as listed in the LLM provider's documentation, e.g. text-embedding-ada-002.

  2. provider - Unique identifier of the provider to be used for invoking the specified model, e.g. openai.

When configuring a particular provider, there are a few provider-specific settings, which are explained in that provider's sub-section below.

OpenAI

OpenAI is used as the default embedding model provider. To start using it, ensure you have configured an API token as you would for a chat completion model from the OpenAI platform.

Configuration

config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: openai_llm
    flow_retrieval:
      embeddings:
        model_group: text_embedding_model

endpoints.yml
model_groups:
  - id: openai_llm
    models:
      - provider: openai
        model: gpt-4-0613
        # api_key: ${OPENAI_API_KEY_1} # Optional, if you want to set the API key in the model configuration.
  - id: text_embedding_model
    models:
      - provider: openai
        model: text-embedding-ada-002
        # api_key: ${OPENAI_API_KEY_2} # Optional, if you want to set the API key in the model configuration.

Azure OpenAI Service

Ensure you have configured an API token as you would for a chat completion model from Azure OpenAI Service.

Configuration

Configuring an embedding model from Azure OpenAI Service requires values for the same set of parameters as configuring a chat completion model from Azure OpenAI Service.

config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: openai_direct
    flow_retrieval:
      embeddings:
        model_group: azure_embedding_model

endpoints.yml
model_groups:
  - id: openai_direct
    models:
      - provider: openai
        model: gpt-4-0613
        # api_key: ${OPENAI_API_KEY} # Optional, if you want to set the API key in the model configuration.
  - id: azure_embedding_model
    models:
      - provider: azure
        deployment: test-embeddings
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        timeout: 7
        # api_key: ${AZURE_API_KEY} # Optional, if you want to set the API key in the model configuration.

info

From Rasa Pro 3.11, deployments from multiple Azure subscriptions can be used in the model configurations.

config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: azure_llm
    flow_retrieval:
      embeddings:
        model_group: azure_embedding_model

endpoints.yml
model_groups:
  - id: azure_llm
    models:
      - provider: azure
        deployment: rasa-gpt-4
        api_base: https://azure-server1.com/
        api_version: "2024-02-15-preview"
        api_key: ${AZURE_API_KEY_1}
  - id: azure_embedding_model
    models:
      - provider: azure
        deployment: test-embeddings
        api_base: https://azure-server2.com/
        api_version: "2024-02-15-preview"
        api_key: ${AZURE_API_KEY_2}

Amazon Bedrock

Configuring an embedding model from Amazon Bedrock has the same prerequisites as configuring a chat completion model. Ensure you have addressed these before proceeding.

Configuration

config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: openai_llm
    flow_retrieval:
      embeddings:
        model_group: amazon_bedrock_model

endpoints.yml
model_groups:
  - id: openai_llm
    models:
      - provider: openai
        model: gpt-4-0613
        # api_key: ${OPENAI_API_KEY} # Optional, if you want to set the API key in the model configuration.
  - id: amazon_bedrock_model
    models:
      - provider: bedrock
        model: amazon.titan-embed-text-v1
        # aws_access_key_id: ${MY_AWS_ACCESS_KEY_ID} # Optional, if you want to set the AWS access key in the model configuration.
        # aws_secret_access_key: ${MY_AWS_SECRET_ACCESS_KEY} # Optional, if you want to set the AWS secret access key in the model configuration.
        # aws_region_name: eu-central-1 # Optional, if you want to set the AWS region name in the model configuration.
        # aws_session_token: ${MY_AWS_SESSION_TOKEN} # Optional, if you want to set the AWS session token in the model configuration.

Refer to LiteLLM's documentation for the list of supported embedding models from Amazon Bedrock.

In-Memory

CALM also provides an option to load lightweight embedding models in-memory without needing them to be exposed over an API. It uses the sentence transformers library under the hood to load and run inference on them.

Configuration

config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: openai_direct
    flow_retrieval:
      embeddings:
        model_group: huggingface_embedding_model

endpoints.yml
model_groups:
  - id: openai_direct
    models:
      - provider: openai
        model: gpt-4-0613
        # api_key: ${OPENAI_API_KEY} # Optional, if you want to set the API key in the model configuration.
  - id: huggingface_embedding_model
    models:
      - provider: huggingface
        model: BAAI/bge-small-en-v1.5
        model_kwargs: # used during instantiation
          device: "cpu"
        encode_kwargs: # used during inference
          normalize_embeddings: true

  • The model parameter accepts either any embedding model repository available on the HuggingFace hub or a path to a local model.
  • The model_kwargs parameter provides load-time arguments to the sentence transformers library.
  • The encode_kwargs parameter provides inference-time arguments to the sentence transformers library.

Other Providers

In addition to the providers mentioned above, we have also tested support for the following providers:

| Platform  | provider | API-KEY variable |
|-----------|----------|------------------|
| Cohere    | cohere   | COHERE_API_KEY   |
| Mistral   | mistral  | MISTRAL_API_KEY  |
| Voyage AI | voyage   | VOYAGE_API_KEY   |

For each of these providers, set the environment variable named in the API-KEY variable column to that platform's API key (or set the key in the model configuration), and set the provider parameter under the llm key of the component's config to the value in the provider column.
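For instance, an embedding model group for Cohere might be sketched as follows, using the model_groups layout from the rest of this page. The group ID and the model name are assumptions made for illustration; consult LiteLLM's documentation for the embedding models it supports.

```yaml
# endpoints.yml (sketch)
model_groups:
  - id: cohere_embedding_model # hypothetical group ID
    models:
      - provider: cohere
        model: embed-english-v3.0 # assumed model name
        # api_key: ${COHERE_API_KEY} # Optional, if you want to set the API key in the model configuration.
```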

Configuring self-signed SSL certificates

In environments where a proxy performs TLS interception, Rasa may need to be configured to trust the certificates used by your proxy. By default, certificates are loaded from the OS certificate store. However, if your setup involves custom self-signed certificates, you can specify these by setting the RASA_CA_BUNDLE environment variable.

This variable points to the path of the certificate file that Rasa should use to validate SSL connections:

export RASA_CA_BUNDLE="path/to/your/certificate.pem"

info

The REQUESTS_CA_BUNDLE environment variable is deprecated and will no longer be supported in future versions. Please use RASA_CA_BUNDLE instead to ensure compatibility.

Configuring Proxy URLs

In environments where LLM requests need to be routed through a proxy, Rasa relies on LiteLLM to handle proxy configurations. LiteLLM supports configuring proxy URLs through the HTTP_PROXY and HTTPS_PROXY environment variables.

To ensure that all LLM requests are routed through the proxy, you can set the environment variables as follows:

export HTTP_PROXY="http://your-proxy-url:port"
export HTTPS_PROXY="https://your-proxy-url:port"

Another way to configure the proxy is to set the api_base parameter in the model configuration to the proxy URL:

endpoints.yml
model_groups:
  - id: self_hosted_llm
    models:
      - provider: self-hosted
        model: meta-llama/CodeLlama-7b-Instruct-hf
        api_base: http://your-proxy-url:port

Recommended Models

The table below documents the versions of each model we recommend for use with various Rasa components. As new models are published, Rasa will test these and where appropriate add them as a recommended model.

| Component | Providing platform | Recommended models |
|-----------|--------------------|--------------------|
| SingleStepLLMCommandGenerator, EnterpriseSearchPolicy, IntentlessPolicy | OpenAI, Azure | gpt-4-0613 |
| ContextualResponseRephraser | OpenAI, Azure | gpt-4-0613, gpt-3.5-turbo-0125 |
| MultiStepLLMCommandGenerator | OpenAI, Azure | gpt-4-turbo-2024-04-09, gpt-3.5-turbo-0125, gpt-3.5-turbo-1106, gpt-4o-2024-08-06 |
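As an illustration of how a row of the table maps to configuration, the sketch below wires MultiStepLLMCommandGenerator to gpt-4o-2024-08-06 on OpenAI. The group ID is a hypothetical name; the layout follows the model_groups pattern used throughout this page.

```yaml
# endpoints.yml (sketch)
model_groups:
  - id: openai-gpt-4o # hypothetical group ID
    models:
      - provider: openai
        model: gpt-4o-2024-08-06
```

```yaml
# config.yml (sketch)
pipeline:
  - name: MultiStepLLMCommandGenerator
    llm:
      model_group: openai-gpt-4o
```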

FAQ

Does OpenAI use my data to train their models?

No. OpenAI does not use your data to train their models. From their website:

Data submitted through the OpenAI API is not used to train OpenAI models or improve OpenAI's service offering.

Example Configurations

Azure

A comprehensive example which includes:

  • llm and embeddings configuration for components in config.yml:
    • IntentlessPolicy
    • EnterpriseSearchPolicy
    • SingleStepLLMCommandGenerator
    • flow_retrieval in 3.8.x
  • llm configuration for rephrase in endpoints.yml (ContextualResponseRephraser)

endpoints.yml
nlg:
  type: rephrase
  llm:
    model_group: azure_llm
model_groups:
  - id: azure_llm
    models:
      - provider: azure
        deployment: rasa-gpt-4
        api_base: https://my-azure.openai.azure1.com/
        api_version: "2024-02-15-preview"
        # api_key: ${MY_AZURE_API_KEY_1} # Optional, if you want to set the API key in the model configuration.
        timeout: 7
  - id: azure_embeddings
    models:
      - provider: azure
        deployment: rasa-embedding-small
        api_base: https://my-azure.openai.azure2.com/
        api_version: "2024-02-15-preview"
        # api_key: ${MY_AZURE_API_KEY_2} # Optional, if you want to set the API key in the model configuration.
        timeout: 7

config.yml
recipe: default.v1
language: en
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: azure_llm
    flow_retrieval:
      embeddings:
        model_group: azure_embeddings
policies:
  - name: FlowPolicy
  - name: IntentlessPolicy
    llm:
      model_group: azure_llm
    embeddings:
      model_group: azure_embeddings
  - name: EnterpriseSearchPolicy
    vector_store:
      type: "faiss"
      threshold: 0.0
    llm:
      model_group: azure_llm
    embeddings:
      model_group: azure_embeddings