
LLM Configuration

Instructions on how to set up and configure Large Language Models (LLMs) from OpenAI, Azure, and other providers. This page covers what you need to configure to make LLMs from different providers work for your use case.

Overview

This page applies to the following components which use LLMs:

  • SingleStepLLMCommandGenerator
  • MultiStepLLMCommandGenerator
  • EnterpriseSearchPolicy
  • IntentlessPolicy
  • ContextualResponseRephraser

All the above components can be configured to change:

  • the LLM provider
  • the model to be used

Starting with Rasa Pro 3.10, CALM uses LiteLLM under the hood to integrate with different LLM providers. Hence, all of LiteLLM's integrated providers are supported by CALM as well. The settings required for the most frequently used providers are covered explicitly in the sections below.

warning

If you want to try a provider other than OpenAI / Azure OpenAI, it is recommended to install Rasa Pro version 3.10 or later.

Recommended Models

The table below documents the versions of each model we recommend for use with various Rasa components. As new models are published, Rasa will test them and, where appropriate, add them as recommended models.

Component | Providing platform | Recommended models
SingleStepLLMCommandGenerator, EnterpriseSearchPolicy, IntentlessPolicy | OpenAI, Azure | gpt-4-0613
ContextualResponseRephraser | OpenAI, Azure | gpt-4-0613, gpt-3.5-turbo-0125
MultiStepLLMCommandGenerator | OpenAI, Azure | gpt-4-turbo-2024-04-09, gpt-3.5-turbo-0125, gpt-3.5-turbo-1106, gpt-4o-2024-08-06

Chat completion models

Default Provider

CALM is LLM agnostic and can be configured with different LLMs, but OpenAI is the default model provider. The majority of our experiments have used models available from OpenAI or the Azure OpenAI Service. The performance of your assistant may vary when using other LLMs, but improvements can be made by tuning flow and collect step descriptions.

To configure components that use a chat completion model as the LLM, declare the configuration under the llm key of that component's configuration. For example:

config.yml
recipe: default.v1
language: en
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      ...

Required Parameters

There are two required parameters under the llm key:

  1. model - The name of the model as given in the LLM provider's documentation, e.g. gpt-4.
  2. provider - The unique identifier of the provider to be used for invoking the specified model.

config.yml
recipe: default.v1
language: en
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model: gpt-4
      provider: openai

Optional Parameters

The llm key also accepts optional inference-time parameters, such as temperature, which can be useful for extracting the best performance from the model being used. Please refer to the official LiteLLM documentation for the list of supported parameters.
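As an illustration, the following sets the sampling temperature and the request timeout for the default OpenAI provider (the values shown are illustrative, not recommendations):

config.yml
- name: SingleStepLLMCommandGenerator
  llm:
    provider: openai
    model: gpt-4
    temperature: 0.0
    timeout: 10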

When configuring a particular provider, there are a few provider specific settings which are explained under each provider's individual sub-section below.

important

If you switch to a different LLM provider, all default parameters of the old provider are overridden by the default parameters of the new provider.

For example, if a provider sets temperature=0.7 as its default value and you switch to a different LLM provider, this default is ignored and it is up to you to set the temperature for the new provider.

OpenAI

API Token

The API token authenticates your requests to the OpenAI API.

To configure the API token, follow these steps:

  1. If you haven't already, sign up for an account on the OpenAI platform.

  2. Navigate to the OpenAI Key Management page, and click on the "Create New Secret Key" button to initiate the process of obtaining <your-api-key>.

  3. To set the API key as an environment variable, you can use the following command in a terminal or command prompt:

export OPENAI_API_KEY=<your-api-key>

Replace <your-api-key> with the actual API key you obtained from the OpenAI platform.

Configuration

There are no additional OpenAI-specific parameters to configure. However, there may be model-specific parameters, like temperature, that you want to modify. The names of such parameters can be found in OpenAI's API documentation and are defined under the llm key of the component's configuration. Please refer to LiteLLM's documentation for the list of supported models on the OpenAI platform.

Model deprecations

OpenAI regularly publishes a deprecation schedule for its models. This schedule can be accessed in the documentation published by OpenAI.

Azure OpenAI Service

API Token

The API token authenticates your requests to the Azure OpenAI Service.

Set the API token as an environment variable. You can use the following command in a terminal or command prompt:

export AZURE_API_KEY=<your-api-key>

Replace <your-api-key> with the actual API key you obtained from the Azure OpenAI Service platform.

Configuration

To access models provided by the Azure OpenAI Service, there are a few additional parameters that need to be configured:

  • provider - Set to azure.
  • api_type - The type of API to use. This should be set to azure to indicate the use of the Azure OpenAI Service.
  • api_base - The URL of your Azure OpenAI instance, for example https://my-azure.openai.azure.com/.
  • api_version - The API version to use for this operation. This follows the YYYY-MM-DD format, and the value should be enclosed in single or double quotes.
  • deployment - The name of the deployment on Azure; engine and deployment_name are accepted as aliases.

Model-specific parameters like temperature can be defined as well. Refer to the Azure OpenAI Service API documentation for the available parameter names.

A complete example configuration of the SingleStepLLMCommandGenerator using Azure OpenAI Service would look like this:

config.yml
- name: SingleStepLLMCommandGenerator
  llm:
    provider: azure
    deployment: rasa-gpt-4
    api_type: azure
    api_base: https://my-azure.openai.azure.com/
    api_version: "2024-02-15-preview"
    timeout: 7

A more comprehensive example using the Azure OpenAI Service across more CALM components is available in the Example Configurations section below.

Model deprecations

Azure regularly publishes a deprecation schedule for the models offered through the Azure OpenAI Service. This schedule can be accessed in the documentation published by Azure.

Debugging

If you encounter timeout errors, configure the timeout parameter to a larger value. The exact value depends on how your Azure instance is configured.
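For example, a sketch that raises the timeout on the Azure configuration shown above (the value is illustrative):

config.yml
- name: SingleStepLLMCommandGenerator
  llm:
    provider: azure
    deployment: rasa-gpt-4
    api_type: azure
    api_base: https://my-azure.openai.azure.com/
    api_version: "2024-02-15-preview"
    timeout: 30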

Amazon Bedrock

Requirements:

  1. Make sure you have rasa-pro>=3.10.x installed.
  2. Install boto3>=1.28.57.
  3. Set the following environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION_NAME (see the example after this list).
  4. (Optional) You might also have to set AWS_SESSION_TOKEN if your organization mandates the use of temporary credentials for security.
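For example, in a Unix shell (the region value is illustrative; replace the placeholders with your credentials):

export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
export AWS_REGION_NAME=us-east-1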

Once the above steps are complete, edit config.yml to use an appropriate model and set provider to bedrock:

config.yml
- name: SingleStepLLMCommandGenerator
  llm:
    provider: bedrock
    model: anthropic.claude-instant-v1

Model-specific parameters like temperature can be defined as well. Refer to LiteLLM's documentation for the available parameter names and supported models.

Gemini - Google AI Studio

Requirements:

  1. Make sure you have rasa-pro>=3.10.x installed.
  2. Install the google-generativeai Python package.
  3. Get an API key at https://aistudio.google.com/.
  4. Set the API key in the GEMINI_API_KEY environment variable (see the command after this list).
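For example, in a Unix shell (replace the placeholder with your key):

export GEMINI_API_KEY=<your-api-key>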

Once the above steps are complete, edit config.yml to use an appropriate model and set provider to gemini:

config.yml
- name: SingleStepLLMCommandGenerator
  llm:
    provider: gemini
    model: gemini-pro

Refer to LiteLLM's documentation for the additional parameters and models that are supported.

HuggingFace Inference Endpoints

Requirements:

  1. Make sure you have rasa-pro>=3.10.x installed.
  2. Set your API key in the HUGGINGFACE_API_KEY environment variable.
  3. Edit config.yml to use an appropriate model, set provider to huggingface, and set api_base to the base URL of the deployed endpoint:
config.yml
- name: SingleStepLLMCommandGenerator
  llm:
    provider: huggingface
    model: meta-llama/CodeLlama-7b-Instruct-hf
    api_base: "https://my-endpoint.huggingface.cloud"

Self Hosted Model Server

CALM's components can also be configured to work with an open source LLM hosted on an open source model server like vLLM (recommended), Ollama, or the Llama.cpp web server. The only requirement is that the model server adheres to the OpenAI API format.

Once you have your model server running, configure the CALM assistant's config.yml:

vLLM

config.yml
- name: SingleStepLLMCommandGenerator
  llm:
    provider: self-hosted
    model: meta-llama/CodeLlama-7b-Instruct-hf
    api_base: "https://my-endpoint/v1"

Important to note:

  1. The recommended version of vLLM to use is 0.6.0.

  2. CALM exclusively uses the chat completions endpoint of the model server, so it is essential that the model's tokenizer includes a chat template. Models lacking a chat template will not be compatible with CALM.

  3. model should contain the name of the model supplied to the vllm startup command. For example, if your model server is started with:

    vllm serve meta-llama/CodeLlama-7b-Instruct-hf

    model should be set to meta-llama/CodeLlama-7b-Instruct-hf.

  4. api_base should contain the full exposed URL of the model server with /v1 appended as a suffix.
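For instance, if the server is started locally on a custom port (an illustrative setup; vLLM's OpenAI-compatible server accepts a --port flag):

    vllm serve meta-llama/CodeLlama-7b-Instruct-hf --port 8000

In that case, api_base should be set to "http://localhost:8000/v1".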

Ollama

Once the Ollama model server is running, edit the config.yml file:

config.yml
- name: SingleStepLLMCommandGenerator
  llm:
    provider: ollama
    model: llama3.1
    api_base: "https://my-endpoint"
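For a default local Ollama install (an assumption about your setup), the server typically listens on port 11434, in which case api_base would be "http://localhost:11434".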

Other Providers

info

If you want to try one of these providers, it is recommended to install Rasa Pro version 3.10 or later.

In addition to the providers mentioned above, we have also tested support for the following providers:

Platform | provider | API key environment variable
Anthropic | anthropic | ANTHROPIC_API_KEY
Cohere | cohere | COHERE_API_KEY
Mistral | mistral | MISTRAL_API_KEY
Together AI | together_ai | TOGETHERAI_API_KEY
Groq | groq | GROQ_API_KEY

For each of these providers, set an environment variable named by the value in the API key environment variable column to the API key of that platform, and set the provider parameter under the llm key of the component's config to the value in the provider column.
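For example, a minimal sketch for Anthropic (the model identifier is illustrative; check LiteLLM's documentation for current model names):

config.yml
- name: SingleStepLLMCommandGenerator
  llm:
    provider: anthropic
    model: claude-3-5-sonnet-20240620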

Embedding models

To configure components that use an embedding model, declare the configuration under the embeddings key of that component's configuration. For example:

config.yml
pipeline:
  - name: "SingleStepLLMCommandGenerator"
    llm:
      model: gpt-4
    flow_retrieval:
      embeddings:
        ...

The embeddings property needs two mandatory parameters:

  1. model - The name of the model as given in the LLM provider's documentation, e.g. text-embedding-ada-002.

  2. provider - The unique identifier of the provider to be used for invoking the specified model, e.g. openai.

config.yml
pipeline:
  - name: "SingleStepLLMCommandGenerator"
    llm:
      provider: openai
      model: gpt-4
    flow_retrieval:
      embeddings:
        provider: openai
        model: text-embedding-ada-002

When configuring a particular provider, there are a few provider specific settings which are explained under each provider's individual sub-section below.

OpenAI

OpenAI is the default embedding model provider. To start using it, ensure you have configured an API token as you would for a chat completion model from the OpenAI platform.

Configuration

config.yml
pipeline:
  - name: "SingleStepLLMCommandGenerator"
    llm:
      provider: openai
      model: gpt-4
    flow_retrieval:
      embeddings:
        provider: openai
        model: text-embedding-ada-002

Azure OpenAI Service

Ensure you have configured an API token as you would for a chat completion model from the Azure OpenAI Service.

Configuration

Configuring an embedding model from the Azure OpenAI Service requires values for the same set of parameters as a chat completion model from the Azure OpenAI Service.

config.yml
pipeline:
  - name: "SingleStepLLMCommandGenerator"
    llm:
      model: gpt-4
    flow_retrieval:
      embeddings:
        provider: azure
        deployment: engine-embed
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        timeout: 7

Amazon Bedrock

Configuring an embedding model from Amazon Bedrock requires the same prerequisites as a chat completion model. Please ensure you have addressed these before proceeding further.

Configuration

config.yml
pipeline:
  - name: "SingleStepLLMCommandGenerator"
    llm:
      provider: openai
      model: gpt-4
    flow_retrieval:
      embeddings:
        provider: bedrock
        model: amazon.titan-embed-text-v1

Please refer to LiteLLM's documentation for the list of supported embedding models from Amazon Bedrock.

In-Memory

CALM also provides an option to load lightweight embedding models in memory without needing them to be exposed over an API. It uses the sentence-transformers library under the hood to load and run inference on them.

Configuration

config.yml
pipeline:
  - name: "SingleStepLLMCommandGenerator"
    llm:
      provider: "openai"
      model: "gpt-4"
    flow_retrieval:
      embeddings:
        provider: "huggingface_local"
        model: "BAAI/bge-small-en-v1.5"
        model_kwargs:  # used during instantiation
          device: "cpu"
        encode_kwargs:  # used during inference
          normalize_embeddings: true

  • The model parameter accepts either the name of any embedding model repository available on the HuggingFace hub or a path to a local model.
  • The model_kwargs parameter provides load-time arguments to the sentence-transformers library.
  • The encode_kwargs parameter provides inference-time arguments to the sentence-transformers library.

Other Providers

In addition to the providers mentioned above, we have also tested support for the following providers:

Platform | provider | API key environment variable
Cohere | cohere | COHERE_API_KEY
Mistral | mistral | MISTRAL_API_KEY
Voyage AI | voyage | VOYAGE_API_KEY

For each of these providers, set an environment variable named by the value in the API key environment variable column to the API key of that platform, and set the provider parameter under the embeddings key of the component's config to the value in the provider column.
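For example, a minimal sketch using Cohere for flow retrieval embeddings (the model name is illustrative; check LiteLLM's documentation for current identifiers):

config.yml
pipeline:
  - name: "SingleStepLLMCommandGenerator"
    llm:
      provider: openai
      model: gpt-4
    flow_retrieval:
      embeddings:
        provider: cohere
        model: embed-english-v3.0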

FAQ

Does OpenAI use my data to train their models?

No. OpenAI does not use your data to train their models. From their website:

Data submitted through the OpenAI API is not used to train OpenAI models or improve OpenAI's service offering.

Example Configurations

Azure

A comprehensive example which includes:

  • llm and embeddings configuration for components in config.yml:
    • IntentlessPolicy
    • EnterpriseSearchPolicy
    • SingleStepLLMCommandGenerator
    • flow_retrieval in 3.8.x
  • llm configuration for rephrase in endpoints.yml (ContextualResponseRephraser)

endpoints.yml
nlg:
  type: rephrase
  llm:
    provider: azure
    deployment: rasa-gpt-4
    api_type: azure
    api_version: "2024-02-15-preview"
    api_base: https://my-azure.openai.azure.com
    timeout: 7

config.yml
recipe: default.v1
language: en
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      provider: azure
      deployment: rasa-gpt-4
      api_type: azure
      api_base: https://my-azure.openai.azure.com/
      api_version: "2024-02-15-preview"
      timeout: 7
    flow_retrieval:
      embeddings:
        provider: azure
        deployment: rasa-embedding-small
        api_type: azure
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
        timeout: 7
policies:
  - name: FlowPolicy
  - name: IntentlessPolicy
    llm:
      provider: azure
      deployment: rasa-gpt-4
      api_type: azure
      api_base: https://my-azure.openai.azure.com/
      api_version: "2024-02-15-preview"
      timeout: 7
    embeddings:
      provider: azure
      deployment: rasa-embedding-small
      api_type: azure
      api_base: https://my-azure.openai.azure.com/
      api_version: "2024-02-15-preview"
      timeout: 7
  - name: EnterpriseSearchPolicy
    vector_store:
      type: "faiss"
      threshold: 0.0
    llm:
      provider: azure
      deployment: rasa-gpt-4
      api_type: azure
      api_base: https://my-azure.openai.azure.com/
      api_version: "2024-02-15-preview"
      timeout: 7
    embeddings:
      provider: azure
      deployment: rasa-embedding-small
      api_type: azure
      api_base: https://my-azure.openai.azure.com/
      api_version: "2024-02-15-preview"
      timeout: 7