LLM Configuration
Instructions on how to set up and configure Large Language Models from OpenAI, Azure, and other providers. Here you'll learn what you need to configure to make LLMs from different providers work for your use case.
Overview
This page applies to the following components which use LLMs:
- SingleStepLLMCommandGenerator
- MultiStepLLMCommandGenerator
- EnterpriseSearchPolicy
- IntentlessPolicy
- ContextualResponseRephraser
- LLMBasedRouter
All the above components can be configured to change:
- the LLM provider
- the model to be used
Starting with Rasa Pro 3.10, CALM uses LiteLLM under the hood to integrate with different LLM providers. As a result, all of LiteLLM's integrated providers are supported by CALM as well. The settings required for the most frequently used providers are covered explicitly in the sections below.
warning
If you want to try a provider other than OpenAI / Azure OpenAI, it is recommended to install Rasa Pro version >= 3.10.
Recommended Models
The table below documents the versions of each model we recommend for use with various Rasa components. As new models are published, Rasa will test these and where appropriate add them as a recommended model.
Component | Providing platform | Recommended models |
---|---|---|
SingleStepLLMCommandGenerator , EnterpriseSearchPolicy , IntentlessPolicy | OpenAI, Azure | gpt-4-0613 |
ContextualResponseRephraser | OpenAI, Azure | gpt-4-0613 , gpt-3.5-turbo-0125 |
MultiStepLLMCommandGenerator | OpenAI, Azure | gpt-4-turbo-2024-04-09 , gpt-3.5-turbo-0125 , gpt-3.5-turbo-1106 , gpt-4o-2024-08-06 |
Chat completion models
Default Provider
CALM is LLM agnostic and can be configured with different LLMs, but OpenAI is the default model provider. The majority of our experiments have been run with models available on OpenAI or the Azure OpenAI Service. The performance of your assistant may vary when using other LLMs, but improvements can often be made by tuning flow and collect step descriptions.
To configure components that use a chat completion model as the LLM, declare the configuration under the `llm` key of that component's configuration. For example:
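A minimal sketch for Rasa Pro >= 3.10, using the `SingleStepLLMCommandGenerator` as an example (the model name is illustrative):

```yaml
# config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      provider: openai
      model: gpt-4
```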
Required Parameters
There are certain required parameters under the `llm` key:
- `model` - Specifies the name of the model identifier available from the LLM provider's documentation, e.g. `gpt-4`.
- `provider` - Unique identifier of the provider to be used for invoking the specified model.
Optional Parameters
The `llm` key also accepts inference-time parameters such as `temperature`, which are optional but can be useful for getting the best performance out of the model being used. Please refer to the official LiteLLM documentation for the list of supported parameters.
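For example, a sketch that adds a couple of common inference-time parameters (the values are illustrative; check LiteLLM's documentation for the parameters your provider supports):

```yaml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      provider: openai
      model: gpt-4
      temperature: 0.0   # lower values make command generation more deterministic
      max_tokens: 256    # cap on tokens generated per completion
```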
When configuring a particular provider, there are a few provider-specific settings, which are explained under each provider's individual sub-section below.
important
If you switch to a different LLM provider, all default parameters of the old provider will be overridden by the default parameters of the new provider. For example, if a provider sets `temperature=0.7` as the default value and you switch to a different LLM provider, this default is ignored and it is up to you to set the temperature for the new provider.
OpenAI
API Token
The API token authenticates your requests to the OpenAI API.
To configure the API token, follow these steps:
- If you haven't already, sign up for an account on the OpenAI platform.
- Navigate to the OpenAI Key Management page and click on the "Create New Secret Key" button to obtain `<your-api-key>`.
- To set the API key as an environment variable, you can use the following command in a terminal or command prompt (Linux/macOS or Windows):
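A sketch of the commands; the key is read from the `OPENAI_API_KEY` environment variable:

```shell
# Linux / macOS
export OPENAI_API_KEY="<your-api-key>"

# Windows (Command Prompt; applies to new terminal sessions)
setx OPENAI_API_KEY "<your-api-key>"
```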
Replace `<your-api-key>` with the actual API key you obtained from the OpenAI platform.
Configuration
There are no additional OpenAI-specific parameters to be configured. However, there may be model-specific parameters like `temperature` that you want to modify. Names for such parameters can be found in OpenAI's API documentation and defined under the `llm` key of the component's configuration.
Please refer to LiteLLM's documentation for the list of models supported on the OpenAI platform.
Model deprecations
OpenAI regularly publishes a deprecation schedule for its models. This schedule can be accessed in the documentation published by OpenAI.
Azure OpenAI Service
API Token
The API token authenticates your requests to the Azure OpenAI Service.
Set the API token as an environment variable. You can use the following command in a terminal or command prompt (Linux/macOS or Windows):
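A sketch of the commands; the `AZURE_API_KEY` variable name is the one LiteLLM reads for the Azure provider in Rasa Pro >= 3.10 (earlier versions may expect a different variable name, so check the documentation for your version):

```shell
# Linux / macOS
export AZURE_API_KEY="<your-api-key>"

# Windows (Command Prompt; applies to new terminal sessions)
setx AZURE_API_KEY "<your-api-key>"
```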
Replace `<your-api-key>` with the actual API key you obtained from the Azure OpenAI Service platform.
Configuration
To access models provided by the Azure OpenAI Service, there are a few additional parameters that need to be configured:
- `provider` - Set to `azure`.
- `api_type` - The type of API to use. This should be set to `azure` to indicate the use of the Azure OpenAI Service.
- `api_base` - The URL for your Azure OpenAI instance. An example might look like this: `https://my-azure.openai.azure.com/`.
- `api_version` - The API version to use for this operation. This follows the YYYY-MM-DD format and the value should be enclosed in single or double quotes.
- `engine` / `deployment_name` - Aliases for the `deployment` parameter. Name of the deployment on Azure.
Model-specific parameters like `temperature` can be defined as well. Refer to the Azure OpenAI Service API documentation for information on available parameter names.
A complete example configuration of the `SingleStepLLMCommandGenerator` using the Azure OpenAI Service would look like this (the exact keys differ slightly between Rasa Pro <=3.7.x, 3.8.x-3.9.x, and >=3.10.x):
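A sketch for Rasa Pro >= 3.10; the deployment name, API base URL, and API version are placeholders you would replace with your own values:

```yaml
# config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      provider: azure
      deployment: my-gpt-4-deployment          # placeholder: name of your Azure deployment
      api_base: https://my-azure.openai.azure.com/
      api_version: "2024-02-15-preview"        # illustrative; use the version enabled on your instance
```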
A more comprehensive example using the Azure OpenAI service in more CALM components is available here.
Model deprecations
Azure regularly publishes a deprecation schedule for its models that come under the OpenAI Azure Service. This schedule can be accessed in the documentation published by Azure.
Debugging
If you encounter timeout errors, set the `request_timeout` parameter to a larger value. The exact value depends on how your Azure instance is configured.
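For example, a sketch that raises the timeout under the `llm` key (all values other than `request_timeout` are placeholders):

```yaml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      provider: azure
      deployment: my-gpt-4-deployment     # placeholder
      api_base: https://my-azure.openai.azure.com/
      api_version: "2024-02-15-preview"   # illustrative
      request_timeout: 60                 # seconds; increase if requests time out
```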
Amazon Bedrock
Requirements:
- Make sure you have `rasa-pro>=3.10.x` installed.
- Install `boto3>=1.28.57`.
- Set the following environment variables: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION_NAME`.
- (Optional) You might have to set `AWS_SESSION_TOKEN` if your organisation mandates the usage of temporary credentials for security.
Once the above steps are complete, edit `config.yaml` to use an appropriate model and set `provider` to `bedrock`:
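A sketch; the model identifier is illustrative and should be one of the Bedrock models supported by LiteLLM:

```yaml
# config.yaml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      provider: bedrock
      model: anthropic.claude-3-sonnet-20240229-v1:0   # illustrative Bedrock model ID
```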
Model-specific parameters like `temperature` can be defined as well. Refer to LiteLLM's documentation for information on available parameter names and supported models.
Gemini - Google AI Studio
Requirements:
- Make sure you have `rasa-pro>=3.10.x` installed.
- Install the python package `google-generativeai`.
- Get an API key at https://aistudio.google.com/.
- Set the API key to an environment variable `GEMINI_API_KEY`.
Once the above steps are complete, edit `config.yaml` to use an appropriate model and set `provider` to `gemini`:
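A sketch; the model name is illustrative:

```yaml
# config.yaml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      provider: gemini
      model: gemini-1.5-pro   # illustrative; any Gemini model supported by LiteLLM
```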
Refer to LiteLLM's documentation to see which additional parameters and models are supported.
HuggingFace Inference Endpoints
Requirements:
- Make sure you have `rasa-pro>=3.10.x` installed.
- Set an API key to the environment variable `HUGGINGFACE_API_KEY`.
- Edit `config.yaml` to use an appropriate model, set `provider` to `huggingface` and `api_base` to the base URL of the deployed endpoint, as shown below:
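A sketch; the model name and endpoint URL are placeholders:

```yaml
# config.yaml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      provider: huggingface
      model: meta-llama/Meta-Llama-3-8B-Instruct       # illustrative model served by the endpoint
      api_base: https://my-endpoint.huggingface.cloud  # placeholder endpoint URL
```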
Self Hosted Model Server
CALM's components can also be configured to work with an open source LLM that is hosted on an open source model server like vLLM (recommended), Ollama, or the Llama.cpp web server. The only requirement is that the model server adheres to the OpenAI API format.
Once you have your model server running, configure the CALM assistant's `config.yaml`:
vLLM
Important to note:
- The recommended version of `vllm` to use is `0.6.0`.
- CALM exclusively utilizes the chat completions endpoint of the model server, so it's essential that the model's tokenizer includes a chat template. Models lacking a chat template will not be compatible with CALM.
- `model` should contain the name of the model supplied to the vllm startup command. For example, if your model server is started with `vllm serve meta-llama/CodeLlama-7b-Instruct-hf`, then `model` should be set to `meta-llama/CodeLlama-7b-Instruct-hf`.
- `api_base` should contain the full exposed URL of the model server with `v1` attached as a suffix to the URL.
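A sketch of the corresponding `config.yaml`, assuming the `self-hosted` provider identifier and a vLLM server reachable at the placeholder URL below:

```yaml
# config.yaml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      provider: self-hosted
      model: meta-llama/CodeLlama-7b-Instruct-hf   # same name passed to `vllm serve`
      api_base: http://my-vllm-host:8000/v1        # placeholder; note the /v1 suffix
```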
Ollama
Once the Ollama model server is running, edit the `config.yaml` file:
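A sketch, assuming Ollama's OpenAI-compatible endpoint and the `self-hosted` provider identifier; the model name is illustrative:

```yaml
# config.yaml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      provider: self-hosted
      model: llama3.1                        # illustrative; a model already pulled into Ollama
      api_base: http://localhost:11434/v1    # Ollama's OpenAI-compatible endpoint
```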
Other Providers
info
If you want to try one of these providers, it is recommended to install Rasa Pro version >= 3.10.
Other than the above-mentioned providers, we have also tested support for the following providers:
Platform | provider | API-KEY variable |
---|---|---|
Anthropic | anthropic | ANTHROPIC_API_KEY |
Cohere | cohere | COHERE_API_KEY |
Mistral | mistral | MISTRAL_API_KEY |
Together AI | together_ai | TOGETHERAI_API_KEY |
Groq | groq | GROQ_API_KEY |
For each of the above, ensure you have set an environment variable named by the value in the API-KEY variable column to the API key of that platform, and set the `provider` parameter under the `llm` key of the component's config to the value in the provider column.
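For example, a sketch using Anthropic (the model name is illustrative; the `ANTHROPIC_API_KEY` environment variable must be set):

```yaml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      provider: anthropic
      model: claude-3-5-sonnet-20240620   # illustrative Anthropic model
```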
Embedding models
To configure components that use an embedding model, declare the configuration under the `embeddings` key of that component's configuration. For example:
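A minimal sketch for Rasa Pro >= 3.10, using the `IntentlessPolicy` as an example:

```yaml
# config.yml
policies:
  - name: IntentlessPolicy
    embeddings:
      provider: openai
      model: text-embedding-ada-002
```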
The `embeddings` property needs two mandatory parameters:
- `model` - Specifies the name of the model identifier available from the LLM provider's documentation, e.g. `text-embedding-ada-002`.
- `provider` - Unique identifier of the provider to be used for invoking the specified model, e.g. `openai`.
The exact keys differ slightly between Rasa Pro <=3.9.x and Rasa Pro >=3.10.x; the example above uses the >=3.10.x format.
When configuring a particular provider, there are a few provider-specific settings, which are explained under each provider's individual sub-section below.
OpenAI
OpenAI is used as the default embedding model provider. To start using it, ensure you have configured an API token as you would for a chat completion model from the OpenAI platform.
Configuration
The configuration differs slightly between Rasa Pro <=3.9.x and Rasa Pro >=3.10.x.
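A sketch for Rasa Pro >= 3.10.x (the model name is illustrative):

```yaml
policies:
  - name: EnterpriseSearchPolicy
    embeddings:
      provider: openai
      model: text-embedding-3-small   # illustrative OpenAI embedding model
```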
Azure OpenAI Service
Ensure you have configured an API token as you would for a chat completion model for the Azure OpenAI Service.
Configuration
Configuring an embedding model from the Azure OpenAI Service needs values for the same set of parameters that are required for configuring a chat completion model from the Azure OpenAI Service. The exact keys differ slightly between Rasa Pro <=3.9.x and Rasa Pro >=3.10.x.
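A sketch for Rasa Pro >= 3.10.x; the deployment name, URL, and API version are placeholders:

```yaml
policies:
  - name: EnterpriseSearchPolicy
    embeddings:
      provider: azure
      deployment: my-embedding-deployment   # placeholder Azure deployment name
      api_base: https://my-azure.openai.azure.com/
      api_version: "2024-02-15-preview"     # illustrative
```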
Amazon Bedrock
Configuring an embedding model from Amazon Bedrock needs the same prerequisites as a chat completion model. Please ensure you have addressed these before proceeding further.
Configuration
Please refer to LiteLLM's documentation for the list of supported embedding models from Amazon Bedrock.
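A sketch; the model identifier is illustrative and should be one of the Bedrock embedding models supported by LiteLLM:

```yaml
policies:
  - name: EnterpriseSearchPolicy
    embeddings:
      provider: bedrock
      model: amazon.titan-embed-text-v2:0   # illustrative Bedrock embedding model ID
```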
In-Memory
CALM also provides an option to load lightweight embedding models in memory, without needing them to be exposed over an API. It uses the Sentence Transformers library under the hood to load and run inference on them.
Configuration
The configuration differs slightly between Rasa Pro <=3.9.x and Rasa Pro >=3.10.x.
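A sketch for Rasa Pro >= 3.10.x, assuming the `huggingface_local` provider identifier for in-memory Sentence Transformers models (the model name and arguments are illustrative):

```yaml
policies:
  - name: EnterpriseSearchPolicy
    embeddings:
      provider: huggingface_local
      model: BAAI/bge-small-en-v1.5    # illustrative; any HuggingFace hub model or a local path
      model_kwargs:
        device: cpu                    # load-time argument passed to Sentence Transformers
      encode_kwargs:
        normalize_embeddings: true     # inference-time argument passed to Sentence Transformers
```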
- The `model` parameter can take as value either any embedding model repository available on the HuggingFace hub or a path to a local model.
- The `model_kwargs` parameter is used to provide load-time arguments to the Sentence Transformers library.
- The `encode_kwargs` parameter is used to provide inference-time arguments to the Sentence Transformers library.
Other Providers
Other than the above-mentioned providers, we have also tested support for the following providers:
Platform | provider | API-KEY variable |
---|---|---|
Cohere | cohere | COHERE_API_KEY |
Mistral | mistral | MISTRAL_API_KEY |
Voyage AI | voyage | VOYAGE_API_KEY |
For each of the above, ensure you have set an environment variable named by the value in the API-KEY variable column to the API key of that platform, and set the `provider` parameter under the `embeddings` key of the component's config to the value in the provider column.
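For example, a sketch using Cohere embeddings (the model name is illustrative; the `COHERE_API_KEY` environment variable must be set):

```yaml
policies:
  - name: EnterpriseSearchPolicy
    embeddings:
      provider: cohere
      model: embed-english-v3.0   # illustrative Cohere embedding model
```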
FAQ
Does OpenAI use my data to train their models?
No. OpenAI does not use your data to train their models. From their website:
Data submitted through the OpenAI API is not used to train OpenAI models or improve OpenAI's service offering.
Example Configurations
Azure
A comprehensive example which includes:
- `llm` and `embeddings` configuration for components in `config.yml`: `IntentlessPolicy`, `EnterpriseSearchPolicy`, `SingleStepLLMCommandGenerator`, `flow_retrieval` (in 3.8.x)
- `llm` configuration for the rephraser in `endpoints.yml` (`ContextualResponseRephraser`)
The exact keys differ between Rasa Pro <=3.7.x, 3.8.x-3.9.x, and >=3.10.x.
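A sketch of the >=3.10.x variant; deployment names, URLs, and the API version are placeholders, and the `nlg` rephrase endpoint layout is assumed from the ContextualResponseRephraser documentation:

```yaml
# config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      provider: azure
      deployment: my-gpt-4-deployment          # placeholder
      api_base: https://my-azure.openai.azure.com/
      api_version: "2024-02-15-preview"        # illustrative
    flow_retrieval:
      embeddings:
        provider: azure
        deployment: my-embedding-deployment    # placeholder
        api_base: https://my-azure.openai.azure.com/
        api_version: "2024-02-15-preview"
policies:
  - name: IntentlessPolicy
    llm:
      provider: azure
      deployment: my-gpt-4-deployment
      api_base: https://my-azure.openai.azure.com/
      api_version: "2024-02-15-preview"
    embeddings:
      provider: azure
      deployment: my-embedding-deployment
      api_base: https://my-azure.openai.azure.com/
      api_version: "2024-02-15-preview"
  - name: EnterpriseSearchPolicy
    llm:
      provider: azure
      deployment: my-gpt-4-deployment
      api_base: https://my-azure.openai.azure.com/
      api_version: "2024-02-15-preview"
    embeddings:
      provider: azure
      deployment: my-embedding-deployment
      api_base: https://my-azure.openai.azure.com/
      api_version: "2024-02-15-preview"
```

```yaml
# endpoints.yml
nlg:
  type: rephrase
  llm:
    provider: azure
    deployment: my-gpt-4-deployment            # placeholder
    api_base: https://my-azure.openai.azure.com/
    api_version: "2024-02-15-preview"          # illustrative
```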