LLM Providers

Instructions on how to set up and configure Large Language Models from OpenAI, Cohere, and other providers. Here you'll learn what you need to configure and how you can customize LLMs to work efficiently with your specific use case.

Overview

All Rasa components which make use of an LLM can be configured. This includes:

  • The LLM provider
  • The model
  • The sampling temperature
  • The prompt template

and other settings. This page applies to the following components which use LLMs:

  • LLMCommandGenerator
  • EnterpriseSearchPolicy
  • IntentlessPolicy
  • ContextualResponseRephraser
  • LLMIntentClassifier
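
For illustration, here is how these settings fit together on a single component. This is a minimal sketch with placeholder values, using the key names that appear in the examples later on this page:

config.yml
pipeline:
- name: LLMCommandGenerator
  llm:
    type: "openai"       # the LLM provider
    model_name: "gpt-4"  # the model
    temperature: 0.3     # the sampling temperature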

OpenAI Configuration

This section describes in detail how to connect to OpenAI. Rasa is LLM agnostic and can be configured with different LLMs, but OpenAI is the default.

If you want to configure your assistant with a different LLM, you can find instructions for other LLM providers further down the page.

API Token

The API token authenticates your requests to the OpenAI API.

To configure the API token, follow these steps:

  1. If you haven't already, sign up for an account on the OpenAI platform.

  2. Navigate to the OpenAI Key Management page, and click on the "Create New Secret Key" button to initiate the process of obtaining your API key.

  3. To set the API key as an environment variable, you can use the following command in a terminal or command prompt:

export OPENAI_API_KEY=<your-api-key>

Replace <your-api-key> with the actual API key you obtained from the OpenAI platform.
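
The export command above works in bash-compatible shells. In the Windows Command Prompt, the session-scoped equivalent is:

set OPENAI_API_KEY=<your-api-key>

(Use setx instead of set if you want the variable to persist across sessions.)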

Model Configuration

Many LLM providers offer multiple models through their API. The model is specified individually for each component, so you can use a combination of different models if you want to. For instance, here is how you could configure different models for the LLMCommandGenerator and the EnterpriseSearchPolicy:

config.yml
recipe: default.v1
language: en
pipeline:
- name: LLMCommandGenerator
  llm:
    model: "gpt-4"
policies:
- name: rasa.core.policies.flow_policy.FlowPolicy
- name: EnterpriseSearchPolicy
  llm:
    model: "gpt-3.5-turbo"

Additional Configuration for Azure OpenAI Service

For those using Azure OpenAI Service, there are additional parameters that need to be configured:

  • api_type - The type of API to use. This should be set to "azure" to indicate the use of Azure OpenAI Service. Can be set through ENV var OPENAI_API_TYPE.
  • api_base - The URL for your Azure OpenAI instance. An example might look like this: https://my-azure.openai.azure.com/. Can be set through ENV var OPENAI_API_BASE.
  • api_version - The API version to use for this operation. This follows the YYYY-MM-DD format. Can be set through ENV var OPENAI_API_VERSION.
  • engine/deployment - Name of the deployment for chat model or embeddings on Azure.
  • chunk_size - Size of text chunk embeddings sent to Azure.

More detailed descriptions of these parameters can be found at the end of this section.

To configure these parameters, follow these steps:

Step 1: Configure the api_type either as an environment variable or in the config file. To create the environment variable, use the following command:

export OPENAI_API_TYPE="azure"

To configure the api_type in the config file, add it in the pipeline component like this:

config.yml
- name: LLMCommandGenerator
  llm:
    model_name: gpt-3.5-turbo
    api_type: azure
    # additional configuration parameters

Step 2: Configure the api_base either as an environment variable or in the config file. To create the environment variable, use the following command:

export OPENAI_API_BASE=<your-azure-openai-instance-url>

To configure the api_base in the config file, add it in the pipeline component like this:

config.yml
- name: LLMCommandGenerator
  llm:
    model_name: gpt-3.5-turbo
    api_base: https://my-azure.openai.azure.com/
    # additional configuration parameters

Step 3: To configure the api_version in the config file, add it in the pipeline component like this:

config.yml
- name: LLMCommandGenerator
  llm:
    model_name: gpt-3.5-turbo
    api_version: 2024-02-15-preview
    # additional configuration parameters

Step 4: To configure the engine in the config file, add it in the pipeline component like this:

config.yml
- name: LLMCommandGenerator
  llm:
    model_name: gpt-3.5-turbo
    engine: <name_of_deployment_on_azure>
    # additional configuration parameters

Step 5: To configure the deployment/engine for embeddings in the config.yml file, add it in the pipeline component like this:

Using the engine field:

config.yml
- name: LLMIntentClassifier
  fallback_intent: "out_of_scope"
  embeddings:
    model: text-embedding-ada-002
    engine: <name_of_deployment_on_azure>
    # additional configuration parameters

Using the deployment field:

Please note that you must set openai_api_type to azure in the embeddings configuration to use the deployment field.

config.yml
- name: LLMIntentClassifier
  fallback_intent: "out_of_scope"
  embeddings:
    model: text-embedding-ada-002
    deployment: <name_of_deployment_on_azure>
    openai_api_type: azure
    # additional configuration parameters

Step 6: To configure chunk_size in the config file, add it in the pipeline component under the embeddings object like this:

config.yml
- name: LLMIntentClassifier
  fallback_intent: "out_of_scope"
  embeddings:
    model: text-embedding-ada-002
    chunk_size: 16
    # additional configuration parameters

A complete configuration of the LLMCommandGenerator using Azure OpenAI Service might, for example, look like this:

config.yml
- name: LLMCommandGenerator
  llm:
    model_name: gpt-4
    engine: rasa-gpt-4
    api_type: azure
    api_base: https://my-azure.openai.azure.com/
    api_version: 2024-02-15-preview
    request_timeout: 7
  flow_retrieval:
    embeddings:
      model_name: text-embedding-3-small
      engine: rasa-embedding-small
      api_type: azure
      api_base: https://my-azure.openai.azure.com/
      api_version: 2024-02-15-preview
      request_timeout: 7

A more comprehensive example which includes:

  • llm and embeddings configuration for components in config.yml:
    • IntentlessPolicy
    • EnterpriseSearchPolicy
    • LLMCommandGenerator
    • flow_retrieval in 3.8.x
  • llm configuration for rephrase in endpoints.yml (ContextualResponseRephraser)

endpoints.yml
nlg:
  type: rephrase
  llm:
    model_name: gpt-4
    engine: rasa-gpt-4
    api_type: azure
    api_version: 2024-02-15-preview
    api_base: https://my-azure.openai.azure.com
    request_timeout: 7

config.yml
recipe: default.v1
language: en
pipeline:
- name: LLMCommandGenerator
  llm:
    model_name: gpt-4
    engine: rasa-gpt-4
    api_type: azure
    api_base: https://my-azure.openai.azure.com/
    api_version: 2024-02-15-preview
    request_timeout: 7
  flow_retrieval:
    embeddings:
      model_name: text-embedding-3-small
      engine: rasa-embedding-small
      api_type: azure
      api_base: https://my-azure.openai.azure.com/
      api_version: 2024-02-15-preview
      request_timeout: 7
policies:
- name: FlowPolicy
- name: IntentlessPolicy
  llm:
    model_name: gpt-4
    engine: rasa-gpt-4
    api_type: azure
    api_base: https://my-azure.openai.azure.com/
    api_version: 2024-02-15-preview
    request_timeout: 7
  embeddings:
    model_name: text-embedding-3-small
    engine: rasa-embedding-small
    api_type: azure
    api_base: https://my-azure.openai.azure.com/
    api_version: 2024-02-15-preview
    request_timeout: 7
- name: EnterpriseSearchPolicy
  vector_store:
    type: "faiss"
    threshold: 0.0
  llm:
    model_name: gpt-4
    engine: rasa-gpt-4
    api_type: azure
    api_base: https://my-azure.openai.azure.com/
    api_version: 2024-02-15-preview
    request_timeout: 7
  embeddings:
    model_name: text-embedding-3-small
    engine: rasa-embedding-small
    api_type: azure
    api_base: https://my-azure.openai.azure.com/
    api_version: 2024-02-15-preview
    request_timeout: 7

If LangChain consistently hits a timeout warning, try increasing the request_timeout value. The right value for this parameter may depend on your Azure instance.
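
For example, building on the hypothetical rasa-gpt-4 deployment used in the examples above, a raised timeout might look like this (the value 15 is an arbitrary illustration):

config.yml
- name: LLMCommandGenerator
  llm:
    model_name: gpt-4
    engine: rasa-gpt-4
    api_type: azure
    api_base: https://my-azure.openai.azure.com/
    api_version: 2024-02-15-preview
    request_timeout: 15  # increased from 7 to avoid timeout warnings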

How to configure the llm and embeddings fields

| Azure OpenAI config key | Rasa config sub-section(s) | Rasa config sub-section key | Description |
| --- | --- | --- | --- |
| api_type | llm and embeddings | api_type | The type of API to use. This should be set to "azure" to indicate the use of Azure OpenAI Service. |
| api_base | llm and embeddings | api_base | The URL for your Azure OpenAI instance. An example might look like this: https://my-azure.openai.azure.com/. |
| api_version | llm and embeddings | api_version | The API version to use for this operation. See the Azure docs for more information about supported API versions. |
| engine | llm | engine | Name of the deployment for the chat model on Azure. If you are using chat models, you must already have an existing OpenAI deployment on Azure OpenAI. |
| deployment | embeddings | engine | Name of the deployment for the embeddings model on Azure. Note that you must already have an existing embeddings model deployment on Azure OpenAI. |
| chunk_size | llm and embeddings | chunk_size | Size of the text chunks sent to Azure for embedding. Some Azure plans might restrict you from sending larger chunks of text for embeddings. If you see an error that says "Too many inputs", you should decrease your chunk_size. By default, chunk_size is 1000, but it can be set to a lower value under the embeddings portion of config.yml. |

To use the deployment parameter in the embeddings configuration instead of engine, you must set openai_api_type to azure.

Other LLMs/Embeddings

The LLM and embeddings provider can be configured separately for each component. All components default to using OpenAI.

important

If you switch to a different LLM / embedding provider, you need to go through additional installation and setup. Please note the additional requirements for each provider in its respective section.

caution

We are currently working on expanding support for other LLM providers. Configuring alternative LLM and embedding providers is supported, but the functionality has been tested with OpenAI only. The performance of your assistant may vary when using other LLMs; improvements can often be made by experimenting with the prompt.
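
As a starting point for such prompt experiments, here is a sketch of overriding the prompt template for the LLMCommandGenerator. The prompt parameter name and the template path are assumptions for illustration; check the component's documentation for the exact parameter:

config.yml
pipeline:
- name: LLMCommandGenerator
  prompt: prompts/command-generator.jinja2  # assumed parameter; path is a placeholder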

Configuring an LLM provider

The LLM provider can be configured using the llm property of each component. The llm.type property specifies the LLM provider to use.

config.yml
pipeline:
- name: "LLMCommandGenerator"
  llm:
    type: "cohere"

The above configuration specifies that the LLMCommandGenerator should use the Cohere LLM provider rather than OpenAI.

important

If you switch to a different LLM provider, all default parameters for the different components are ignored and the defaults of the new provider are used instead.

For example, if a component defaults to temperature=0.7 with OpenAI and you switch to a different LLM provider, this default is ignored and it is up to you to set the temperature for the new provider.
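
To avoid surprises, you can pin such parameters explicitly when switching providers. A minimal sketch, reusing the Cohere example above:

config.yml
pipeline:
- name: "LLMCommandGenerator"
  llm:
    type: "cohere"
    temperature: 0.7  # set explicitly; the component's OpenAI default no longer applies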

The following LLM providers are supported:

OpenAI

Default LLM provider. Requires the OPENAI_API_KEY environment variable to be set. The model can be configured as an optional parameter:

llm:
  type: "openai"
  model_name: "gpt-4"
  temperature: 0.7

Cohere

Support for Cohere needs to be installed, e.g. using pip install cohere. Additionally, requires the COHERE_API_KEY environment variable to be set.

llm:
  type: "cohere"
  model: "command"
  temperature: 0.7

Vertex AI

To use Vertex AI, you need to install the required package, e.g. using pip install google-cloud-aiplatform. The credentials for Vertex AI can be configured as described in the google auth documentation.

llm:
  type: "vertexai"
  model_name: "text-bison"
  temperature: 0.7

Hugging Face Hub

The Hugging Face Hub LLM uses models from Hugging Face. It requires additional packages to be installed: pip install huggingface_hub. The environment variable HUGGINGFACEHUB_API_TOKEN needs to be set to a valid API token.

llm:
  type: "huggingface_hub"
  repo_id: "HuggingFaceH4/zephyr-7b-beta"
  task: "text-generation"

llama-cpp

To use the llama-cpp language model, you should install the required Python library, e.g. using pip install llama-cpp-python. A path to the Llama model must be provided. For more details, check out the llama-cpp project.

llm:
  type: "llamacpp"
  model_path: "/path/to/model.bin"
  temperature: 0.7

Other LLM providers

If you want to use a different LLM provider, you can specify the name of the provider in the llm.type property according to this mapping.

Configuring an embeddings provider

The embeddings provider can be configured using the embeddings property of each component. The embeddings.type property specifies the embeddings provider to use.

config.yml
pipeline:
- name: "LLMIntentClassifier"
  embeddings:
    type: "cohere"

The above configuration specifies that the LLMIntentClassifier should use the Cohere embeddings provider rather than OpenAI.

Only Some Components need Embeddings

Not every component uses embeddings. For example, the ContextualResponseRephraser component does not use embeddings. For these components, no embeddings property is needed.

The following embeddings providers are supported:

OpenAI

Default embeddings. Requires the OPENAI_API_KEY environment variable to be set. The model can be configured as an optional parameter:

embeddings:
  type: "openai"
  model: "text-embedding-ada-002"

Cohere

Embeddings from Cohere. Requires the python package for cohere to be installed, e.g. using pip install cohere. The COHERE_API_KEY environment variable must be set. The model can be configured as an optional parameter.

embeddings:
  type: "cohere"
  model: "embed-english-v2.0"

spaCy

The spacy embeddings provider uses the en_core_web_sm model to generate embeddings. The model needs to be installed separately, e.g. using python -m spacy download en_core_web_sm.

embeddings:
  type: "spacy"

Vertex AI

To use Vertex AI, you need to install the required package, e.g. using pip install google-cloud-aiplatform. The credentials for Vertex AI can be configured as described in the google auth documentation.

embeddings:
  type: "vertexai"
  model_name: "textembedding-gecko"

Hugging Face Hub

The Hugging Face Hub embeddings provider uses models from Hugging Face. It requires additional packages to be installed: pip install huggingface_hub. The environment variable HUGGINGFACEHUB_API_TOKEN needs to be set to a valid API token.

embeddings:
  type: "huggingface_hub"
  repo_id: "sentence-transformers/all-mpnet-base-v2"
  task: "feature-extraction"

llama-cpp

To use the llama-cpp embeddings, you should install the required Python library, e.g. using pip install llama-cpp-python. A path to the Llama model must be provided. For more details, check out the llama-cpp project.

embeddings:
  type: "llamacpp"
  model_path: "/path/to/model.bin"

Huggingface

The embedding types huggingface, huggingface_instruct and huggingface_bge can be used to run models from Hugging Face locally. They are intended for different kinds of embedding models. For the following models, please refer to the documentation of the Sentence Transformers library for the list of available parameters. Here's how to configure each of these:

  • huggingface: Hugging Face Sentence-Transformer embedding models. As a prerequisite, you should install the sentence_transformers python package.

    embeddings:
      type: "huggingface"
      model_name: "sentence-transformers/all-mpnet-base-v2"
      model_kwargs:
        device: "cpu"
      encode_kwargs:
        normalize_embeddings: True
  • huggingface_instruct: Huggingface instruct embedding models. You should have the sentence_transformers and InstructorEmbedding python packages installed.

    embeddings:
      type: "huggingface_instruct"
      model_name: "hkunlp/instructor-large"
      model_kwargs:
        device: "cpu"
      encode_kwargs:
        normalize_embeddings: True
  • huggingface_bge: BGE models are currently among the best open-source embedding models (according to the MTEB leaderboard). They require the sentence_transformers python package.

    embeddings:
      type: "huggingface_bge"
      model_name: "BAAI/bge-small-en-v1.5"
      model_kwargs:
        device: "cpu"
      encode_kwargs:
        normalize_embeddings: True

FAQ

Does OpenAI use my data to train their models?

No. OpenAI does not use your data to train their models. From their website:

Data submitted through the OpenAI API is not used to train OpenAI models or improve OpenAI's service offering.