Version: Latest

LLM Providers

Instructions on how to set up and configure Large Language Models from OpenAI, Cohere, and other providers. Here you'll learn what you need to configure and how to customize LLMs to work efficiently with your specific use case.

Rasa Labs

Overview

All Rasa components which make use of an LLM can be configured. This includes:

  • The LLM provider
  • The model
  • The sampling temperature
  • The prompt template

and other settings. This page applies to the following components which use LLMs:

  • LLMCommandGenerator
  • EnterpriseSearchPolicy
  • IntentlessPolicy
  • ContextualResponseRephraser
  • LLMIntentClassifier

OpenAI Configuration

This section describes in detail how to connect to OpenAI. Rasa is LLM agnostic and can be configured with different LLMs, but OpenAI is the default.

If you want to configure your assistant with a different LLM, you can find instructions for other LLM providers further down the page.

API Token

The API token authenticates your requests to the OpenAI API.

To configure the API token, follow these steps:

  1. If you haven't already, sign up for an account on the OpenAI platform.

  2. Navigate to the OpenAI Key Management page, and click on the "Create New Secret Key" button to initiate the process of obtaining your API key.

  3. To set the API key as an environment variable, you can use the following command in a terminal or command prompt:

export OPENAI_API_KEY=<your-api-key>

Replace <your-api-key> with the actual API key you obtained from the OpenAI platform.
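To confirm that the variable is actually visible to the process that will run your assistant, a small Python check can help. This is an illustrative sketch; the check_api_key helper is not part of Rasa:

```python
import os

def check_api_key(env):
    """Return True if a non-empty OPENAI_API_KEY is present in the given mapping."""
    return bool(env.get("OPENAI_API_KEY"))

# Check against the real environment of the current process:
if check_api_key(os.environ):
    print("OPENAI_API_KEY is set")
else:
    print("OPENAI_API_KEY is missing - export it before starting the assistant")
```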

Model Configuration

Many LLM providers offer multiple models through their API. The model is specified individually for each component, so you can combine different models across components. For instance, here is how you could configure different models for the LLMCommandGenerator and the EnterpriseSearchPolicy:

config.yml
recipe: default.v1
language: en
pipeline:
- name: LLMCommandGenerator
  model: "gpt-4"
policies:
- name: rasa.core.policies.flow_policy.FlowPolicy
- name: EnterpriseSearchPolicy
  model: "gpt-3.5-turbo"

Additional Configuration for Azure OpenAI Service

For those using Azure OpenAI Service, there are additional parameters that need to be configured:

  • openai.api_type: This should be set to "azure" to indicate the use of Azure OpenAI Service. You can also set this as the OPENAI_API_TYPE environment variable.
  • openai.api_base: The URL for your Azure OpenAI instance, for example "https://my-azure.openai.azure.com/".
  • api_version: The API version to use for this operation, in YYYY-MM-DD format.
  • engine: For the llm part, provide the name of your Azure deployment in config.yml using the engine parameter. If you are using chat models, you must also create a corresponding OpenAI deployment on Azure OpenAI.
  • deployment: For the embeddings part, add the deployment parameter to config.yml. Note that you must create a deployment on Azure to use embeddings with Azure OpenAI.
  • chunk_size: Some Azure plans restrict how much text you can send for embeddings in one request. If you see an error that says Too many inputs, decrease your chunk_size. By default, chunk_size is 1000, but it can be set to a lower value under the embeddings section of config.yml.
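To illustrate what chunk_size controls: the texts to embed are sent in batches of at most chunk_size items per request. A minimal pure-Python sketch of that batching (the chunked helper is illustrative, not Rasa code):

```python
def chunked(items, chunk_size):
    """Yield successive batches of at most chunk_size items."""
    for i in range(0, len(items), chunk_size):
        yield items[i:i + chunk_size]

# With chunk_size=16, 40 documents are sent as batches of 16, 16, and 8:
batches = list(chunked([f"doc-{n}" for n in range(40)], 16))
print([len(b) for b in batches])  # [16, 16, 8]
```

Lowering chunk_size trades more requests for smaller payloads, which is what resolves the Too many inputs error.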

To configure these parameters, follow these steps:

Step 1: Configure the openai.api_type either as an environment variable or in the config file. To set the environment variable, use the following command:

export OPENAI_API_TYPE="azure"

To configure the openai.api_type in the config file, add it in the pipeline component like this:

config.yml
- name: LLMCommandGenerator
  llm:
    model_name: gpt-3.5-turbo
    api_type: azure
    # additional configuration parameters

Step 2: Configure the openai.api_base either as an environment variable or in the config file. To set the environment variable, use the following command:

export OPENAI_API_BASE=your-azure-openai-instance-url

To configure the openai.api_base in the config file, add it in the pipeline component like this:

config.yml
- name: LLMCommandGenerator
  llm:
    model_name: gpt-3.5-turbo
    api_base: https://my-azure.openai.azure.com/
    # additional configuration parameters

Step 3: To configure the api_version in the config file, add it in the pipeline component like this:

config.yml
- name: LLMCommandGenerator
  llm:
    model_name: gpt-3.5-turbo
    api_version: 2024-02-15-preview
    # additional configuration parameters

Step 4: To configure the engine in the config file, add it in the pipeline component like this:

config.yml
- name: LLMCommandGenerator
  llm:
    model_name: gpt-3.5-turbo
    engine: <name_of_deployment_on_azure>
    # additional configuration parameters

Step 5: To configure the deployment in the config.yml file, add it in the pipeline component like this:

config.yml
- name: LLMIntentClassifier
  fallback_intent: "out_of_scope"
  embeddings:
    model: text-embedding-ada-002
    deployment: <name_of_deployment_on_azure>
    # additional configuration parameters

Step 6: To configure chunk_size in the config file, add it in the pipeline components under embeddings object like this:

config.yml
- name: LLMIntentClassifier
  fallback_intent: "out_of_scope"
  embeddings:
    model: text-embedding-ada-002
    chunk_size: 16
    # additional configuration parameters

A complete configuration of the LLMCommandGenerator using Azure OpenAI Service might look like this:

- name: LLMCommandGenerator
  llm:
    model_name: gpt-3.5-turbo
    api_type: azure
    api_base: https://my-azure.openai.azure.com/
    request_timeout: 7
    api_version: 2024-02-15-preview
    engine: rasa-gpt-3.5-turbo

Other LLMs/Embeddings

The LLM and embeddings provider can be configured separately for each component. All components default to using OpenAI.

important

If you switch to a different LLM / embedding provider, you need to go through additional installation and setup. Note the additional requirements for each provider in its respective section below.

caution

We are currently working on adding support for other LLM providers. We support configuring alternative LLM and embedding providers, but we have tested the functionality with OpenAI only. The performance of your assistant may vary when using other LLMs; improvements can be made by experimenting with the prompt.

Configuring an LLM provider

The LLM provider can be configured using the llm property of each component. The llm.type property specifies the LLM provider to use.

config.yml
pipeline:
- name: "LLMCommandGenerator"
  llm:
    type: "cohere"

The above configuration specifies that the LLMCommandGenerator should use the Cohere LLM provider rather than OpenAI.

important

If you switch to a different LLM provider, the components' default parameters no longer apply; the defaults of the new provider are used instead.

For example, if a component defaults to temperature=0.7 with OpenAI and you switch to a different LLM provider, that default is ignored and it is up to you to set the temperature for the new provider.
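For instance, to keep the same sampling behavior after switching to Cohere, set the temperature explicitly (a sketch; the value shown is illustrative):

```yaml
pipeline:
- name: "LLMCommandGenerator"
  llm:
    type: "cohere"
    temperature: 0.7
```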

The following LLM providers are supported:

OpenAI

Default LLM provider. Requires the OPENAI_API_KEY environment variable to be set. The model can be configured as an optional parameter:

llm:
  type: "openai"
  model_name: "gpt-4"
  temperature: 0.7

Cohere

Support for Cohere needs to be installed, e.g. using pip install cohere. Additionally, the COHERE_API_KEY environment variable must be set.

llm:
  type: "cohere"
  model: "command"
  temperature: 0.7

Vertex AI

To use Vertex AI, install the required package with pip install google-cloud-aiplatform. The credentials for Vertex AI can be configured as described in the Google auth documentation.

llm:
  type: "vertexai"
  model_name: "text-bison"
  temperature: 0.7

Hugging Face Hub

The Hugging Face Hub LLM uses models from Hugging Face. It requires additional packages to be installed: pip install huggingface_hub. The environment variable HUGGINGFACEHUB_API_TOKEN needs to be set to a valid API token.

llm:
  type: "huggingface_hub"
  repo_id: "HuggingFaceH4/zephyr-7b-beta"
  task: "text-generation"

llama-cpp

To use the llama-cpp language model, install the required Python library with pip install llama-cpp-python. A path to the Llama model must be provided. For more details, check out the llama-cpp project.

llm:
  type: "llamacpp"
  model_path: "/path/to/model.bin"
  temperature: 0.7

Other LLM providers

If you want to use a different LLM provider, you can specify the name of the provider in the llm.type property according to this mapping.

Configuring an embeddings provider

The embeddings provider can be configured using the embeddings property of each component. The embeddings.type property specifies the embeddings provider to use.

config.yml
pipeline:
- name: "LLMIntentClassifier"
  embeddings:
    type: "cohere"

The above configuration specifies that the LLMIntentClassifier should use the Cohere embeddings provider rather than OpenAI.

Only Some Components need Embeddings

Not every component uses embeddings. For example, the ContextualResponseRephraser component does not use embeddings. For these components, no embeddings property is needed.

The following embeddings providers are supported:

OpenAI

Default embeddings. Requires the OPENAI_API_KEY environment variable to be set. The model can be configured as an optional parameter:

embeddings:
  type: "openai"
  model: "text-embedding-ada-002"

Cohere

Embeddings from Cohere. Requires the Python package for Cohere to be installed, e.g. using pip install cohere. The COHERE_API_KEY environment variable must be set. The model can be configured as an optional parameter.

embeddings:
  type: "cohere"
  model: "embed-english-v2.0"

spaCy

The spaCy embeddings provider uses the en_core_web_sm model to generate embeddings. The model needs to be installed separately, e.g. using python -m spacy download en_core_web_sm.

embeddings:
  type: "spacy"

Vertex AI

To use Vertex AI, install the required package with pip install google-cloud-aiplatform. The credentials for Vertex AI can be configured as described in the Google auth documentation.

embeddings:
  type: "vertexai"
  model_name: "textembedding-gecko"

Hugging Face Hub

The Hugging Face Hub embeddings provider uses models from Hugging Face. It requires additional packages to be installed: pip install huggingface_hub. The environment variable HUGGINGFACEHUB_API_TOKEN needs to be set to a valid API token.

embeddings:
  type: "huggingface_hub"
  repo_id: "sentence-transformers/all-mpnet-base-v2"
  task: "feature-extraction"

llama-cpp

To use the llama-cpp embeddings, install the required Python library with pip install llama-cpp-python. A path to the Llama model must be provided. For more details, check out the llama-cpp project.

embeddings:
  type: "llamacpp"
  model_path: "/path/to/model.bin"

Huggingface

The embedding types huggingface, huggingface_instruct and huggingface_bge can be used to run models from Hugging Face locally. They are intended for different kinds of embedding models. For the available parameters of the following models, please refer to the documentation of the Sentence Transformers library. Here's how to configure each of these:

  • huggingface: Hugging Face sentence-transformers embedding models. As a prerequisite, you should install the sentence_transformers Python package.

    embeddings:
      type: "huggingface"
      model_name: "sentence-transformers/all-mpnet-base-v2"
      model_kwargs:
        device: "cpu"
      encode_kwargs:
        normalize_embeddings: True

  • huggingface_instruct: Hugging Face instruct embedding models. You should have the sentence_transformers and InstructorEmbedding Python packages installed.

    embeddings:
      type: "huggingface_instruct"
      model_name: "hkunlp/instructor-large"
      model_kwargs:
        device: "cpu"
      encode_kwargs:
        normalize_embeddings: True

  • huggingface_bge: BGE models are currently among the best open-source embedding models (according to the MTEB leaderboard). This requires the sentence_transformers Python package.

    embeddings:
      type: "huggingface_bge"
      model_name: "BAAI/bge-small-en-v1.5"
      model_kwargs:
        device: "cpu"
      encode_kwargs:
        normalize_embeddings: True
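The normalize_embeddings: True option above scales each embedding vector to unit length, so a plain dot product between two vectors equals their cosine similarity. A pure-Python sketch of that equivalence (illustrative vectors, not real embeddings):

```python
import math

def normalize(vec):
    """Scale a vector to unit length, as normalize_embeddings: True does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [1.0, 2.0, 3.0], [2.0, 0.5, 1.0]
na, nb = normalize(a), normalize(b)

# For unit-length vectors, dot product and cosine similarity coincide:
print(abs(dot(na, nb) - cosine(a, b)) < 1e-9)  # True
```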

FAQ

Does OpenAI use my data to train their models?

No. OpenAI does not use your data to train their models. From their website:

Data submitted through the OpenAI API is not used to train OpenAI models or improve OpenAI's service offering.