Starting from Rasa 3.11, direct LLM and embedding configurations inside components are
deprecated. Define all clients in endpoints.yml under the model_groups key. Mixing
approaches can lead to errors and is not supported.
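For reference, the deprecated inline style configured the client directly on the component. A rough sketch of that style (exact key names varied across 3.x releases):
config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      provider: openai # deprecated: client configured inline on the component
      model: gpt-4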
Decoupling LLM and embedding configurations from components#
To decouple configurations, define your LLM and embedding client configurations in
endpoints.yml under the model_groups key. Each model group should have a unique ID
and define its associated models.
endpoints.yml
model_groups:
  - id: gpt-4-primary # Unique ID for the LLM deployment
    models:
      - provider: openai
        model: gpt-4
        timeout: 7
        temperature: 0.0
  - id: text-embedding-3-small-primary # Unique ID for the embedding deployment
    models:
      - provider: openai
        model: text-embedding-3-small
Use the model_group key to reference the appropriate model group defined in
endpoints.yml.
config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: gpt-4-primary # Reference the model group ID
    flow_retrieval:
      embeddings:
        model_group: text-embedding-3-small-primary # Reference the model group ID
Run the following command to train with the updated configurations:
rasa train --config config.yml --endpoints endpoints.yml
Once trained, the components reference their model configurations from endpoints.yml by group ID, so updating endpoints.yml doesn't require retraining.
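For example, after changing a model group in endpoints.yml, you can restart the assistant with the updated file instead of retraining it:
rasa run --endpoints endpoints.yml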
warning
A missing model group in endpoints.yml will cause errors. Ensure that every model group referenced by a component (e.g., gpt-4-primary) has a corresponding definition in endpoints.yml.
Adapting model settings to different environments#
You can configure your assistant to work across multiple environments, such as dev,
staging, and prod, without retraining. Use the ${...} syntax to dynamically set
values from environment variables for keys within the model group.
Supported keys:
- api_base
- api_version
- deployment (specific to Azure OpenAI)
- aws_access_key_id (specific to AWS Bedrock)
- aws_secret_access_key (specific to AWS Bedrock)
- aws_session_token (specific to AWS Bedrock)
- aws_region_name (specific to AWS Bedrock)
endpoints.yml
model_groups:
  - id: gpt-4-primary # Unique ID for the LLM deployment
    models:
      - provider: azure
        deployment: ${AZURE_DEPLOYMENT_GPT4} # Dynamically set Azure deployment name
        api_base: ${AZURE_API_BASE_GPT4} # Dynamically set API base URL
        api_key: ${AZURE_API_KEY_GPT4} # Dynamically set API key
  - id: text-embedding-3-small-primary # Unique ID for the embedding deployment
    models:
      - provider: azure
        deployment: ${AZURE_DEPLOYMENT_EMBED_SMALL} # Dynamically set Azure deployment name
        api_base: ${AZURE_API_BASE_EMBED_SMALL} # Dynamically set API base URL
        api_key: ${AZURE_API_KEY_EMBED_SMALL} # Dynamically set API key
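At runtime, the referenced variables only need to be present in the environment. For example, in a dev shell (all values below are placeholders):
export AZURE_DEPLOYMENT_GPT4="gpt-4-dev"
export AZURE_API_BASE_GPT4="https://dev-resource.openai.azure.com"
export AZURE_API_KEY_GPT4="<dev-api-key>"
rasa run --endpoints endpoints.yml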
Enabling load balancing for multiple model deployments#
To distribute requests across multiple LLM or embedding deployments, update
endpoints.yml as follows:
1. Define multiple models in a model_group.
2. Add a router with a routing_strategy to control how requests are distributed. You can find the available strategies here (TODO_ADD_LINK).
endpoints.yml
model_groups:
  - id: load-balanced-gpt-4 # Unique group ID for load balancing across multiple GPT-4 deployments
    models:
      # Azure GPT-4 deployment in France
      - provider: azure
        deployment: azure-deployment-france
        api_base: https://api.azure-france.example.com
        api_version: 2024-08-01-preview
        api_key: ${AZURE_API_KEY_FRANCE}
        timeout: 7
        temperature: 0.0
      # Azure GPT-4 deployment in the US
      - provider: azure
        deployment: azure-deployment-us
        api_base: https://api.azure-us.example.com
        api_version: 2024-08-01-preview
        api_key: ${AZURE_API_KEY_US}
        timeout: 7
        temperature: 0.0
      # OpenAI GPT-4 deployment
      - provider: openai
        model: gpt-4
        api_key: ${OPENAI_API_KEY}
        timeout: 7
        temperature: 0.0
    # Router configuration to distribute requests
    router:
      routing_strategy: least-busy # Route requests to the least busy deployment
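Components reference the load-balanced group by its ID, exactly as with a single-model group; the router handles distribution transparently. A minimal sketch:
config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: load-balanced-gpt-4 # Requests are distributed across the three deployments above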
Using separate credentials for multiple deployments from the same provider across components#
When different components need to use separate deployments from the same provider, you can
define individual model groups in endpoints.yml. This allows each deployment to use
its own set of credentials, such as API keys, and to specify its own API base URL.
To achieve this, use the ${...} syntax to reference environment variables for
credentials.
For example, a rephraser might use gpt-3.5-turbo deployed in France with one set of
credentials, while a SingleStepLLMCommandGenerator uses gpt-4 deployed in
Switzerland. Another deployment, like text-embedding-3-small, might run on servers in
the US.
endpoints.yml
model_groups:
  # Model group with Azure GPT-3.5-Turbo deployment in France, used by the rephraser
  - id: gpt-3.5-rephraser # Unique ID for the gpt-3.5-turbo deployment
    models:
      - provider: azure
        deployment: ${AZURE_DEPLOYMENT_GPT_3_5_FRANCE} # Example variable names, following the pattern of the groups below
        api_base: ${AZURE_API_BASE_GPT_3_5_FRANCE}
        api_key: ${AZURE_API_KEY_GPT_3_5_FRANCE}
        timeout: 7
  # Model group with Azure GPT-4 deployment in Switzerland
  - id: gpt-4-primary
    models:
      - provider: azure
        deployment: ${AZURE_DEPLOYMENT_GPT_4_SWISS}
        api_base: ${AZURE_API_BASE_GPT_4_SWISS}
        api_key: ${AZURE_API_KEY_GPT_4_SWISS}
        timeout: 7
  # Model group with Azure text-embedding-3-small deployment in the US
  - id: text-embedding-3-small-primary
    models:
      - provider: azure
        deployment: ${AZURE_DEPLOYMENT_EMBED_SMALL_US}
        api_base: ${AZURE_API_BASE_EMBED_SMALL_US}
        api_key: ${AZURE_API_KEY_EMBED_SMALL_US}
        timeout: 7
Reference the appropriate model group from each component:
config.yml
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      model_group: gpt-4-primary
    flow_retrieval:
      embeddings:
        model_group: text-embedding-3-small-primary
endpoints.yml
nlg:
  type: rephraser
  llm:
    model_group: gpt-3.5-rephraser # Must match the ID defined in model_groups
Using separate configurations for different environments#
When you need to adapt deployment setups for different environments, such as dev and
prod, you can use separate endpoints.yml files. This approach keeps your
config.yml consistent while adjusting the runtime configuration to meet
environment-specific needs.
For instance, you might use a single deployment in dev for simplicity, while in
prod, you can leverage multiple deployments with routing to handle higher traffic
loads.
Create separate endpoints.yml files for each environment, such as endpoints.dev.yml
and endpoints.prod.yml.
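For example, the dev file can define a model group with a single deployment, while the prod file reuses the same group ID with multiple deployments and a router; a sketch based on the examples above:
endpoints.dev.yml
model_groups:
  - id: gpt-4-primary # Same ID in both files
    models:
      - provider: openai
        model: gpt-4
        api_key: ${OPENAI_API_KEY}
endpoints.prod.yml
model_groups:
  - id: gpt-4-primary # Same ID in both files
    models:
      - provider: azure
        deployment: azure-deployment-france
        api_base: https://api.azure-france.example.com
        api_key: ${AZURE_API_KEY_FRANCE}
      - provider: azure
        deployment: azure-deployment-us
        api_base: https://api.azure-us.example.com
        api_key: ${AZURE_API_KEY_US}
    router:
      routing_strategy: least-busy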
It doesn't matter which endpoints file you use when training the assistant, as long as
endpoints.dev.yml and endpoints.prod.yml define the same model groups with identical IDs.
When running the assistant, specify the endpoints file for the target environment.
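For example:
# Train once; either file works because the model group IDs match
rasa train --config config.yml --endpoints endpoints.dev.yml
# Run with the endpoints file that matches the deployment environment
rasa run --endpoints endpoints.prod.yml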