Configuring Your Assistant
You can customise many aspects of how your assistant project works by modifying the following files: `config.yml`, `endpoints.yml`, and `domain.yml`.
Configuration File
The `config.yml` file defines how your Rasa assistant processes user messages. It specifies which components, policies, and language settings your assistant will use.
Here's the minimal configuration required to run a CALM assistant:
```yaml
recipe: default.v1
language: en
pipeline:
  - name: CompactLLMCommandGenerator
policies:
  - name: FlowPolicy
```
Below are the main parameters you can configure.
Recipe
- Rasa provides a default graph recipe, `default.v1`. For most projects, the default value is sufficient.
- If you're running ML experiments or ablation studies and want to add a custom graph recipe, this guide has you covered.
Language
- The `language` key sets the primary language your assistant supports. Use a two-letter ISO 639-1 code (e.g., `"en"` for English).
- The `additional_languages` key lists the codes of other languages your assistant supports.
👉 Learn more about language configuration
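For example, a multilingual assistant might declare English as its primary language and list further languages under `additional_languages`. A minimal sketch (the specific language codes here are illustrative):

```yaml
language: en
additional_languages:
  - de
  - fr
```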
Pipeline
The `pipeline` section lists the components that process the latest user message and produce commands for the conversation. The main component in your pipeline is the `LLMCommandGenerator`.
```yaml
pipeline:
  - name: CompactLLMCommandGenerator
    llm:
      model_group: openai_llm
    flow_retrieval:
      embeddings:
        model_group: openai_embeddings
    user_input:
      max_characters: 420
```
👉 See the full set of configurable parameters
Policies
The `policies` key lists the dialogue policies your assistant will use to progress the conversation. For CALM, you need at least the `FlowPolicy`. It doesn't require any additional configuration parameters.
Assistant ID
The `assistant_id` key defines the unique identifier of your assistant. This ID is included in every event's metadata, alongside the model ID. Use a distinct value to help differentiate between multiple deployed assistants.

```yaml
assistant_id: my_assistant
```

If this required key is missing or still set to the default placeholder, a random assistant ID will be generated and added to your configuration each time you run `rasa train`.
Endpoints
The `endpoints.yml` file defines how your assistant connects to key services, such as where to store conversations, execute custom actions, fetch trained models, or generate responses.
Below are the main parameters you can configure.
Tracker Store — Where conversations are stored
The `tracker_store` determines where Rasa keeps track of conversations. This is where your assistant remembers past interactions and makes decisions based on conversation context. You can store trackers in a file, a database (like PostgreSQL or MongoDB), or other storage backends.
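As a sketch, a Redis-backed tracker store in `endpoints.yml` might look like this; the host, port, and database number are placeholders for your own deployment:

```yaml
tracker_store:
  type: redis
  url: localhost
  port: 6379
  db: 0
```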
👉 How to configure tracker stores
Event Broker — Where conversation events are sent
Conversation history consists of events: every user message, action, or slot update is one. The `event_broker` sends these events to other systems (e.g. for monitoring, analytics, or syncing with a data warehouse). It's especially useful in production setups.
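For instance, a RabbitMQ (Pika) event broker could be configured roughly as follows; the queue name and credentials are illustrative, not defaults:

```yaml
event_broker:
  type: pika
  url: localhost
  username: user
  password: pass
  queues:
    - rasa_events
```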
👉 How to configure event brokers
Action Endpoint — Where custom code runs
When your assistant needs to do something dynamic, like fetching user data or making an API call, it uses custom actions. The `action_endpoint` tells Rasa where your action server is running so it can call it when needed.
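With an action server running on its default port, the endpoint typically looks like this (adjust the host and port to your setup):

```yaml
action_endpoint:
  url: http://localhost:5055/webhook
```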
👉 How to configure action server
Models — Where trained models live
The `models` section lets you configure remote model storage, such as a cloud bucket or server, where Rasa can automatically fetch the latest trained model at runtime. This is useful for CI/CD workflows where models are trained and uploaded externally.
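A minimal sketch of a model server endpoint, assuming your trained models are served over HTTP; the URL and polling interval are placeholders:

```yaml
models:
  url: http://my-model-server.example.com/models/default
  wait_time_between_pulls: 10  # seconds between checks for a new model
```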
👉 How to configure model storage
Model Groups — LLM and embedding models
The `model_groups` section is used to define the LLMs and embedding models used by features like retrieval, the rephraser, and the command generator. You specify the provider, type, and settings for each group.
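As an illustrative sketch, the `openai_llm` and `openai_embeddings` groups referenced in the `config.yml` example earlier could be defined along these lines (the model names are examples, not recommendations):

```yaml
model_groups:
  - id: openai_llm
    models:
      - provider: openai
        model: gpt-4o
  - id: openai_embeddings
    models:
      - provider: openai
        model: text-embedding-3-small
```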
👉 How to configure model groups
Lock Stores — Prevent processing conflicts
The `lock_store` manages conversation-level locks to ensure that only one message processor handles a message at a time. This prevents race conditions when multiple messages for the same user arrive close together, a common scenario in voice assistants or high-traffic setups.
Message processors are tied to Rasa processes, and your deployment setup affects the lock store you should use:
- Single Rasa process (typically for development): the in-memory lock store is sufficient.
- Multiple Rasa processes in one pod (i.e. multiple Sanic workers): use the `RedisLockStore` or `ConcurrentRedisLockStore`.
- Multiple Rasa processes across multiple pods: we recommend using the `ConcurrentRedisLockStore`, as described here.
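A Redis-backed lock store in `endpoints.yml` might be sketched as follows; the connection details are placeholders:

```yaml
lock_store:
  type: redis
  url: localhost
  port: 6379
```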
Vector Stores — Enterprise search and flow retrieval
If your assistant uses the Enterprise Search Policy, the `vector_store` key allows you to define where the vector embeddings of the source documents are stored. It can also be used to connect to a search API that returns a set of relevant documents given a keyword or a search query.
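As a sketch, a Qdrant-backed vector store could be configured roughly like this; the host, port, and collection name are assumptions for illustration:

```yaml
vector_store:
  type: qdrant
  host: localhost
  port: 6333
  collection: documents
```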
👉 How to configure Enterprise Search (RAG)
👉 How to customize flow retrieval
NLG Server — External response generator
If you want the assistant's responses to be generated dynamically by an external system (like an LLM-based server), you can configure an `nlg` endpoint. This allows you to update responses without retraining your model.
To use this, the endpoint must point to an HTTP server that exposes a `/nlg` path. For example:
```yaml
nlg:
  url: http://localhost:5055/nlg
```
Contextual Response Rephraser — Rephrase responses with LLMs
Rasa’s built-in rephraser can automatically rewrite your templated responses using an LLM. It preserves intent and facts while making responses sound more natural or varied based on conversation context.
To enable it:
```yaml
nlg:
  type: rephrase
```
👉 Learn more about the rephraser
Silence Handling — Timeout before triggering fallback
The silence_timeout
setting controls how long the assistant waits for a response before assuming the user is silent. Silence timeouts help your assistant handle situations where the user doesn’t respond. For now, this setting only works with voice-stream channels, such as:
- Twilio Media Streams
- Browser Audio
- Genesys
- Jambonz Stream
- Audiocodes Stream
The default is 7 seconds, but you can override the value:

```yaml
interaction_handling:
  global_silence_timeout: 7
```
👉 Learn more about silence handling
Domain
The `domain.yml` file defines the universe your assistant operates in, including its responses, memory (slots), and supported actions.
Example:
```yaml
version: "3.1"

session_config:
  session_expiration_time: 60  # value in minutes, 0 means no timeout
  carry_over_slots_to_new_session: true

responses:
  utter_greeting:
    - text: "Hello! How can I help you today?"

slots:
  user_name:
    type: text
    initial_value: null

actions:
  - action_greet_user
```
What’s in the Domain
- Responses: Templated messages your assistant can send.
- Slots: Data your assistant stores about the user.
- Actions: Logic or service calls your assistant can perform.
- Session Configuration: Controls when conversations reset.