Introduction
This hands-on tutorial will help you set up an AI assistant using Rasa to handle frequently asked questions from your knowledge base. To dive straight into the code, you can explore these repos: Rasa with Faiss and OpenAI and Rasa with Faiss and Gemini.
Rasa is flexible and integrates with any vector database, such as Faiss, Milvus, Qdrant, Pinecone, Chroma, Weaviate, or Atlas by MongoDB. In this tutorial, we will discuss how to handle frequently asked questions with Faiss (Facebook AI Similarity Search), a powerful library that enables fast and efficient similarity search.
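To build some intuition for what Faiss does under the hood, here is a minimal, self-contained Python sketch (assuming faiss-cpu and numpy are installed). It indexes a set of vectors and retrieves the nearest neighbors of a query; the random vectors here are stand-ins for real document embeddings, which the Rasa setup below produces with an embedding model.

# Minimal Faiss sketch: index vectors, then retrieve nearest neighbors.
# Random vectors stand in for real document embeddings.
# pip install faiss-cpu numpy
import numpy as np
import faiss

dim = 384                                               # embedding dimensionality (arbitrary here)
rng = np.random.default_rng(0)
doc_vectors = rng.random((100, dim), dtype=np.float32)  # 100 "documents"

index = faiss.IndexFlatL2(dim)                          # exact L2-distance search
index.add(doc_vectors)                                  # add all document vectors to the index

query = rng.random((1, dim), dtype=np.float32)          # an embedded "question"
distances, ids = index.search(query, 4)                 # the 4 nearest documents
print(ids[0], distances[0])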
Why is RAG so important for Conversational AI?
Retrieval-augmented generation (RAG) is often the lowest-hanging fruit and the first choice for building Conversational AI assistants. RAG dramatically speeds up development, covers a much broader range of user requests than traditional NLU methods, and helps reduce LLM hallucinations by grounding responses in enterprise data.
Many companies start building their AI Assistants using a RAG framework and then incrementally add business logic to handle transactional skills. This phased approach, from RAG to transactional flows, enables teams to quickly launch MVPs and then gradually scale functionality based on real user needs.
Rasa offers a simple way to start with RAG using local text files and Faiss running in-memory. For more advanced use cases, you might consider setting up a separate vector DB server and an ingestion pipeline optimized for your workflow.
Alternatively, if you have well-structured question-answer pairs and don’t want to use generative models, you can explore Extractive Search.
Let’s start building.
Setting up the Development Environment
We’ll use GitHub Codespaces and initialize Rasa from a pre-defined tutorial template.
You can watch the video on how to set up Codespaces or follow the steps below.
First, you need to create a new codespace on main (make sure you are signed in to your GitHub account).
Once the codespace is up and running, open the .env file and add your Rasa license and OpenAI key:
RASA_PRO_LICENSE=YOUR_VALUE
OPENAI_API_KEY=YOUR_VALUE
How to get the keys:
Rasa Pro License: link
OpenAI API key: link
Go ahead and train your assistant. In the command line, type:
rasa train
rasa inspect
RAG with Rasa, Faiss, and OpenAI
By default, the Enterprise Search Policy uses OpenAI’s embeddings and the gpt-3.5-turbo LLM. The Enterprise Search Policy is the component responsible for handling knowledge-based questions. You can enable it in three easy steps.
Step 1
In config.yml, uncomment EnterpriseSearchPolicy:
policies:
- name: FlowPolicy
- name: EnterpriseSearchPolicy
Step 2
In data/patterns.yml, in the flow pattern_search, replace the action utter_free_chitchat_response with action_trigger_search:
pattern_search:
  description: Flow for handling knowledge-based questions
  name: pattern search
  steps:
    - action: action_trigger_search
Step 3
Add a docs folder in the main directory, then add documents in .txt format. You can create as many subfolders as you want; the Enterprise Search Policy will index them automatically.
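For example, a layout like this would be indexed in full (the file and folder names are placeholders, not part of the tutorial template):

docs/
├── faq/
│   ├── shipping.txt
│   └── returns.txt
└── company_policies.txt

Retrain your model to incorporate the changes: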
rasa train
rasa inspect
You can test your assistant in the Rasa Inspector.
Now that you have your assistant up and running, let’s customize a few parameters. So far, we’ve been using the default values, which are:
- location: docs,
- LLM: openai/gpt-3.5-turbo,
- embedding model: openai/text-embedding-ada-002,
- and the default prompt.
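Written out explicitly, that default setup would look roughly like the sketch below. It reuses the model_group pattern from the Gemini section that follows; the group ids openai_llm and openai_embeddings are arbitrary names chosen for this illustration, not required values.

In config.yml:

policies:
  - name: FlowPolicy
  - name: EnterpriseSearchPolicy
    vector_store:
      type: faiss
      source: docs
    llm:
      model_group: openai_llm
    embedding:
      model_group: openai_embeddings

In endpoints.yml:

model_groups:
  - id: openai_llm
    models:
      - model: gpt-3.5-turbo
        provider: openai
  - id: openai_embeddings
    models:
      - model: text-embedding-ada-002
        provider: openai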
Rasa is extremely flexible, and all these parameters are fully customizable. Let’s see how we can change them.
RAG with Rasa, Faiss, and Gemini 2.0 Flash
Set model to Gemini 2.0 Flash
In .env, set the API key for Gemini.
Get Gemini API key: link
RASA_PRO_LICENSE=YOUR_VALUE
GEMINI_API_KEY=YOUR_VALUE
In config.yml, add the llm parameter to EnterpriseSearchPolicy. Optionally, you can change the path to your documents via the source parameter, but we’ll leave it as docs:
- name: EnterpriseSearchPolicy
  vector_store:
    type: faiss
    source: docs
  llm:
    model_group: gemini_flash_model
In endpoints.yml, add a model group with the id gemini_flash_model to model_groups:
- id: gemini_flash_model
  models:
    - model: gemini-2.0-flash-001
      provider: gemini
Set embeddings to Gemini/text-embedding-004
In config.yml, add the embedding parameter to EnterpriseSearchPolicy:
- name: EnterpriseSearchPolicy
  vector_store:
    type: faiss
    source: docs
  llm:
    model_group: gemini_flash_model
  embedding:
    model_group: gemini_embeddings
In endpoints.yml, add a model group with the id gemini_embeddings to model_groups:
- id: gemini_embeddings
  models:
    - model: text-embedding-004
      provider: gemini
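With both groups defined, the model_groups section of your endpoints.yml should now look like this:

model_groups:
  - id: gemini_flash_model
    models:
      - model: gemini-2.0-flash-001
        provider: gemini
  - id: gemini_embeddings
    models:
      - model: text-embedding-004
        provider: gemini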
Set path to custom prompt
In config.yml, set the prompt parameter for EnterpriseSearchPolicy:
- name: EnterpriseSearchPolicy
  vector_store:
    type: faiss
    source: docs
  llm:
    model_group: gemini_flash_model
  embedding:
    model_group: gemini_embeddings
  prompt: prompts/enterprise-search-policy-template.jinja2
Configure custom prompt
Add a prompts folder in your main directory. Create a new file enterprise-search-policy-template.jinja2 and add your prompt:
DOCUMENT:
{{docs}}
QUESTION:
{{current_conversation}}
INSTRUCTIONS:
Answer the user's QUESTION using the DOCUMENT text.
Keep your answer short and grounded in the facts of the DOCUMENT.
If the DOCUMENT doesn't contain the facts to answer the QUESTION, return "Sorry, I don't know the answer to this question."
In the prompt template, you can access the following variables:
{{current_conversation}} #previous conversation turns
{{docs}} #retrieved documents
{{slots}} #slots
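For example, a variant of the template that also exposes slot values to the model might look like this (a sketch; whether slot context improves answers depends on your assistant):

DOCUMENT:
{{docs}}

CONVERSATION:
{{current_conversation}}

KNOWN SLOT VALUES:
{{slots}}

INSTRUCTIONS:
Answer the user's last question using the DOCUMENT text and, where relevant,
the KNOWN SLOT VALUES. Keep your answer short and grounded in the DOCUMENT.
If the DOCUMENT doesn't contain the answer, return "Sorry, I don't know the
answer to this question."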
Enable or Disable Citation
In config.yml, set the parameter citation_enabled for EnterpriseSearchPolicy to true:
- name: EnterpriseSearchPolicy
  vector_store:
    type: faiss
    source: docs
  llm:
    model_group: gemini_flash_model
  embedding:
    model_group: gemini_embeddings
  prompt: prompts/enterprise-search-policy-template.jinja2
  citation_enabled: true
Once you set citation_enabled to true, metadata with the path to the relevant source will be included with the retrieved document.
Set the number of conversation turns to be included in the query
In config.yml, set the parameter max_messages_in_query for EnterpriseSearchPolicy to 3. This parameter determines how many conversation turns will be included in the search query:
- name: EnterpriseSearchPolicy
  vector_store:
    type: faiss
    source: docs
  llm:
    model_group: gemini_flash_model
  embedding:
    model_group: gemini_embeddings
  prompt: prompts/enterprise-search-policy-template.jinja2
  citation_enabled: true
  max_messages_in_query: 3
Set the number of conversation turns to be included in the prompt
In config.yml, set the parameter max_history for EnterpriseSearchPolicy to 3. This parameter determines how many conversation turns will be passed to the prompt:
- name: EnterpriseSearchPolicy
  vector_store:
    type: faiss
    source: docs
  llm:
    model_group: gemini_flash_model
  embedding:
    model_group: gemini_embeddings
  prompt: prompts/enterprise-search-policy-template.jinja2
  citation_enabled: true
  max_messages_in_query: 3
  max_history: 3
You can train your assistant now and test it within the Rasa Inspector.
rasa train
rasa inspect
You can explore these codespaces to see the final code: Rasa with Faiss and OpenAI and Rasa with Faiss and Gemini.
Next Steps
- Try adding more documents to your database.
- Test out different embedding models and LLM providers.
- Tweak the contents of your prompt to see how the assistant’s response changes.
- Add more flows to your assistant to test how Rasa handles different scenarios.
- Watch our on-demand webinar RAG Alone is Not the Answer
- Stay tuned for a follow-up blog post on advanced RAG ingestion techniques.