
April 2nd, 2025

Handling FAQs with Rasa and Faiss: How to implement RAG

Marina Ashurkina

Introduction

This hands-on tutorial will help you set up an AI assistant using Rasa to handle frequently asked questions from your knowledge base. To dive straight into the code, you can explore these repos: Rasa with Faiss and OpenAI and Rasa with Faiss and Gemini.

Rasa is flexible and integrates with any vector database, such as Faiss, Milvus, Qdrant, Pinecone, Chroma, Weaviate, or Atlas by MongoDB. In this tutorial, we will discuss how to handle frequently asked questions with Faiss (Facebook AI Similarity Search), a powerful library that enables fast and efficient similarity search.
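
To make the idea of similarity search concrete, here is a minimal standalone Faiss sketch, independent of Rasa. You won’t need to write this yourself, since the Enterprise Search Policy builds and queries the index for you; the vectors below are made-up toy values rather than real embeddings.

import faiss
import numpy as np

# Toy 4-dimensional "embeddings" for three FAQ snippets. Real embeddings come
# from a model such as text-embedding-ada-002 and have hundreds of dimensions.
doc_vectors = np.array([
    [0.10, 0.30, 0.90, 0.20],  # "How do I return an item?"
    [0.80, 0.10, 0.20, 0.70],  # "What are your shipping times?"
    [0.20, 0.90, 0.10, 0.40],  # "Do you ship internationally?"
], dtype="float32")

# Build an exact (brute-force) L2 index and add the document vectors.
index = faiss.IndexFlatL2(doc_vectors.shape[1])
index.add(doc_vectors)

# Embed the user's question with the same model, then retrieve the two
# closest documents.
query = np.array([[0.15, 0.25, 0.85, 0.30]], dtype="float32")
distances, ids = index.search(query, 2)
print(ids, distances)  # indices of the nearest documents and their L2 distances

The Enterprise Search Policy does the equivalent work behind the scenes: it embeds your documents when the index is built, embeds the user’s question at run time, and passes the retrieved text to the LLM to generate the answer.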

Why is RAG so important for Conversational AI?

Retrieval-augmented generation (RAG) is often the lowest-hanging fruit and the first choice for building Conversational AI assistants. RAG dramatically speeds up development, covers a much broader range of user requests than traditional NLU methods, and helps reduce LLM hallucinations by grounding responses in enterprise data.

Many companies start building their AI Assistants using a RAG framework and then incrementally add business logic to handle transactional skills. This phased approach, from RAG to transactional flows, enables teams to quickly launch MVPs and then gradually scale functionality based on real user needs.

Rasa offers a simple way to start with RAG using local text files and Faiss running in-memory. For more advanced use cases, you might consider setting up a separate vector DB server and an ingestion pipeline optimized for your workflow.

Alternatively, if you have well-structured question-answer pairs and don’t want to use generative models, you can explore Extractive Search.

Let’s start building.

Setting up the Development Environment

We’ll use GitHub Codespaces and initialize Rasa from a pre-defined tutorial template.

You can watch the video on how to set up Codespaces or follow the steps below.

First, you need to create a new codespace on main (make sure you are signed in to your GitHub account).

Once the codespace is up and running, open the .env file and add your Rasa license and OpenAI key.

RASA_PRO_LICENSE=YOUR_VALUE
OPENAI_API_KEY=YOUR_VALUE

How to get the keys:
Rasa Pro License: link
OpenAI API key: link

Go ahead and train your assistant, then start the inspector. In the command line, type:

rasa train
rasa inspect

RAG with Rasa, Faiss and OpenAI

The Enterprise Search Policy is the component responsible for handling knowledge-based questions. By default, it uses OpenAI’s embeddings and the gpt-3.5-turbo LLM. You can enable it in three easy steps.

Step 1

In config.yml, uncomment EnterpriseSearchPolicy

policies:
 - name: FlowPolicy
 - name: EnterpriseSearchPolicy

Step 2

In data/patterns.yml, in the flow pattern_search, replace the action utter_free_chitchat_response with action_trigger_search

 pattern_search:
   description: Flow for handling knowledge-based questions
   name: pattern search
   steps:
     - action: action_trigger_search

Step 3

Add a docs folder in the main directory, then add your documents in .txt format. You can create as many subfolders as you want; the Enterprise Search Policy will index them automatically. A hypothetical layout is sketched below.
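
For illustration, a docs folder for a retail FAQ assistant could look like this (the folder and file names are hypothetical; any .txt files under docs will be indexed):

docs/
├── shipping/
│   ├── delivery_times.txt
│   └── international_shipping.txt
└── returns/
    ├── refund_policy.txt
    └── exchanges.txt

Once your files are in place, retrain your model to incorporate the changes.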

rasa train
rasa inspect

You can test your assistant in the Rasa Inspector.

Now that you have your assistant up and running, let’s customize a few parameters. So far, we’ve been using the default values, which are:

  • location: docs
  • LLM: openai/gpt-3.5-turbo
  • embedding model: openai/text-embedding-ada-002
  • the default prompt

Rasa is extremely flexible, and all these parameters are fully customizable. Let’s see how we can change them.

RAG with Rasa, Faiss, and Gemini 2.0 Flash

Set model to Gemini 2.0 Flash

In .env, set the API key for Gemini.

Get Gemini API key: link

RASA_PRO_LICENSE=YOUR_VALUE
GEMINI_API_KEY=YOUR_VALUE

In config.yml, add parameter llm to EnterpriseSearchPolicy

Optionally, you can change the path to your documents in the source parameter, but we’ll leave it as docs.

- name: EnterpriseSearchPolicy
  vector_store:
    type: faiss
    source: docs
  llm:
    model_group: gemini_flash_model

In endpoints.yml, add a model group with the id: gemini_flash_model to model_groups

- id: gemini_flash_model
  models:
    - model: gemini-2.0-flash-001
      provider: gemini

Set embeddings to Gemini/text-embedding-004

In config.yml, add parameter embedding to EnterpriseSearchPolicy

- name: EnterpriseSearchPolicy
  vector_store:
    type: faiss
    source: docs
  llm:
    model_group: gemini_flash_model
  embedding:
    model_group: gemini_embeddings

In endpoints.yml, add a model group with the id: gemini_embeddings to model_groups

- id: gemini_embeddings
  models:
    - model: text-embedding-004
      provider: gemini

Set path to custom prompt

In config.yml, set the parameter prompt for EnterpriseSearchPolicy

- name: EnterpriseSearchPolicy
  vector_store:
    type: faiss
    source: docs
  llm:
    model_group: gemini_flash_model
  embedding:
    model_group: gemini_embeddings
  prompt: prompts/enterprise-search-policy-template.jinja2

Configure custom prompt

Add a prompts folder in your main directory, create a new file named enterprise-search-policy-template.jinja2, and add your prompt.

DOCUMENT:
{{docs}}

QUESTION:
{{current_conversation}}

INSTRUCTIONS:
Answer the user's QUESTION using the DOCUMENT text.
Keep your answer short and grounded in the facts of the DOCUMENT.
If the DOCUMENT doesn't contain the facts to answer the QUESTION, return "Sorry, I don't know the answer to this question."

In the prompt template, you can access the following variables:

{{current_conversation}} # previous conversation turns
{{docs}} # retrieved documents
{{slots}} # currently filled slots
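
For example, if your flows collect slots that you want the LLM to take into account, you could extend the template to include them. This is just a sketch; which slots exist and how their values render will depend on your assistant:

DOCUMENT:
{{docs}}

SLOTS:
{{slots}}

QUESTION:
{{current_conversation}}

INSTRUCTIONS:
Answer the user's QUESTION using the DOCUMENT text, taking the SLOTS into account where relevant.
If the DOCUMENT doesn't contain the facts to answer the QUESTION, return "Sorry, I don't know the answer to this question."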

Enable or Disable Citation

In config.yml, set parameter citation_enabled for EnterpriseSearchPolicy to true

- name: EnterpriseSearchPolicy
  vector_store:
    type: faiss
    source: docs
  llm:
    model_group: gemini_flash_model
  embedding:
    model_group: gemini_embeddings
  prompt: prompts/enterprise-search-policy-template.jinja2
  citation_enabled: true

Once citation_enabled is set to true, the response will include metadata with the path to the relevant source documents.

Set the number of conversation turns to be included in the query

In config.yml, set parameter max_messages_in_query for EnterpriseSearchPolicy to 3. This parameter determines how many conversation turns will be included in the search query.

- name: EnterpriseSearchPolicy
  vector_store:
    type: faiss
    source: docs
  llm:
    model_group: gemini_flash_model
  embedding:
    model_group: gemini_embeddings
  prompt: prompts/enterprise-search-policy-template.jinja2
  citation_enabled: true
  max_messages_in_query: 3

Set the number of conversation turns to be included in the prompt

In config.yml, set parameter max_history for EnterpriseSearchPolicy to 3.
This parameter determines how many conversation turns will be passed to the prompt.

- name: EnterpriseSearchPolicy
  vector_store:
    type: faiss
    source: docs
  llm:
    model_group: gemini_flash_model
  embedding:
    model_group: gemini_embeddings
  prompt: prompts/enterprise-search-policy-template.jinja2
  citation_enabled: true
  max_messages_in_query: 3
  max_history: 3

You can train your assistant now and test it within the Rasa Inspector.

rasa train
rasa inspect

You can explore these codespaces to see the final code: Rasa with Faiss and OpenAI and Rasa with Faiss and Gemini.

Next Steps

  1. Try adding more documents to your database.
  2. Test out different embedding and LLM providers.
  3. Tweak the contents of your prompt to see how the assistant’s response changes.
  4. Add more flows to your assistant to test how Rasa handles different scenarios.
  5. Watch our on-demand webinar RAG Alone is Not the Answer
  6. Stay tuned for a follow-up blog post on advanced RAG ingestion techniques.