September 12th, 2025
Building A Conversational AI Agent for Sourcing Motorcycles with RASA
Vladimir Preobrazhenskiy
My name is Vladimir Preobrazhenskiy, and I have 5+ years of experience in conversational AI. I won the Rasa Agent Building Challenge 2025. In this blog post, I will discuss my approach to building The AI Moto Expert, after first exploring a different implementation.
Project links:
The rise of LLMs has brought new opportunities in IT. I was very excited to learn about RASA's conversational AI agent competition, which provided a great opportunity to learn new technology through a competition.
My goals were to create an enterprise-grade solution for building LLM-based chatbots and, hopefully, win!
I noticed that the solution would be scored on: applying RASA’s CALM (Conversational AI with Language Models) framework, extensive usage of different features (RAG search, custom actions, etc.), tests created for dialogue scenarios, and fine-tuning of my own LLM.
A false start and quick pivot
My first approach was an AI agent based on my own experience trying to find the best opportunities for children to study abroad. After a week or so, I realized that I had underestimated one important problem: finding well-structured information about schools was difficult due to the variety of data.
I realized that in order for my agent to work, I needed a lot of structured data. So I turned to my passion for motorcycles, for which there is plenty of structured data online. I wanted to build an AI assistant that could be very helpful to anyone seeking an introduction to the world of motorcycles.
Approximately one week later, I submitted the solution.
Development highlights
Below are several highlights from the AI Moto Expert agent.
Greeting users
Since the AI Moto Expert is an informational bot, it must introduce itself. Here is my implementation:
Add a welcome utterance and declare action_session_start
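In domain.yml, this declaration might look like the following sketch (the greeting text is illustrative, not the project's actual copy):

```yaml
responses:
  utter_greet_user:
    - text: "Hi! I'm the AI Moto Expert. Ask me anything about motorcycles."

actions:
  - action_session_start

slots:
  action_session_start_already_triggered:
    type: bool
    initial_value: false
    mappings:
      - type: custom
```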
Code it:
from typing import Any, Dict, List, Text

import rasa.shared.core.events as core_events
from rasa_sdk import Tracker, events
from rasa_sdk.executor import CollectingDispatcher

from actions import AppCustomAction


class ActionSessionStart(AppCustomAction):
    __action_name__ = "action_session_start"

    async def run(
        self, dispatcher: CollectingDispatcher, trckr: Tracker, domain: Dict[Text, Any],
    ) -> List[Dict[Text, Any]]:
        self.dbg(trckr)
        # Greet only once per session.
        if not trckr.slots.get("action_session_start_already_triggered", False):
            dispatcher.utter_message(response="utter_greet_user")
        return [
            core_events.SessionStarted().as_dict(),
            events.SlotSet("action_session_start_already_triggered", True),
        ]
Asking users to share their data
The agent lets users run one request without supplying personal information, then requires them to share their data for any subsequent request. Here is how I did it.
Add an action that counts requests
class ActionRequestsIncrementor(AppCustomAction):
    __action_name__ = "action_requests_count_increment"

    async def run(
        self, dispatcher: CollectingDispatcher, trckr: Tracker, domain: Dict[Text, Any],
    ) -> List[Dict[Text, Any]]:
        self.dbg(trckr)
        req_count: int = int(trckr.slots["requests_count"])
        return [events.SlotSet("requests_count", req_count + 1)]
Add a special flow that collects the data
obtain_user_data:
  if: false
  name: "Obtain user's data"
  description: |
    Here you're pretending to be AI ADMIN, that takes over control to direct the user through this flow.
    You behave in a respectful and strict manner, letting the user understand the importance and inevitability of this procedure.
    Your goal is to obtain the user's personal data.
  # persisted_slots: [ user_name, user_email, user_gender, user_age, user_confirmed_personal_data ]
  steps:
    - id: obtain_info
      collect: user_name
      utter: utter_clarify_reasons_for_obtaining_personal_data
      description: 'Name of the user. @RA@ Capitalize name slot if needed.'
      reset_after_flow_ends: false
      next: user_email
    - id: user_email
      collect: user_email
      utter: utter_obtain__user_email
      description: 'E-mail of the user. @RA@ e-mail MUST be PROVIDED in a form of `<alphanum|.>@<alphanum>.<alphanum>`.'
      reset_after_flow_ends: false
      next: reset_confirm
    - id: reset_confirm
      set_slots:
        - user_confirmed_personal_data: false
      next: confirm
    - id: confirm
      collect: user_confirmed_personal_data
      utter: utter_confirm_entered_data
      ask_before_filling: true
      reset_after_flow_ends: false
      next:
        - if: slots.user_confirmed_personal_data
          then: thank_and_continue
        - else:
            - set_slots:
                - user_name: null
                - user_age: null
                - user_gender: null
                - user_email: null
              next: obtain_info
    - id: thank_and_continue
      action: utter_thank_for_entering_data
    - action: utter_moto_assistant_is_back
Call this flow when necessary:
steps:
  - action: action_requests_count_increment
  - noop: true
    next:
      - if: slots.user_confirmed_personal_data or (slots.requests_count < 2)
        then: need_time
      - else: to_user_data
Sending a recap to the user
After the agent obtains a user’s email, it can use it to communicate with the user. That allows AI Moto Expert to share the full conversation once the user finishes a dialogue.
The action_session_end action has been built to handle it:
from typing import Any, Text, Dict, List

import rasa.shared.core.events as core_events
from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher

from actions import ACTIONS_STUBBED, AppCustomAction
from scripts import communicate_user


class ActionSessionEnd(AppCustomAction):
    __action_name__ = "action_session_end"

    async def run(
        self, dispatcher: CollectingDispatcher, trckr: Tracker, domain: Dict[Text, Any],
    ) -> List[Dict[Text, Any]]:
        self.dbg(trckr)
        user_email = trckr.get_slot("user_email")
        user_name = trckr.get_slot("user_name")
        if user_email and not ACTIONS_STUBBED:
            messages = [
                f'''{user_name},''' if user_name else '',
                '''Thank you for your conversation!''',
                '''Please find it below, in case you'd like to revise any details''',
            ]
            # Replay the tracker's event log to reconstruct the dialogue transcript.
            for e in trckr.events:
                if e.get('event') == 'user':
                    messages.append(f'''\tYOU: {e['text']}''')
                elif e.get('event') in {'agent', 'bot'}:
                    messages.append(f'''\tMOTO EXPERT: {e['text']}''')
            messages += [
                '''\nJust reply to this message in case you'd like to leave feedback.''',
            ]
            communicate_user.send_email(to_email=user_email, body='\n'.join(messages))
        print("SESSION ENDED")
        return [core_events.SessionEnded().as_dict()]
Here is the integration with the email server:
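The post links to the integration rather than inlining it, so here is only a rough sketch of what such a send_email helper could look like using Python's smtplib; the host, port, sender address, and subject line are placeholders, not the project's actual configuration:

```python
import smtplib
from email.message import EmailMessage

# Hypothetical settings; the real values would come from project configuration.
SMTP_HOST = "smtp.example.com"
SMTP_PORT = 587
FROM_EMAIL = "moto-expert@example.com"


def build_message(to_email: str, body: str,
                  subject: str = "Your AI Moto Expert conversation") -> EmailMessage:
    # Assemble a plain-text email with the conversation recap as the body.
    msg = EmailMessage()
    msg["From"] = FROM_EMAIL
    msg["To"] = to_email
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_email(to_email: str, body: str) -> None:
    msg = build_message(to_email, body)
    with smtplib.SMTP(SMTP_HOST, SMTP_PORT) as server:
        server.starttls()
        # server.login(user, password)  # credentials omitted in this sketch
        server.send_message(msg)
```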
Establish core flows
Flows usually define a goal-oriented, “happy” path for users. Each flow reflects a particular use case. Sometimes developers have to implement tens or even hundreds of flows for one conversational agent. Here, I stopped at three use cases.
FAQ
One of the easiest ways to utilize the RAG technique is through an FAQ. For the AI Moto Expert, I ran an Internet search for the most frequent questions and answers about motorcycles.
After preparing QA pairs, I vectorized these questions and stored the vectors as a project file.
The retrieval part was a custom implementation with direct vector comparison.
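As a minimal sketch of what such direct vector comparison can look like (the toy vectors and questions below are illustrative; the real project uses precomputed embeddings loaded from a project file):

```python
import numpy as np

# Toy precomputed "question vectors"; in the real project these come from an
# embedding model and are stored alongside the QA pairs.
faq_questions = [
    "How to choose a motorcycle for beginners?",
    "What problems occur when using an electric scooter in the rain?",
]
faq_vectors = np.array([[1.0, 0.0, 0.2],
                        [0.1, 1.0, 0.0]])


def retrieve(query_vector: np.ndarray, top_k: int = 1) -> list[str]:
    # Cosine similarity between the query and every stored question vector.
    q = query_vector / np.linalg.norm(query_vector)
    m = faq_vectors / np.linalg.norm(faq_vectors, axis=1, keepdims=True)
    scores = m @ q
    best = np.argsort(scores)[::-1][:top_k]
    return [faq_questions[i] for i in best]


print(retrieve(np.array([0.9, 0.1, 0.1])))
# → ['How to choose a motorcycle for beginners?']
```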
Augmentation is at the end of action_trigger_search_faq here.
For the generation part, I used GPT-4o.
The flow below is the FAQ-based search:
faq:
  name: "FAQ section"
  description: |
    Call `faq` flow to respond to all user's queries that look like a general question. Examples:
    - USER: How to choose a motorcycle for beginners?
    - USER: How to adjust ATV suspension for different terrain types?
    - USER: What problems occur when using an electric scooter in the rain?
    - USER: What navigation systems are suitable for ATVs?
    FAQ db contains questions on topics: comfort, components, customization, documentation, electric, maintenance, navigation, off_road, performance, safety, seasonal, storage, technical, troubleshooting, vehicle_selection
  steps:
    - action: action_requests_count_increment
    - action: action_trigger_search_faq_capture_request
    - noop: true
      next:
        - if: slots.user_confirmed_personal_data or (slots.requests_count < 2)
          then: need_time
        - else: to_user_data
    - id: to_user_data
      call: obtain_user_data
    - id: need_time
      action: utter_need_time
      next: run_faq
    - id: run_faq
      action: action_trigger_search_faq
Although CALM provides out-of-the-box integration with various vector DBs, I came up with the above implementation because the Faiss DB I liked most did not work with the Python version I preferred.
DB Search
Local structured search over local storage is a core feature of the solution. The Excel-based DB is normalized and prepared for querying. I avoided using any kind of DBMS to speed up development; hence, the pandas library handles all queries. It is NOT a production solution, but it fits the challenge’s time-bounded conditions.
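A minimal sketch of this pandas-based querying approach (the column names and rows below are illustrative, not the project's actual schema; the real project would load the normalized Excel file, e.g. with pd.read_excel):

```python
import pandas as pd

# Illustrative in-memory "DB" standing in for the normalized Excel file.
vehicles = pd.DataFrame(
    {
        "make": ["Yamaha", "Yamaha", "AJP"],
        "model": ["R1", "YFZ 450", "SPR 240X Enduro"],
        "hp": [200, 48, 26],
    }
)


def search_by_make_and_model(make: str, model: str) -> pd.DataFrame:
    # Case-insensitive exact match on both columns.
    mask = (vehicles["make"].str.lower() == make.lower()) & (
        vehicles["model"].str.lower() == model.lower()
    )
    return vehicles[mask]


print(search_by_make_and_model("yamaha", "r1"))
```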
AI Moto Expert can handle any user request for details about a particular vehicle.
The implemented flow for this scenario:
vehicles_search_by_make_and_model:
  name: "Vehicle search by model"
  description: |
    Call 'vehicles_search_by_make_and_model' flow if:
    - User asks to search info about particular make and/or model. Examples:
      - USER: How many HP has Yamaha R1?
      - USER: What is the seat height for AJP SPR 240X Enduro?
      - USER: Help me with specific model
      - USER: <Manufacturer> <model name>
      - USER: <model name> <Manufacturer>
  steps:
    - action: action_requests_count_increment
    - noop: true
      next:
        - if: slots.user_confirmed_personal_data or (slots.requests_count < 2)
          then: get_make
        - else: to_user_data
    - id: to_user_data
      call: obtain_user_data
    - id: get_make
      description: |
        Get the name of the manufacturer.
        @RA@ Most common case is when user defines <manufacturer> <model>. Try to infer model if possible. If not sure, don't do it. Examples:
        - USER: Yamaha R1. user_search_specs_make=Yamaha, user_search_specs_model=R1
        - USER: Yamaha Einfield. Both are manufacturers - do not set any slots.
        - USER: Yamaha YFZ 450. user_search_specs_make=Yamaha, user_search_specs_model=YFZ 450
      collect: user_search_specs_make
      utter: utter_collect_user_search_specs_make
      next: get_model
    - id: get_model
      description: Get the name of the model from the list provided to user.
      collect: user_search_specs_model
      utter: utter_collect_user_search_specs_model
      next: run_search_db
    - id: run_search_db
      action: action_vehicles_search_by_make_and_model_basic
      next: get_specific_request
    - id: get_specific_request
      description: |
        @RA@ ANY input from user MUST be considered as a user's request for further processing, except a clear `no`.
        If user says no - ask if there is any additional assistance needed.
        @RA@ Before you switch to another flow, try to understand the user's request in the context of the current flow. If a user asks for a price or any other specific information in THIS particular step, go ahead, let them do it.
      collect: user_search_specific_request
      utter: utter_collect_user_user_search_specific_request
      ask_before_filling: true
      next: need_time
    - id: need_time
      action: utter_need_time
      next: run_search_internet
    - id: run_search_internet
      action: action_vehicles_search_by_make_and_model_internet
An important enhancement here is that The AI Moto Expert can fix a user’s misspelling by identifying the closest option, using a Levenshtein distance calculation. For example, a user’s “Yamha R1” input will likely be transformed to the correct answer: “Yamaha R1."
All CALM actions responsible for this use case are implemented here.
Here is where the Levenshtein distance calculation is applied.
Other Search (internet)
If a user’s request does not look like a model search or an FAQ question, it will be handled by the “Other Search” scenario, where The AI Moto Expert uses a Perplexity integration to ensure the best possible answer.
other_search:
  name: "Other search"
  description: |
    Call 'other_search' flow if:
    - @RA@ User is asking to recommend particular brands or something else.
    - User's intent looks like a search for moto accessories/equipment/other. Examples:
      - I need top 3 navigation systems for ATVs
      - Where I can tune my bike?
      - What shocks are suitable for Aprilia Tuareg 660?
      - Recommend me 3 driving schools in Madrid.
      - What are the best motorcycle gloves for winter riding?
      - Where can I find a motorcycle-friendly road trip planner?
      - What are the top-rated motorcycle covers for winter?
      - What are the best motorcycle safety vests?
      - What are the best motorcycle mirrors for visibility?
      - Where can I find motorcycle-specific insurance?
      - What are the best motorcycle horn upgrades?
      - Where can I find a motorcycle repair manual?
      - What are the best motorcycle rain covers?
      - What are the best motorcycle ramp options?
      - Recommend some comfortable motorcycle seats.
      - What are the best motorcycle tail light upgrades?
      - Suggest some effective motorcycle anti-fog solutions.
  if: slots.user_confirmed_personal_data or slots.requests_count <= 3
  steps:
    - action: action_requests_count_increment
    - noop: true
      next:
        - if: slots.user_confirmed_personal_data or (slots.requests_count < 2)
          then: need_time
        - else: to_user_data
    - id: to_user_data
      call: obtain_user_data
    - id: need_time
      action: utter_need_time
      next: run_search_internet
    - id: run_search_internet
      action: action_trigger_search_other
Here is the implementation of action_trigger_search_other.
Intents
Problem
A common problem affecting all conversational systems is distinguishing between user intents that look similar but in fact are not. In general, there are two possible solutions: a rules-based approach or an AI-based approach. Rules-based approaches often hurt the perception of the whole conversation.
In traditional rules-based approaches, users may complain about a bot's inability to change discussion topics or about its extreme meticulousness. AI-based solutions are usually more flexible, but must be adapted to real-world examples, regardless of which AI model is used for scenario selection.
Custom template and prompt enhancing
The CALM framework fixes that by combining AI-based language model flexibility with predefined logic, enabling fluent, high-trust conversations that reliably resolve user requests.
It lets users adapt the prompt template for the Command Generator class. I replaced this template with my own version that, I hope, provides clearer and more precise instructions to the LLM about the current context and the possible actions available. Here is the implementation.
Once I realized that the descriptions of flows, variables, and other entities are actually part of the prompt for every request, I added extensive instructions to every flow and variable to define its meaning and proper usage for the LLM. Testing showed that the model is quite good, if not perfect.
E2E Tests and LLM finetuning
The mechanism provided by the RASA team enabled me to generate dozens of synthetic tests with LLM for every flow.
Another important feature of CALM is its ability to transform your tests into a dataset for LLM fine-tuning. Well-finetuned LLM solutions make technical scaling much more cost-effective and speed up dialogue execution compared to top-rated “heavy” LLMs. Moreover, it lets you keep sensitive data INSIDE your enterprise IT infrastructure, without sharing it with third-party LLM providers.
The files starting with “e2e” contain the code that generates tests automatically.
Here is the dataset generated with RASA’s SDK.
Here is the model, prepared with this script using RunPod’s cloud infrastructure.
I hope this brief summary will inspire you to start or revise your own conversational agents.
The CALM framework provided by RASA covers almost all aspects of the modern conversational AI building process.
I look forward to the next AI challenge from RASA.