Using LLMs for Intent Classification

Intent classification using Large Language Models (LLMs) and a method called retrieval augmented generation (RAG).

Rasa Pro Only
Rasa Labs access - New in 3.7.0b1

Rasa Labs features are experimental. We introduce experimental features to co-create with our customers. To find out more about how to participate in our Labs program visit our Rasa Labs page.

We are continuously improving Rasa Labs features based on customer feedback. To benefit from the latest bug fixes and feature improvements, please install the latest pre-release using:

pip install 'rasa-plus>3.6' --pre --upgrade

Key Features

  1. Few-shot learning: The intent classifier can be trained with only a few examples per intent. New intents can be bootstrapped and integrated even if only a handful of training examples are available.
  2. Fast Training: The intent classifier is very quick to train.
  3. Multilingual: The intent classifier can be trained on multilingual data and can classify messages in many languages, though performance will vary across LLMs.

Overview

The LLM-based intent classifier uses large language models (LLMs) to classify intents. It relies on a method called retrieval augmented generation (RAG), which combines the benefits of retrieval-based and generation-based approaches.

LLM Intent Classifier Overview

During training, the classifier

  1. embeds all intent examples and
  2. stores their embeddings in a vector store.

During prediction, the classifier

  1. embeds the current message,
  2. uses the embedding to find similar intent examples in the vector store,
  3. ranks the retrieved examples by their similarity to the current message,
  4. includes the most similar examples in an LLM prompt that guides the LLM to predict the intent of the message,
  5. receives the intent label generated by the LLM, and
  6. maps the generated label to an intent of the domain. The LLM can also predict a label that is not part of the training data; in this case, the intent from the domain with the most similar embedding is predicted (see the sketch below).
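
The Python sketch below illustrates this training and prediction flow at a conceptual level. It is not the actual LLMIntentClassifier implementation: toy_embed is a toy stand-in for an embedding model, call_llm is a hypothetical placeholder for the LLM completion call, and the mapping of unknown labels is simplified to a plain fallback.

# Conceptual sketch of the RAG flow described above -- not the actual
# LLMIntentClassifier implementation. `toy_embed` and `call_llm` are
# illustrative placeholders for a real embedding model and LLM API.
from collections import Counter
from math import sqrt

def toy_embed(text: str) -> Counter:
    """Toy 'embedding': a bag of lowercased words."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def call_llm(prompt: str) -> str:
    """Placeholder for the LLM completion call; returns a canned label here."""
    return "greet"

# Training: embed all intent examples and store the embeddings in a (toy) vector store.
training_examples = [("hello", "greet"), ("hey there", "greet"), ("yes, I am", "affirm")]
vector_store = [(toy_embed(text), text, intent) for text, intent in training_examples]
known_intents = {intent for _, intent in training_examples}

def classify(message: str, number_of_examples: int = 2, fallback_intent: str = "out_of_scope") -> str:
    # 1./2. Embed the message and retrieve similar examples from the vector store.
    query = toy_embed(message)
    ranked = sorted(vector_store, key=lambda entry: cosine(query, entry[0]), reverse=True)
    # 3./4. Include the most similar examples in the prompt.
    closest = ranked[:number_of_examples]
    prompt = "".join(f"Message: {text}\nIntent: {intent}\n" for _, text, intent in closest)
    prompt += f"Message: {message}\nIntent:"
    # 5. The LLM generates an intent label.
    label = call_llm(prompt).strip()
    # 6. Map the generated label to an intent of the domain
    #    (simplified here: unknown labels fall back to `fallback_intent`).
    return label if label in known_intents else fallback_intent

print(classify("hi!"))  # -> "greet"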

Using the LLM-based Intent Classifier in Your Bot

To use the LLM-based intent classifier in your bot, you need to add the LLMIntentClassifier to your NLU pipeline in the config.yml file.

config.yml
pipeline:
# - ...
- name: rasa_plus.ml.LLMIntentClassifier
# - ...

The LLM-based intent classifier requires access to an LLM API. You can use any OpenAI model that supports the /completions endpoint. We are working on expanding the list of supported models and model providers.

Customizing

You can customize the intent classifier by modifying the following parameters in the config.yml file. All of the parameters are optional.

Fallback Intent

The fallback intent is used when the LLM predicts an intent that wasn't part of the training data. You can set the fallback intent by adding the following parameter to the config.yml file.

config.yml
pipeline:
# - ...
- name: rasa_plus.ml.LLMIntentClassifier
  fallback_intent: "out_of_scope"
# - ...

Defaults to out_of_scope.

LLM / Embeddings

You can choose the OpenAI model that is used for the LLM by adding the llm.model_name parameter to the config.yml file.

config.yml
pipeline:
# - ...
- name: rasa_plus.ml.LLMIntentClassifier
  llm:
    model_name: "text-davinci-003"
# - ...

Defaults to text-davinci-003. The model name needs to refer to a generative model that uses OpenAI's completions API.

If you want to use Azure OpenAI Service, you can configure the necessary parameters as described in the Azure OpenAI Service section.

Using Other LLMs / Embeddings

By default, OpenAI is used as the underlying LLM and embedding provider.

The LLM provider and the embeddings provider can be configured in the config.yml file to use a different provider, e.g. cohere:

config.yml
pipeline:
# - ...
- name: rasa_plus.ml.LLMIntentClassifier
  llm:
    type: "cohere"
  embeddings:
    type: "cohere"
# - ...

For more information, see the LLM setup page on LLMs and embeddings.

Temperature

The temperature parameter controls the randomness of the LLM predictions. You can set the temperature by adding the llm.temperature parameter to the config.yml file.

config.yml
pipeline:
# - ...
- name: rasa_plus.ml.LLMIntentClassifier
  llm:
    temperature: 0.7
# - ...

Defaults to 0.7. The temperature needs to be a float between 0 and 2. The higher the temperature, the more random the predictions will be. The lower the temperature, the more likely the LLM will predict the same intent for the same message.

Prompt

The prompt is the text that is used to guide the LLM to predict the intent of the message. You can customize the prompt by adding the following parameter to the config.yml file.

config.yml
pipeline:
# - ...
- name: rasa_plus.ml.LLMIntentClassifier
  prompt: |
    Label a users message from a
    conversation with an intent. Reply ONLY with the name of the intent.
    The intent should be one of the following:
    {% for intent in intents %}- {{intent}}
    {% endfor %}
    {% for example in examples %}
    Message: {{example['text']}}
    Intent: {{example['intent']}}
    {% endfor %}
    Message: {{message}}
    Intent:

The prompt is a Jinja2 template. The following variables are available in the template:

  • examples: A list of the closest examples from the training data. Each example is a dictionary with the keys text and intent.
  • message: The message that needs to be classified.
  • intents: A list of all intents in the training data.

The default prompt template results in the following prompt:

Label a users message from a
conversation with an intent. Reply ONLY with
the name of the intent.
The intent should be one of the following:
- affirm
- greet
Message: Hello
Intent: greet
Message: Yes, I am
Intent: affirm
Message: hey there
Intent:
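
For illustration, this rendering can be reproduced with Jinja2 directly. The snippet below is only a sketch of how the template variables are filled in; the values for intents, examples, and message are made up, and at runtime the classifier supplies them from the domain, the retrieved training examples, and the incoming message (whitespace handling may differ slightly).

# Sketch of rendering the default prompt template with Jinja2.
# The variable values below are illustrative only.
from jinja2 import Template

default_template = """Label a users message from a
conversation with an intent. Reply ONLY with the name of the intent.
The intent should be one of the following:
{% for intent in intents %}- {{intent}}
{% endfor %}
{% for example in examples %}
Message: {{example['text']}}
Intent: {{example['intent']}}
{% endfor %}
Message: {{message}}
Intent:"""

prompt = Template(default_template).render(
    intents=["affirm", "greet"],
    examples=[
        {"text": "Hello", "intent": "greet"},
        {"text": "Yes, I am", "intent": "affirm"},
    ],
    message="hey there",
)
print(prompt)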

Number of Intent Examples

The number of examples that are used to guide the LLM to predict the intent of the message can be customized by adding the number_of_examples parameter to the config.yml file:

config.yml
pipeline:
# - ...
- name: rasa_plus.ml.LLMIntentClassifier
  number_of_examples: 3
# - ...

Defaults to 10. The examples are selected based on their similarity to the current message. By default, the examples are included in the prompt like this:

Message: Hello
Intent: greet
Message: Yes, I am
Intent: affirm

Security Considerations

The intent classifier uses the OpenAI API to classify intents. This means that your users' conversations are sent to OpenAI's servers for classification.

The response generated by OpenAI is not sent back to the bot's user. However, a user can craft messages that cause the classification of their message to fail.

The prompt used for classification cannot be exposed to the user through prompt injection. This is because the generated response from the LLM is mapped to one of the existing intents, preventing any leakage of the prompt to the user.

More detailed information can be found in Rasa's webinar on LLM Security in the Enterprise.

Evaluating Performance

  1. Run an evaluation by splitting the NLU data into training and testing sets and comparing the performance of the current pipeline with the LLM-based pipeline.
  2. Run cross-validation on all of the data to get a more robust estimate of the performance of the LLM-based pipeline.
  3. Use the rasa test nlu command with multiple configurations (e.g., one with the current pipeline and one with the LLM-based pipeline) to compare their performance; example commands are shown below.
  4. Compare the latency of the LLM-based pipeline with that of the current pipeline to see if there are any significant differences in speed.
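
For example, assuming your NLU data lives in data/ and you keep the two pipelines in config.yml (current pipeline) and config_llm.yml (LLM-based pipeline), an evaluation could look like the following. The file and directory names are illustrative.

# 1. Hold out a test set from the NLU data (written to train_test_split/).
rasa data split nlu

# 2. Train on the training split and evaluate on the held-out test split.
rasa train nlu --config config_llm.yml --nlu train_test_split/training_data.yml
rasa test nlu --nlu train_test_split/test_data.yml

# 3. Cross-validate on all of the data for a more robust estimate.
rasa test nlu --nlu data --config config_llm.yml --cross-validation

# 4. Compare the two pipelines in one run.
rasa test nlu --nlu data --config config.yml config_llm.yml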