July 25th, 2024
Behind the Release Notes: Rasa Platform 2024 Summer Release
Lauren Goerz
The Rasa Summer Release 2024 introduces significant enhancements in dialogue management and efficiencies in working with language models, reducing cost and latency. Read on to learn more about new features in Rasa Studio (our no-code user interface), and Rasa Pro (our conversational AI framework).
New in Rasa Studio (1.3-1.5)
Building great conversational AI that addresses customer problems requires a balance of control (when necessary) and flexibility (when possible). The latest updates in Rasa Studio introduce advanced dialogue management tools for builders and conversational AI designers, providing the necessary control during critical interactions.
- Flow Guards: Rasa’s CALM (Conversational AI with Language Models) dialogue engine is powerful for navigating natural language conversations. It can leverage the contextual understanding of a language model to intuitively choose which flow best serves your customer at that moment. However, depending on the situation, you might want certain flows to be accessible only to certain user profiles or in certain contexts. Flow Guards allow teams to define which flows are accessible based on the situation or channel.
- Situational Orchestration: For example, the “transfer_money” flow can be protected to ensure only logged-in and verified bank customers can initiate it, allowing you more control over when and what flows are available to be initiated by CALM.
- Channel Orchestration: You can leverage this feature to orchestrate different modes and channels. For example, if you have some “voice only” or “text and app only” flows, you can use flow guards to allow only certain flows to be accessible depending on the channel.
See how easy it is to assert the entry criteria and ensure that a flow can only be activated when the specific criteria are met.
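To make the idea of entry criteria concrete, here is a minimal Python sketch of guard evaluation. It is an illustration only, not Rasa's implementation: the `Flow` class, the `startable_flows` helper, and the context keys (`logged_in`, `verified`, `channel`) are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical context: a plain dict of conversation state, not a Rasa object.
Context = Dict[str, object]


def always_allowed(ctx: Context) -> bool:
    """Default guard for flows without entry criteria."""
    return True


@dataclass
class Flow:
    name: str
    # Guard predicate: the flow may only be started when this returns True.
    guard: Callable[[Context], bool] = always_allowed


def startable_flows(flows: List[Flow], ctx: Context) -> List[str]:
    """Return the names of flows whose guard passes for the current context."""
    return [flow.name for flow in flows if flow.guard(ctx)]


FLOWS = [
    Flow("check_opening_hours"),  # no guard: always available
    # Situational guard: only logged-in and verified customers.
    Flow(
        "transfer_money",
        guard=lambda ctx: bool(ctx.get("logged_in")) and bool(ctx.get("verified")),
    ),
    # Channel guard: a voice-only flow.
    Flow("read_balance_aloud", guard=lambda ctx: ctx.get("channel") == "voice"),
]
```

With this sketch, an anonymous user on a text channel would only be offered `check_opening_hours`, while a verified customer on a voice channel could start all three flows.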
- Call Flow & Return Node: To increase modularity and reusability of user journeys, we have introduced the “Call Flow & Return” node. This allows you to jump to and return from different flows. By embedding smaller centralized user journeys (child flows) inside larger ones (parent flows), you can efficiently reuse common user journeys at any point and reduce the complexity of the flow structure. For example, if a user tries to transfer money to an unregistered recipient, the system automatically activates the “add transfer recipient” flow. This process collects the necessary details and returns the user to the transfer money flow, allowing them to complete the transfer. This is also useful if you have a centralized authentication flow or a centralized appointment booking flow.
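The call-and-return behaviour can be pictured as a stack of active flows. The following Python sketch is an assumption-laden illustration, not how Rasa Studio or CALM is implemented: the flow names, the `("say", ...)` and `("call", ...)` step format, and the `run` function are hypothetical. For simplicity the parent always calls the child flow, whereas in the example above the call would only happen for an unregistered recipient.

```python
from typing import Dict, List, Tuple

# Hypothetical flow definitions: each step is ("say", text) or ("call", flow_name).
FLOWS: Dict[str, List[Tuple[str, str]]] = {
    "transfer_money": [
        ("say", "Who would you like to pay?"),
        ("call", "add_transfer_recipient"),
        ("say", "Transfer complete."),
    ],
    "add_transfer_recipient": [
        ("say", "Please share the recipient's details."),
        ("say", "Recipient added."),
    ],
}


def run(flow_name: str) -> List[str]:
    """Run a flow, descending into child flows and resuming the parent afterwards."""
    transcript: List[str] = []
    # Each stack frame is (flow name, index of the next step to run).
    stack: List[Tuple[str, int]] = [(flow_name, 0)]
    while stack:
        name, index = stack.pop()
        steps = FLOWS[name]
        if index >= len(steps):
            continue  # flow finished: control returns to the parent frame, if any
        kind, value = steps[index]
        stack.append((name, index + 1))  # remember where to resume this flow
        if kind == "call":
            stack.append((value, 0))  # start the child flow on top of the parent
        else:
            transcript.append(value)
    return transcript
```

Running `run("transfer_money")` plays the parent's first step, then the whole child flow, and then resumes the parent where it left off.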
Additional Improvements:
- Guided Building with Flow Validation: Fail fast and ensure success. As you build in Rasa Studio, the system will let you know if your implementation is usable. If you make a mistake, the system will automatically highlight it to ensure you know where the error occurred and how to fix it.
- Studio Upload: You can now perform an initial import of a Rasa Pro assistant into Rasa Studio with the command:
$ rasa studio upload --calm
- Data Retention Policy: Automatic Conversation Deletion allows for the periodic removal of old conversations and their associated data from the system. This behavior is designed to help manage data retention and comply with data protection regulations.
For more updates, be sure to check the Rasa Studio change log, and reference the compatibility matrix to ensure your version of Rasa Studio is compatible with Rasa Pro.
New in Rasa Pro (3.9)
As the demand for conversational AI grows, managing latency and cost with language models at run-time becomes increasingly critical, especially when implementing voice assistants. Rasa Pro 3.9 introduces strategies aimed at significantly reducing these factors for assistants running on Rasa’s CALM dialogue engine.
- Reduce costs with our Multistep Command Generator: Benchmark tests show that using a GPT-3.5 model instead of GPT-4 can reduce the cost of running a CALM assistant by approximately 30x. To enable these smaller and cheaper LLMs to perform at a similar level to larger, more expensive models (GPT-4), we introduced a new component called the MultiStepLLMCommandGenerator. As CALM uses a language model to understand the user and to direct the dialogue, this new component breaks down these tasks into two smaller prompts to make the LLM’s job easier. At any given moment, CALM can invoke the LLM to either handle_flows or fill_slots, as you can see in the diagram below, depending on the user request.
- Reduce latency and optimize throughput for voice interactions: Key runtime performance improvements optimize Rasa Pro for voice interactions. For example, for increased speed and efficiency we introduced a gRPC action server. This uses the gRPC protocol to efficiently handle double the number of requests in the same amount of time.
- Increase Enterprise Search flexibility with custom information retrieval: Rasa's initial integration with vector stores, such as Qdrant and Milvus, laid the foundation for more advanced information retrieval capabilities. If you are using our enterprise search solution, Rasa now supports the use of Custom Information Retrievers with the EnterpriseSearchPolicy, meaning you can now use any vector database. However, that is not all. In the same spirit as our custom channel connectors, custom information retrievers provide a wide range of useful customization opportunities, including contextual slot-based retrieval and support for additional embedding models.
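To illustrate what contextual slot-based retrieval could look like, here is a small self-contained Python sketch. It does not use Rasa's actual retriever base class or the EnterpriseSearchPolicy API: the `Retriever` protocol, the `InMemoryRetriever` class, the `product` slot, and the word-overlap scoring are all hypothetical stand-ins for a real vector store and embedding model.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Document:
    text: str
    metadata: Dict[str, str]


class Retriever(Protocol):
    """Hypothetical retriever interface; not Rasa's actual base class."""

    def search(self, query: str, slots: Dict[str, str], top_k: int) -> List[Document]: ...


class InMemoryRetriever:
    """Toy retriever: filters by a slot value, then ranks by word overlap."""

    def __init__(self, documents: List[Document]) -> None:
        self.documents = documents

    def search(self, query: str, slots: Dict[str, str], top_k: int = 2) -> List[Document]:
        query_words = set(query.lower().split())
        # Contextual slot-based retrieval: keep only documents for the user's product.
        product = slots.get("product")
        candidates = [
            d for d in self.documents
            if product is None or d.metadata.get("product") == product
        ]
        # Score by shared words; a real retriever would compare embeddings instead.
        scored = [(len(query_words & set(d.text.lower().split())), d) for d in candidates]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [d for score, d in ranked[:top_k] if score > 0]


DOCS = [
    Document("Card fees are waived for premium accounts", {"product": "premium"}),
    Document("Card fees are 5 euros per month", {"product": "basic"}),
    Document("Branches open at 9am", {"product": "basic"}),
]
```

The design point is that the slot value narrows the candidate set before ranking, so the same question can return different passages for different customers.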
Additional Improvements
- Button Payload Format: Issue a SetSlots command to bypass an LLM call and reduce costs when filling a slot.
- LLM or NLU-based Slot Filling: With CALM, you could leverage an LLM to contextually extract a slot. Now, you have the option to add NLU-based slot mappings so you can extract domain-specific slots and reduce latency at the point of slot filling.
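The cost saving from the button payload comes from handling a structured payload deterministically instead of sending it to the LLM. The sketch below shows that idea in plain Python. It assumes a payload of the shape `/SetSlots(name=value, other=value)`; treat that shape, the `parse_set_slots` function, and the slot names as assumptions made for illustration, and check the Rasa Pro documentation for the exact syntax.

```python
import re
from typing import Dict, Optional

# Assumed payload shape: "/SetSlots(name=value, other=value)".
_PAYLOAD = re.compile(r"^/SetSlots\((?P<body>.*)\)$")


def parse_set_slots(payload: str) -> Optional[Dict[str, str]]:
    """Return slot assignments from a button payload, or None if it is not one."""
    match = _PAYLOAD.match(payload.strip())
    if match is None:
        return None  # free text: this would go through the normal LLM path
    slots: Dict[str, str] = {}
    for pair in match.group("body").split(","):
        name, separator, value = pair.partition("=")
        if not separator or not name.strip():
            return None  # malformed pair: fall back to normal processing
        slots[name.strip()] = value.strip()
    return slots
```

A caller would try `parse_set_slots` first and only invoke the language model when it returns `None`.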
For more information, check out the Rasa Pro change log. If you’re interested in learning how the Rasa platform can transform your customer interactions, reach out to connect with sales.