
Simulation and Evaluation

New in 3.17

Rasa Pro 3.17 introduces a new tool for simulation and evaluation, enabling you to test your agent's behavior against high-level goals without scripting every turn.

Simulation and evaluation lets you test your agent's behavior before shipping, without manually replaying conversations. You write a scenario in YAML describing who the simulated user is, what they want to accomplish, and what success looks like. Rasa drives the full multi-turn conversation automatically using an LLM-simulated user, then evaluates whether the agent met the stated goals.

This is complementary to E2E testing. E2E tests require you to script every turn upfront, which works well for deterministic flows. Simulation and evaluation is better suited to agents with some degree of autonomy, where the conversation path is hard to fully pre-define.

Evaluations vs. E2E Tests

| | E2E Tests | Evaluations |
|---|---|---|
| Best for | Deterministic, scripted flows with a fixed expected path | Agents with autonomy where the conversation path is hard to fully pre-define |
| When to use | CI pipelines, regression checks on controlled business logic | Build loop: testing whether your agent achieves its goals while iterating |
| Pass/fail signal | Binary per step | LLM judge scores + deterministic assertions combined |
| Suitable for CI? | Yes | No |

Evaluations are not yet suitable for blocking CI pipelines. Keep your E2E tests in CI and use evaluations in your local build loop.

Prerequisites

  • Rasa Pro installed with the rasa tools component enabled.

  • The rasa-simulating-conversations agent skill installed. Run rasa tools init in your project root to install it.

  • An IDE agent connected to the rasa tools run FastMCP server, which the agentic workflow requires. We recommend Claude Code with Sonnet 4.6, the primary configuration the skill is optimized for. Cursor with Composer 2.5 is also well supported. Any other MCP-compatible agent (GitHub Copilot, etc.) will work too. For setup details, see Rasa MCP Tools.

  • The rest and inspector channels configured in credentials.yml. The simulator posts messages to /webhooks/rest/webhook:

    credentials.yml
    rest:
    inspector:
  • To inspect simulation results in the Rasa Inspector, start your server with --inspect:

    rasa run --inspect

    --inspect also enables the REST API that the evaluate_agent tool uses to inject initial slots and fetch the tracker, so you do not need to pass --enable-api separately.
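Before running a simulation, it can help to confirm that the rest channel is reachable, since the simulator posts its messages to /webhooks/rest/webhook. The following is a minimal sketch, not part of Rasa itself: it assumes a server on localhost at Rasa's default port 5005, and the helper names build_rest_request and send_message are hypothetical. The payload uses the REST channel's sender and message fields.

```python
import json
import urllib.request

# Assumption: the server runs locally on Rasa's default port 5005.
REST_WEBHOOK_URL = "http://localhost:5005/webhooks/rest/webhook"


def build_rest_request(sender_id, text, url=REST_WEBHOOK_URL):
    """Build a POST request for the REST channel webhook (hypothetical helper)."""
    payload = json.dumps({"sender": sender_id, "message": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_message(sender_id, text, url=REST_WEBHOOK_URL, timeout=10):
    """Send one user message and return the decoded JSON response."""
    request = build_rest_request(sender_id, text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, calling send_message("smoke-test", "hello") should return a JSON list of bot responses. A connection error or an HTTP 404 usually means the server is not running or the rest channel is missing from credentials.yml.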

Quick Start

With the skill installed and your server running, type a prompt into your IDE agent:

Generate and run a happy path scenario for the add_contact flow.

The agent will:

  1. Check whether eval/conftest.yml exists. If not, it creates a starter template and pauses for you to fill in your LLM provider details.
  2. Read your flow definition to understand available slots, actions, and branching logic.
  3. Write a scenario YAML to eval/scenarios/.
  4. Call validate_scenario to check the file for syntax and domain errors.
  5. Call evaluate_agent to run the simulation and evaluation, then return a pass/fail summary with a link to the result file.
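To see what the agent left on disk after these steps, you can check the two locations named above. This is a small sketch under stated assumptions: the paths eval/conftest.yml and eval/scenarios/ come from the steps, while the preflight helper and the choice to accept both .yml and .yaml suffixes are illustrative and not part of the Rasa tooling.

```python
from pathlib import Path


def preflight(project_root):
    """Report which quick-start files exist under the project root (hypothetical helper)."""
    root = Path(project_root)
    conftest = root / "eval" / "conftest.yml"
    scenarios_dir = root / "eval" / "scenarios"

    scenarios = []
    if scenarios_dir.is_dir():
        # Assumption: scenario files use a .yml or .yaml suffix.
        scenarios = sorted(
            path.name
            for path in scenarios_dir.iterdir()
            if path.is_file() and path.suffix in {".yml", ".yaml"}
        )

    return {"conftest_exists": conftest.is_file(), "scenarios": scenarios}
```

Running preflight(".") from the project root returns a dictionary showing whether the conftest file exists and which scenario files were written. It does not validate their contents; that remains the job of validate_scenario.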