Simulation and Evaluation
Rasa Pro 3.17 introduces a new tool for simulation and evaluation, enabling you to test your agent's behavior against high-level goals without scripting every turn.
Simulation and evaluation lets you test your agent's behavior before shipping, without manually replaying conversations. You write a scenario in YAML describing who the simulated user is, what they want to accomplish, and what success looks like. Rasa drives the full multi-turn conversation automatically using an LLM-simulated user, then evaluates whether the agent met the stated goals.
This is complementary to E2E testing. E2E tests require you to script every turn upfront, which works well for deterministic flows. Simulation and evaluation is better suited to agents with some degree of autonomy, where the conversation path is hard to fully pre-define.
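The simulate-then-evaluate loop described above can be pictured with a small, self-contained Python sketch. It is a conceptual model only, not how Rasa implements the simulator. `Scenario`, `simulate`, `evaluate`, and the stub user and agent are hypothetical names. The stubs stand in for the LLM-simulated user and your real agent, and the success check is a toy string match rather than an LLM judge.

```python
# Conceptual sketch of a simulate-then-evaluate loop.
# All names are hypothetical; this is not Rasa's API or implementation.
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Turn = Tuple[str, str]  # (speaker, text)


@dataclass
class Scenario:
    persona: str          # who the simulated user is
    goal: str             # what they want to accomplish
    success_phrase: str   # toy stand-in for "what success looks like"
    max_turns: int = 5    # safety cap on user turns


def simulate(
    scenario: Scenario,
    user: Callable[[Scenario, List[Turn]], Optional[str]],
    agent: Callable[[str], str],
) -> List[Turn]:
    """Alternate simulated-user and agent turns until the user stops or max_turns is hit."""
    transcript: List[Turn] = []
    for _ in range(scenario.max_turns):
        message = user(scenario, transcript)
        if message is None:  # simulated user considers the conversation finished
            break
        transcript.append(("user", message))
        transcript.append(("agent", agent(message)))
    return transcript


def evaluate(scenario: Scenario, transcript: List[Turn]) -> bool:
    """Toy evaluation: did the agent ever produce the success phrase?"""
    return any(
        speaker == "agent" and scenario.success_phrase in text
        for speaker, text in transcript
    )


def stub_user(scenario: Scenario, transcript: List[Turn]) -> Optional[str]:
    """Stand-in for an LLM-simulated user: replays two messages, then stops."""
    scripted = ["I want to add a contact", "Her name is Ada, handle @ada"]
    user_turns = sum(1 for speaker, _ in transcript if speaker == "user")
    return scripted[user_turns] if user_turns < len(scripted) else None


def stub_agent(message: str) -> str:
    """Stand-in for the agent under test."""
    if "handle" in message:
        return "Contact added."
    return "Sure, what's the name and handle?"
```

With `Scenario(persona="new user", goal="add a contact", success_phrase="Contact added")`, `simulate(scenario, stub_user, stub_agent)` produces a four-entry transcript ending in `("agent", "Contact added.")`, and `evaluate` returns `True`. The point of the design is that you describe the goal and the success criterion, not the individual turns.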
Evaluations vs. E2E Tests
| | E2E Tests | Evaluations |
|---|---|---|
| Best for | Deterministic, scripted flows with a fixed expected path | Agents with autonomy where the conversation path is hard to fully pre-define |
| When to use | CI pipelines, regression checks on controlled business logic | Build loop: testing whether your agent achieves its goals while iterating |
| Pass/fail signal | Binary per step | LLM judge scores + deterministic assertions combined |
| Suitable for CI? | Yes | No |
Evaluations are not yet suitable for blocking CI pipelines. Keep your E2E tests in CI and use evaluations in your local build loop.
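The table's "LLM judge scores + deterministic assertions combined" signal can be illustrated with a hypothetical rule. This page does not say how Rasa weighs the two. The `EvalResult` and `verdict` names, the 0.0 to 1.0 score scale, the 0.7 threshold and the "every assertion must pass" policy are all assumptions made for this sketch.

```python
# Hypothetical pass/fail rule combining an LLM judge score with deterministic checks.
# Names, score scale, and threshold are assumptions, not Rasa's scoring logic.
from dataclasses import dataclass
from typing import Dict


@dataclass
class EvalResult:
    judge_score: float            # assumed 0.0-1.0 rating from an LLM judge
    assertions: Dict[str, bool]   # deterministic checks, keyed by a description


def verdict(result: EvalResult, threshold: float = 0.7) -> bool:
    """Pass only if every deterministic assertion holds AND the judge score clears the threshold."""
    return all(result.assertions.values()) and result.judge_score >= threshold
```

Under this rule, a high judge score cannot rescue a failed hard check. For example, `verdict(EvalResult(0.9, {"handle slot filled": True, "confirmation sent": False}))` is `False`. The judge-score half of such a rule can vary between runs of the same scenario; the assertion half cannot.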
Prerequisites
- Rasa Pro installed with the `rasa tools` component enabled.
- The `rasa-simulating-conversations` agent skill installed. Run `rasa tools init` in your project root to install it.
- An IDE agent connected to the `rasa tools run` FastMCP server, which the agentic workflow requires. We recommend Claude Code with Sonnet 4.6, the primary configuration the skill is optimized for. Cursor with Composer 2.5 is also well supported. Any other MCP-compatible agent (GitHub Copilot, etc.) will work too. For setup details, see Rasa MCP Tools.
- The `rest` and `inspector` channels configured in `credentials.yml`. The simulator posts messages to `/webhooks/rest/webhook`.

  ```yaml
  # credentials.yml
  rest:
  inspector:
  ```

- To inspect simulation results in the Rasa Inspector, start your server with `--inspect`:

  ```bash
  rasa run --inspect
  ```

  `--inspect` also enables the REST API that the `evaluate_agent` tool uses to inject initial slots and fetch the tracker, so you do not need to pass `--enable-api` separately.
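Before running a simulation, you can check by hand that the `rest` channel and the API enabled by `--inspect` respond. You do this by sending a message and reading back the tracker. The Python sketch below uses only the standard library. It assumes the server listens on the default `http://localhost:5005` without an auth token, and the helper names are hypothetical. The webhook path comes from this page. The `/conversations/{conversation_id}/tracker` and `/tracker/events` endpoints belong to Rasa's HTTP API. This is not a description of how `evaluate_agent` calls them internally.

```python
# Sketch: exercise the REST channel and the HTTP API enabled by `rasa run --inspect`.
# Assumptions: server at the default http://localhost:5005, no auth token.
# Helper names are hypothetical; nothing here is part of `rasa tools`.
import json
import urllib.request
from typing import Any, Dict, List
from urllib.parse import quote

BASE_URL = "http://localhost:5005"  # assumed default host and port


def rest_payload(sender: str, message: str) -> bytes:
    """JSON body for the REST channel: a sender (conversation) ID plus the user text."""
    return json.dumps({"sender": sender, "message": message}).encode("utf-8")


def tracker_url(conversation_id: str, base_url: str = BASE_URL) -> str:
    """HTTP API URL of one conversation's tracker; the ID is URL-encoded."""
    return f"{base_url}/conversations/{quote(conversation_id, safe='')}/tracker"


def slot_event(name: str, value: Any) -> Dict[str, Any]:
    """A slot-set event in the JSON shape the HTTP API accepts."""
    return {"event": "slot", "name": name, "value": value}


def _post_json(url: str, body: bytes) -> Any:
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def send_message(sender: str, message: str, base_url: str = BASE_URL) -> List[Dict[str, Any]]:
    """POST one user message; the REST channel replies with a list of bot messages."""
    return _post_json(f"{base_url}/webhooks/rest/webhook", rest_payload(sender, message))


def set_slot(conversation_id: str, name: str, value: Any, base_url: str = BASE_URL) -> Any:
    """Append a slot event to the tracker, e.g. to seed a value before a conversation."""
    body = json.dumps(slot_event(name, value)).encode("utf-8")
    return _post_json(f"{tracker_url(conversation_id, base_url)}/events", body)


def fetch_tracker(conversation_id: str, base_url: str = BASE_URL) -> Dict[str, Any]:
    """GET the tracker (slots, events, latest message, ...) for one conversation."""
    with urllib.request.urlopen(tracker_url(conversation_id, base_url)) as resp:
        return json.loads(resp.read())
```

For example, with `rasa run --inspect` running, `send_message("sim-user-1", "I want to add a contact")` should return the bot's replies. `fetch_tracker("sim-user-1")["slots"]` should then show the slot values the conversation has produced so far. The sender ID `sim-user-1` is arbitrary.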
Quick Start
With the skill installed and your server running, type a prompt into your IDE agent:
Generate and run a happy path scenario for the add_contact flow.
The agent will:
- Check whether `eval/conftest.yml` exists. If not, it creates a starter template and pauses for you to fill in your LLM provider details.
- Read your flow definition to understand available slots, actions, and branching logic.
- Write a scenario YAML file to `eval/scenarios/`.
- Call `validate_scenario` to check the file for syntax and domain errors.
- Call `evaluate_agent` to run the simulation and evaluation and return a pass/fail summary with a link to the result file.
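To see what the agent found or produced in the first and third steps, a small hypothetical helper can report two things: whether `eval/conftest.yml` exists, and which scenario files are in `eval/scenarios/`. `eval_layout` is an illustrative name, not part of `rasa tools`. Treating both `.yml` and `.yaml` files as scenarios is an assumption.

```python
# Hypothetical helper: summarize the eval/ layout the Quick Start steps rely on.
# Not part of rasa tools; the function name and return shape are illustrative.
from pathlib import Path
from typing import Dict, List, Union


def eval_layout(project_root: Union[str, Path]) -> Dict[str, object]:
    """Report whether eval/conftest.yml exists and list YAML files in eval/scenarios/."""
    root = Path(project_root)
    scenarios_dir = root / "eval" / "scenarios"
    scenarios: List[str] = []
    if scenarios_dir.is_dir():
        # Only regular files with a YAML extension count as scenarios (assumption).
        scenarios = sorted(
            p.name
            for p in scenarios_dir.iterdir()
            if p.is_file() and p.suffix in {".yml", ".yaml"}
        )
    return {
        "has_conftest": (root / "eval" / "conftest.yml").is_file(),
        "scenarios": scenarios,
    }
```

Running `eval_layout(".")` from your project root after the Quick Start prompt should show `has_conftest` as `True`. It should also list the newly written scenario file.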