Simulation and Evaluation

New in 3.17

Rasa Pro 3.17 introduces a new tool for simulation and evaluation, enabling you to test your agent's behavior against high-level goals without scripting every turn.

Simulation and evaluation lets you test your agent's behavior before shipping, without manually replaying conversations. You write a scenario in YAML describing who the simulated user is, what they want to accomplish, and what success looks like. Rasa drives the full multi-turn conversation automatically using an LLM-simulated user, then evaluates whether the agent met the stated goals.
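The loop below is a conceptual sketch of what driving a full multi-turn conversation involves. It is not Rasa's implementation. `Scenario`, `simulate`, and the toy user, agent, and judge functions are hypothetical stand-ins. In a real run, the simulated user and the judge are LLM calls, and the agent is your running Rasa server.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Turn = Tuple[str, str]  # (user message, agent reply)


@dataclass
class Scenario:
    # Hypothetical fields, loosely mirroring "who the user is" and "what they want".
    persona: str
    goal: str
    max_turns: int = 10


def simulate(
    scenario: Scenario,
    simulated_user: Callable[[Scenario, List[Turn]], Optional[str]],
    agent: Callable[[str], str],
) -> List[Turn]:
    """Alternate simulated-user and agent turns until the user stops or max_turns is reached."""
    turns: List[Turn] = []
    for _ in range(scenario.max_turns):
        user_msg = simulated_user(scenario, turns)
        if user_msg is None:  # the simulated user has nothing more to say
            break
        turns.append((user_msg, agent(user_msg)))
    return turns


# Toy stand-ins (hypothetical). Real runs use an LLM user, your Rasa agent, and an LLM judge.
def scripted_user(scenario: Scenario, history: List[Turn]) -> Optional[str]:
    script = ["I want to add a contact", "Her name is Ada, handle @ada"]
    return script[len(history)] if len(history) < len(script) else None


def toy_agent(message: str) -> str:
    return "Contact saved." if "@" in message else "Sure, what are the details?"


def toy_judge(goal: str, turns: List[Turn]) -> bool:
    # Stand-in for an LLM judge: did any agent reply confirm the goal was met?
    return any("saved" in reply.lower() for _, reply in turns)


scenario = Scenario(persona="a busy user", goal="add a contact")
turns = simulate(scenario, scripted_user, toy_agent)
print(len(turns), toy_judge(scenario.goal, turns))  # 2 True
```

The `max_turns` cap matters because the conversation path is not scripted: without it, a simulated user and an agent that never converge could loop indefinitely.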

This is complementary to E2E testing. E2E tests require you to script every turn upfront, which works well for deterministic flows. Simulation and evaluation is better suited to agents with some degree of autonomy, where the conversation path is hard to fully pre-define.

Evaluations vs. E2E Tests

| | E2E Tests | Evaluations |
|---|---|---|
| Best for | Deterministic, scripted flows with a fixed expected path | Agents with autonomy, where the conversation path is hard to fully pre-define |
| When to use | CI pipelines and regression checks on controlled business logic | The build loop: testing whether your agent achieves its goals while you iterate |
| Pass/fail signal | Binary, per step | LLM judge scores combined with deterministic assertions |
| Suitable for CI? | Yes | No |

Evaluations are not yet suitable for blocking CI pipelines. Keep your E2E tests in CI and use evaluations in your local build loop.

Prerequisites

- Rasa Pro installed, with the rasa tools component enabled.
- The rasa-simulating-conversations agent skill installed. Run rasa tools init in your project root to install it.
- For the agentic workflow, an IDE agent connected to the FastMCP server started by rasa tools run. We recommend Claude Code with Sonnet 4.6, the configuration the skill is primarily optimized for. Cursor with Composer 2.5 is also well supported, and other MCP-compatible agents (such as GitHub Copilot) work too. For setup details, see Rasa MCP Tools.
- The rest and inspector channels configured in credentials.yml. The simulator posts messages to /webhooks/rest/webhook:
credentials.yml
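```
# REST channel: the simulator posts user messages to /webhooks/rest/webhook
rest:
# Inspector channel: used to view conversations in the Rasa Inspector
inspector:
```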
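For reference, here is a minimal sketch of a single turn against the REST channel, the endpoint the simulator posts to. The request body (`sender`, `message`) and the JSON list of bot messages in the response follow the standard Rasa REST channel format. The helper names and the `http://localhost:5005` default are assumptions for illustration.

```python
import json
import urllib.request
from typing import List

RASA_URL = "http://localhost:5005"  # assumption: default port for `rasa run`


def build_webhook_request(sender_id: str, text: str, base_url: str = RASA_URL) -> urllib.request.Request:
    """Build a POST to the REST channel webhook (hypothetical helper)."""
    body = json.dumps({"sender": sender_id, "message": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/webhooks/rest/webhook",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bot_texts(response_body: bytes) -> List[str]:
    """The REST channel replies with a JSON list of messages; keep only the text ones."""
    return [m["text"] for m in json.loads(response_body) if "text" in m]


def send(sender_id: str, text: str, base_url: str = RASA_URL) -> List[str]:
    """Send one user message and return the bot's text replies (requires a running server)."""
    with urllib.request.urlopen(build_webhook_request(sender_id, text, base_url), timeout=30) as resp:
        return bot_texts(resp.read())
```

With your server running, `send("sim-user-1", "I want to add a contact")` returns the agent's text replies for that turn. Reusing the same sender ID across calls keeps the turns in one conversation.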
To inspect simulation results in the Rasa Inspector, start your server with --inspect:
```
rasa run --inspect
```
--inspect also enables the REST API that the evaluate_agent tool uses to inject initial slots and fetch the tracker, so you do not need to pass --enable-api separately.
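To reproduce the slot-injection and tracker-fetch steps by hand, you can use the Rasa HTTP API, which exposes endpoints for appending events to a conversation tracker and for reading it back. The sketch below builds those requests with the standard library. It assumes evaluate_agent relies on these standard endpoints, which this page does not confirm. The helper names, the example slot name `contact_name`, and the default URL are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

RASA_URL = "http://localhost:5005"  # assumption: default port for `rasa run`


def slot_events(slots: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn {"slot_name": value} into Rasa `slot` events."""
    return [{"event": "slot", "name": name, "value": value} for name, value in slots.items()]


def build_inject_slots_request(conversation_id: str, slots: Dict[str, Any], base_url: str = RASA_URL) -> urllib.request.Request:
    """POST slot events to /conversations/<id>/tracker/events (hypothetical helper)."""
    cid = urllib.parse.quote(conversation_id, safe="")
    return urllib.request.Request(
        f"{base_url}/conversations/{cid}/tracker/events",
        data=json.dumps(slot_events(slots)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_get_tracker_request(conversation_id: str, base_url: str = RASA_URL) -> urllib.request.Request:
    """GET the full tracker for a conversation (hypothetical helper)."""
    cid = urllib.parse.quote(conversation_id, safe="")
    return urllib.request.Request(f"{base_url}/conversations/{cid}/tracker", method="GET")


def current_slots(tracker_body: bytes) -> Dict[str, Any]:
    """The tracker JSON includes a `slots` mapping of slot name to current value."""
    return json.loads(tracker_body).get("slots", {})
```

With the server started with --inspect, passing `build_inject_slots_request("sim-1", {"contact_name": "Ada"})` to `urllib.request.urlopen` should set the slot before the first simulated turn, and `current_slots` on the tracker response lets you check slot values afterwards.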

Quick Start

With the skill installed and your server running, type a prompt into your IDE agent:

Generate and run a happy path scenario for the add_contact flow.

The agent will:

1. Check whether eval/conftest.yml exists. If it does not, create a starter template and pause so you can fill in your LLM provider details.
2. Read your flow definition to understand its slots, actions, and branching logic.
3. Write a scenario YAML file to eval/scenarios/.
4. Call validate_scenario to check the file for syntax and domain errors.
5. Call evaluate_agent to run the simulation and evaluation, then return a pass/fail summary with a link to the result file.
