Conversational AI: From POC to Production in Financial Services

Posted May 19, 2026

Updated

Maria Ortiz
Maria Ortiz

Most financial services organizations have cleared the first conversational AI hurdle: the POC. The demo impressed the right people, and the business case for production won approval. 

But a conversational AI system that handles real customer interactions, at scale, and in a highly regulated environment is more of a challenge. The cost of getting it wrong (whether that means a compliance gap, a degraded customer experience, or a black-box AI system that can't be audited) is higher than in just about any other industry, thanks to the large fines and sanctions that regulators can levy against financial services companies.

Rasa’s 2026 State of Conversational AI report found that financial services organizations have the lowest average confidence of any industry surveyed (scoring just 4.12 out of 7) and that 38% cite compliance as their top challenge. These teams are struggling to scale with the control and auditability their environment demands.

The stage-by-stage framework below can help you navigate the journey of scaling a conversational AI solution, and it’s grounded in what we see from financial services teams who have made it to production successfully.

Stage 1: Define the scope before you commit to the architecture

The most common mistake at the POC stage is building for the best-case conversation. The reality is that customer interactions are messy. People interrupt, contradict themselves, ask about three different things at once, and sometimes say the wrong thing at the wrong moment. A POC that handles curated test cases well can still fail badly in production.

Before committing to a full build, financial services teams need clarity on three things:

  • What conversations will this agent actually handle? The answer should be more precise than "customer service questions." Define the use cases, the edge cases, and the cases the agent should hand off to a human. Escalation logic is a fundamental design decision in financial services.
  • Who owns the system? In many financial institutions, the POC is owned by one team, and the production system is owned by another. That handoff is inherently risky. Accountability for model behavior, data governance, and ongoing performance should be established before the first line of code is written.
  • What does success look like? Our research found that 27% of enterprise leaders say achieving performance metrics is the hardest phase of the AI journey. That suggests the teams most likely to struggle are the ones who didn't define their success criteria upfront. Response accuracy, compliance adherence, containment rate, and escalation frequency are all measurable outcomes that allow you to track progress.

Stage 2: Build governance within the system

In most industries, governance is added once a system is already up and running. In financial services, that sequence is better in reverse. AI-related issues contributed to over $4 billion in FINRA and SEC penalties against financial services firms in 2024, driven by biased outputs, misleading communications, and inadequate supervision of automated systems. Retroactive governance fixes are very expensive, but proactive governance can prevent them.

For conversational AI specifically, governance has two distinct layers:

  • Infrastructure governance: Where does the system run, who can access it, and how is data handled? Seventy-five percent of financial services respondents in our survey require on-premise or own-cloud deployment. Some institutions simply prefer the benefits of owned governance, while for others it’s also a condition of deployment that’s driven by data residency requirements, security policy, or regulatory mandates.
  • Behavioral governance: What can the AI actually do, and can you prove it? This is where many financial services implementations regularly underinvest. A system that runs in a private cloud but produces outputs that can't be explained or audited still carries regulatory risk. Every interaction should flow through defined logic that is inspectable, logged, and aligned with evolving compliance requirements.

N26, the Berlin-based digital bank, addressed this directly when building its Rasa-powered AI assistant. Rather than routing customer data through a third-party cloud, the N26 team deployed on their own secure environment with full data control and used that foundation to serve five languages across mobile and web, including complex conversations around lost or stolen cards. Getting from idea to production took four weeks, and the governance architecture made that speed possible.

Stage 3: Choose your deployment architecture deliberately

The architecture decision that matters most in a regulated environment is how much autonomy the AI model is given over consequential actions.

Our research found that 63% of enterprise leaders prefer hybrid architectures that combine LLM flexibility with deterministic logic. And, in financial services, 100% of our survey respondents rated AI transparency as "very important" or "critical."

LLMs are well-suited to understanding what a customer means, even when they phrase it ambiguously. They are less suited to deciding what happens next in compliance-sensitive situations. A customer asking about a disputed transaction should trigger a defined process (e.g., identity verification, data retrieval, specific response logic) and that sequence should not be generated on the fly.

Rasa’s architecture separates these two responsibilities. Language interpretation happens at one layer while execution logic occurs at another. The result is an agent that can handle open-ended, natural conversation while still producing outputs that are consistent, auditable, and aligned with institutional policies. For teams building in financial services, this distinction is worth getting right before you scale.

Stage 4: Integrate with production systems 

Integration is where timelines can slip and scopes can expand. In financial services, integrating conversational AI with core banking systems, CRMs, identity providers, and case management platforms has enormous governance implications. Every system connection is a potential data exposure point, and every handoff between the AI and a downstream system needs to be traceable.

The teams that navigate this well tend to share a few common practices:

  • They sequence their integrations by risk and value. High-value, lower-risk integrations (like connecting to a knowledge base for FAQ deflection) get built first. Higher-risk integrations that touch transaction systems or identity verification come later, once the team has validated the system's behavior under lower-stakes conditions.
  • They treat the handoff to a human as an integration. When an AI agent escalates to a human agent, that escalation should carry full conversation context. That context transfer is both a user experience requirement and an efficiency requirement, and it needs to be built, tested, and monitored like any other system integration.
  • They track everything. The best-performing teams track more than whether an AI agent resolved the conversation. They also track where it got stuck, which fallback patterns triggered most often, and where customers dropped off. That instrumentation is what makes the system improvable and auditable when it needs to be.

Stage 5: Measure before you scale

The single most consistent finding in our research is that performance measurement is the hardest phase of the AI journey. For financial services teams, that’s particularly true. With zero tolerance for accuracy failures and strict requirements for compliance adherence, scaling a system you can't reliably measure is a significant risk for financial institutions.

The teams that succeed treat measurement as an ongoing process, not a one-time task at launch. That means establishing clear performance benchmarks before the system goes live, tracking leading indicators (like fallback rate and task completion) rather than just lagging ones (like CSAT), and building the review cadence into the operating model from day one.

The Australian health and travel insurer nib Group offers a useful case study of what that discipline looks like in practice. Rather than treating their AI assistant as a one-time build, they continuously reviewed and redesigned the system against specific goals like improved member experience, increased efficiency, and higher productivity. As a result, the company could serve over 150,000 members, a 200% year-over-year increase. The platform delivered because the team invested in iteration beyond the initial deployment.

Financial services institutions have more constraints than most. But they also have clearer success criteria, more experienced governance functions, and stronger incentives to get AI right. The teams that move from POC to production successfully are the ones who built the measurement and governance infrastructure that ensures the system is working.

Build for production from day one

The POC is arguably the easiest part of the conversational AI journey in financial services. The real work is building something that holds up under compliance scrutiny, integrates with production infrastructure, and performs reliably at scale.

Rasa's platform is purpose-built for highly regulated environments. From flexible deployment options that support on-premise and private-cloud configurations to Rasa's architecture that separates language understanding from business logic and full conversation logging and observability tooling, Rasa gives financial services teams the control and confidence they need to move from proof of concept to production.

AI that adapts to your business, not the other way around

Build your next AI

agent with Rasa

Power every conversation with enterprise-grade tools that keep your teams in control.