If you have been anywhere near the AI space in the last few weeks, you have seen it: Moltbot (previously Clawdbot) has taken the internet by storm. The GitHub repository sits at 102,000 stars and counting. YouTube reviews gush about seeing "the future" running on a Mac Mini. My DMs are flooded with people building "something similar."
The appeal is obvious. A local AI assistant that handles emails, calendars, file automation, messaging workflows, and even financial tasks, all without sending your data to external servers. For $5/month on a server or running 24/7 on your own hardware, it promises the kind of personal automation we have been dreaming about for years.
But here is what's not being discussed enough in all the excitement: the fundamental security architecture of pure LLM-driven agents makes them inherently vulnerable to manipulation. And no, sandboxing and containerization won't solve this problem.
The Lethal Trifecta: Why Pure LLM Agents Are Already Toast
Here is the uncomfortable truth that emerged from recent discussions among agent engineers: if our agent has access to sensitive data and can browse the web, we are already compromised, at least in principle.
The problem isn't whether someone can directly hack our machine. It is that they don't need to. They just need to get our agent to encounter content on the web that causes it to exfiltrate data or execute malicious commands on their behalf. This is the lethal combination:
- Your agent has access to data you care about
- Your agent browses the web or processes external content
That's it. With many local agents, just point 2 alone might be enough to compromise the machine they're running on, given the level of system access they typically have.
This isn't a theoretical concern. We've already seen proof-of-concept attacks in which a single email containing adversarial content triggered a data leak.
The attack surface here is semantic, not technical. You can't firewall your way out of prompt injection.
What the Data Actually Shows: Rasa vs. Prompt-Driven Agents
To understand the significance of this architectural difference, we partnered with Lakera to conduct a comprehensive security assessment comparing two agent approaches: a structured, flow-based agent built with Rasa and a conventional prompt-driven LLM agent. Both were mock implementations designed specifically to steel-man each approach. The Rasa agent included a response rephraser, which introduces some risk but can be disabled, while the prompt-driven agent used a tightly scoped system prompt.
The findings were stark:
The Prompt-Driven Agent exhibited high-severity vulnerabilities across every major risk category:
- Complete system prompt extraction and tool disclosure
- Generation of hate speech, profanity, and content safety violations
- Production of criminal and dangerous instructions (including drug synthesis and vehicle theft techniques)
- Full reproduction of copyrighted material (song lyrics, book excerpts)
- Exposure of sensitive customer financial data across accounts
- Denial-of-service conditions affecting all users
The Rasa Agent, by contrast, successfully resisted content-safety violations, disclosure attempts, and harmful-instruction requests. The only vulnerabilities identified originated from the optional rephraser component, and even these had limited impact due to the system's flow-based constraints and short conversation memory.
You can download the full report and watch our webinar discussion at info.rasa.com/rasa-agent-analysis-lakera.
Why Guardrails Aren't Enough
I've seen this pattern play out in countless discussions: someone points out the security risks of pure LLM agents, and the response is "just add better guardrails" or "use a more robust system prompt."
The uncomfortable reality is that prompt injections can always defeat guardrails. This is something inherent to how all LLMs work. We lie to ourselves, believing that guardrails make an LLM system safe. Yes, they're better than nothing, but they can always be tampered with.
The assessment data backs this up. Despite a tightly scoped system prompt with explicit instructions to withhold system information, the prompt-driven agent consistently fell victim to multi-turn drift attacks. Adversaries crafted inputs that gradually shifted the agent's context across multiple exchanges, from automotive queries to chemistry domains, from car-buying assistance to generating instructions for methamphetamine synthesis.
This isn't a failure of implementation. It's inherent to the technology. When our only protection is inference itself, when the same LLM that generates responses is also responsible for determining what's safe, we've created a single point of failure.
The Hybrid Architecture Answer
So what's the alternative? The answer lies in hybrid architectures that separate what LLMs do well from what they do poorly.
In a structured, flow-based approach like Rasa's, the LLM proposes simple next steps, such as which flow to start and which slot to fill. But it doesn't directly generate responses or make critical decisions. Instead:
- Business logic executes through structured actions, not free-form LLM generation
- Responses come from templates, with optional light rephrasing
- Predefined flows, slots, and policies determine what actually happens
- Tool access is gated in code, not by inference
- Destructive actions require explicit consent with dry runs the user can review
This design keeps the system within narrow operating boundaries. The LLM becomes a component in a larger system, not the system itself. Attack surface shrinks dramatically because fewer entry points exist for adversarial manipulation.
Yes, we trade some conversational flexibility. The assessment showed that prompt-driven agents handle nuanced requests better and feel more natural in open-ended conversations. But when we're talking about systems that have access to our financial data, our email, and our file system, do we really want maximum flexibility, or do we want predictable, controllable behavior?
The Corporate Shadow IT Problem
Here is where this becomes particularly concerning for organizations: Moltbot installs locally and functions as an endpoint, bypassing traditional SaaS posture or monitoring tools. It is the perfect shadow IT scenario.
I'm already seeing this play out. In my social circle alone, I've noticed events specifically targeting go-to-market profiles, founders' associates, and operations hackers, the exact people who will adopt these tools first, and who typically lack deep security knowledge. One event description I saw listed "high-level AI tools (Clay, Clawdbot, etc.)" as if they're equivalent in risk profile.
When a junior marketer or sales operator installs Moltbot in their company laptop to streamline their workflows, it runs quietly under their credentials, connecting to messaging platforms, file systems, and calendars. Security teams never see the traffic. IT has no visibility. The agent is driven by a generative AI that can be tricked into acting on behalf of attackers through social engineering or indirect prompt injection.
Meanwhile, we're also seeing hundreds of unsecured Clawdbot gateways appearing on Shodan, with users reporting thousands of attack attempts over a single weekend. The gateway accepts IP-based requests and runs as a daemon. The result is that most users won't realize what they've exposed.
This isn't hypothetical. It's happening right now.
What Organizations Should Do
The solution is not to forbid local AI agents. That ship has sailed. The need is real, the benefits are too large, and our teams will run them anyway, with or without permission.
Instead, it lies in our hands as technical leaders to:
1. Establish clear policies for local agent deployment. Define which data agents can access, which systems they can control, and which network access they're permitted. Make these policies enforceable in code, not just policy documents.
2. Create dedicated environments for agent operation. Don't let agents run on primary machines with full access to everything. Some organizations and individuals are already buying used Mac Minis specifically for contained agentic environments. Virtualization and sandboxes help with infrastructure attacks, but remember: they don't solve semantic attacks or prompt-induced exfiltration.
3. Implement hard capability separation. If an agent needs to access sensitive data, restrict their network access. If it needs to browse the web, restrict its access to sensitive systems and browsable websites. Design architectures that enforce bifurcation, don't rely on the LLM to make these security decisions.
4. Advocate for hybrid architectures. When building or selecting AI agents, look beyond pure LLM implementations. Seek out systems that use structured flows, template-based responses, and code-gated tool access. The conversational flexibility trade-off is worth it when we're dealing with production systems and sensitive data.
5. Educate non-technical users. The people most likely to adopt these tools first are the ones least equipped to understand the risks. Create clear guidance on what's safe, what's not, and what approvals are needed.
We know these risks exist. The average user does not. It's our responsibility to bridge that gap.
Looking Forward: What Moltbot Represents
Let's be clear about what Moltbot is and isn't. The project is barely three months old. In its current state, it's full of antipatterns, as most projects are at this stage. Over time, much of this will improve. With 300+ contributors and an active community, implementation will be hardened, rough edges smoothed.
But they won't be able to change the nature of LLMs or the security issues intrinsic to the technology. No amount of community effort will make pure LLM agents immune to prompt injection.
What Moltbot represents isn't a single product that will dominate forever. Remember GPTEngineer in 2023? It showed us what was possible; the actual library faded, but the concept evolved into products like Lovable and Claude Code. Moltbot is serving the same proof-of-concept role by demonstrating that local, always-on AI assistants are both feasible and desirable.
Even if interest in Moltbot as a specific project fades, it has already inspired dozens of teams to work on similar problems. The repository's 102,000 stars (compared to n8n's 172,000) show this isn't a niche concern. This is becoming mainstream.
The question isn't whether local AI agents will be adopted. The question is whether we'll build them with security architectures that can actually contain the risks.
Join the Conversation
These are not easy problems to solve, and the landscape is evolving rapidly. At Rasa, we've been thinking deeply about agent architectures, security, and production readiness, not just in theory, but through real implementations and assessments like our collaboration with Lakera.
We're building a community of agent engineers who are grappling with exactly these questions: How do we balance flexibility with safety? What architectural patterns work in production? How do we prepare organizations for the wave of AI agent adoption that's already here?
If you're working on these challenges, whether you're building agents, securing them, or trying to figure out how to deploy them responsibly in your organization, I'd encourage you to join us.
Visit info.rasa.com/community to join the Agent Engineering Community. And don't forget to check out the full Lakera security assessment at info.rasa.com/rasa-agent-analysis-lakera.
The Moltbot moment is here. How we respond will determine whether AI agents become reliable production tools or another security nightmare we look back on with regret.
Let's build this future together, thoughtfully, in the Agent Engineering Community.






