July 19th, 2023
LLM Security in the Enterprise
Lauren Goerz
Here at Rasa, our research teams have been looking at the promise of Large Language Models (LLMs) to make conversational interfaces more resilient and helpful than ever before for some years now. While we are hard at work on our vision to help our customers transform how people interact with organizations through technologies like large language models, we can’t ignore how important it is to enable our customers to leverage these models safely. Like any other type of technology, alongside all of the benefits of LLMs, conversational AI teams will need to manage the vulnerabilities of this technology to ensure success in production.
We got the chance to talk about the topic of LLM Security in the Enterprise in our recent webinar, where we had a great conversation with David Haber, CEO of Lakera, an AI Safety company, along with Rasa Co-Founder and CTO Dr. Alan Nichol and our Head of Infrastructure & Security, Jamie MacDonald. Here are three key takeaways that your team should think about as you consider using LLMs in production.
1. Prepare your defenses for prompt injection
Prompt injection sits in the number one position in the Open Web Application Security Project’s (OWASP) Top 10 for Language Models and is one of the key topics covered in this webinar. Here is how OWASP defines prompt injections:
"Prompt injections involve bypassing filters or manipulating the LLM using carefully crafted prompts that make the model ignore previous instructions or perform unintended actions."
Prompt injection in the context of conversational AI is a real problem for enterprise conversational AI teams for several reasons:
- Direct prompt injection attacks allow end-users to treat your AI Assistant as a hand puppet: CTO & Co-Founder Dr. Alan Nichol shares his experience performing a prompt injection attack on an insurance company that went live with an AI Assistant that did not implement these guard rails.
- Indirect prompt injection allows for third-party prompt injection: An adversarial prompt injection can be strategically hidden in a document, webpage, or API that an LLM consumes to steer the LLMs response in the wrong direction. In the context of a more connected AI Assistant, it could also direct the LLM to initiate an unauthorized action.
How to defend against prompt injection
We asked Jamie MacDonald, Head of Security & Infrastructure here at Rasa, to share some tactics enterprises can start with to defend against prompt injection, here is Jamie’s list:
- Simple deny-listing: Look for specific strings or sensitive data in input or output, and reject inputs or responses based on this. We can use these to prevent specific known-bad input from making it to the LLM or preventing known secrets from being returned to the user. Given how flexible LLMs are and how many different combinations of characters you could use to achieve your desired outcome, this is a very basic solution that can be easy to work around.
- Use a support model: Use another model to evaluate whether or not the input or response is acceptable. For example, Lakera details how in Level 6 of the Gandalf Experiment they send the input and response to another LLM and ask it to determine if the password is being requested or disclosed, rejecting the request if so. Lakera defines these as Input Guards and Output Guards, reviewing the prompt or response at different stages in the pipeline.
- Pro-active monitoring: Logging and monitoring of the LLM outputs or activity to detect anomalies that could be indicators of novel prompt injection techniques.
- Combinations of the above techniques for either input, output, or both.
2. Data governance, hallucination, and pending regulation are additional AI security topics to consider
-
Data governance: There are different risks depending on how you consume these different LLM services (E.g. on-premise vs. cloud). As many contextual AI Assistants require the collection and use of PII (Personal Identifiable Information) to perform actions on the user's behalf, it will be essential to prevent or carefully control how user data is transferred to third-party systems.
-
Hallucination: LLMs don’t know what they don’t know. The result of this fact is what many people in the industry have been calling a hallucination, but the important reality is that answers from an LLM are not grounded in a sense of factuality. This means LLMs can be powerful information sources, but they are usually not the best choice of technology to make critical judgment calls. For this reason, we anticipate that most enterprises running mission-critical AI Assistants at scale will still require both LLM-powered dialogue management as well as the ability to build in predictable logic for the moments when a specific answer has to be given.
- Regulation: In our webinar, we explored the panel's first impressions of the proposed EU AI Act introduced this year, which we expect to have far-reaching implications if we are to look at the parallel of GDPR. It is similar to GDPR in that we should take it seriously and prepare for it to be enforced, but it is different because the technology is still developing, and there are still some open research problems on how to implement some of the requirements outlined in the EU AI Act. This is why we believe you can expect this act to continue to develop and change over time.
3. No need to reinvent the wheel for LLMs
While large language models themselves are not new, we know the idea of actually using them in a production environment is still new for many brands.
While the maintenance and management of this technology will be new terrain for many teams, we believe that most security best practices still prove valuable in this new environment.
Practical tips for safer LLM deployments
David shared his top 4 practical AI Security tips that would lead to a safer LLM deployment:
- Don't connect your LLM to any form of PII: If you do, make sure to configure access with the principle of least privilege.
- Put a human in the loop: This is an easy way to mitigate LLM risks, especially as you are in the early phases of rolling out your application.
- Input and output sanitization: Put software protection in place to mitigate risks from prompt injection, data leakage, harmful content, etc.
- Regression testing: Establish a prompt baseline to test your models before deployment or during updates.
Conclusion:
Now that you have some things to start thinking about as you prepare your team to evaluate the potential large language models in conversational interfaces, we also want to share more about the approach we are taking so you know how we are working to become the safe pair of hands for your team to leverage large language models with the necessary guardrails:
- Retain modularity: Rasa is inherently very modular, so you can swap out and change and configure the components that you use as you choose. We will continue to offer this flexibility in future versions of our products. This is essential for a customized approach that can flexibly leverage LLMs where they add value and solve a problem.
- Adjust your risk profile: Talking with our customers, we know that the level of comfort with LLMs is currently a broad spectrum. Each brand will have a different risk-reward analysis and different security requirements they need to comply with. In our solutions, we are working to allow our users will be able to pick and choose where and how LLMs are used, and give them a dial to allow them to limit risk and retain more control when they want to.
- Architectural flexibility: We know our customers have different requirements for how they would like to consume LLM services. For example, we know some brands would like to avoid calling OpenAI’s APIs and would rather run this in a private network, so we are investigating ways to make this possible. This way, we will be able to support our customers regardless of their specific security posture.
We would like to extend a big thank you to David for joining us and sharing his insights on AI Safety and learnings from the Lakera Gandalf Experiment. If you want to learn more, you can rewatch the webinar recording, and check out other related resources mentioned in this blog below.