Beyond Prompt Injection: AI’s Real Security Gaps
Audio-only version also available on your favorite podcast streaming service including Apple Podcasts, Spotify, and iHeart Podcasts.
Episode Summary:
In Part 1 of this two-part MLSecOps Podcast episode, Principal Security Consultant Gavin Klondike clarifies common misconceptions around prompt injections, details indirect markdown exfiltration attacks, and explains the vital role clearly defined trust boundaries play in securing modern AI applications.
Transcript:
[Intro]
Dan McInerney (00:08):
Welcome to the MLSecOps Podcast. I'm your host, Dan McInerney, along with co-host Marcello Salvati. We're the threat researchers here at Protect AI, and we have another threat researcher and pen tester and hacker and LLM contributor over here, Gavin Klondike. Gavin, would you like to give us a little background about how you got here, what you're doing right now, and just generally what your expertises are?
Gavin Klondike (00:29):
Yeah, absolutely. First off, thanks for having me. I'm a Principal Security Consultant. I help a lot of customers realize the business value of cybersecurity investments. So that's my day job. I specialize in offensive security, pen testing, red team, AI, cloud applications. Outside of my day job, I do a lot of independent research on AI and AI security, so security for AI and AI for security.
Originally I was trying to find if we could use machine learning to increase our abilities as cybersecurity professionals to kinda help bridge some of that cybersecurity skills gap. I've been working with the AI Village at DEF CON, so I was a core contributor. I was head of the workshops and demos there for about eight years, eight or nine years at this point. I've since kind of left that position, but I still do a lot of my own independent security research. I contributed two of the OWASP Top 10 for Large Language Model Applications and was named top presenter last year at RSA for a presentation, talking exactly about that.
Dan McInerney (01:37):
Awesome!
Gavin Klondike (01:38):
So, that's a little bit about my background.
Dan McInerney (01:41):
So there's two interesting topics there with the AI Village for eight or nine years, and the LLM top 10. Let's start with the LLM top 10. How'd you get involved in that project? And what were the two contributions that you had and like, why were those two picked? Like what was the expertise and the angle that you took on those?
Gavin Klondike (01:58):
Yeah. So ever since ChatGPT came out, it was kind of amazing how many people suddenly found their 10 years of AI experience almost overnight. And so when somebody initially proposed the OWASP Top 10 for Large Language Model Applications, it kind of grabbed a lot of headlines as far as, Hey, here's stuff that the news is talking about, right? Not data scientists, not security practitioners. Here's a lot of stuff that's in the headlines. Let's turn that into a top 10 list. And you know what, let's actually just turn that into a full blown project.
So I hopped in there, a lot of other people hopped in there, I think at a tight, there was about 300-360 people who were trying to contribute, but really the main contributors was probably only a couple dozen. So a lot of people really wanted the recognition and notoriety in there.
Gavin Klondike (02:44):
And so conversations were all over the place. That brings up the question, how do you find the top 10 most common vulnerabilities in a technology that's only been out for about six months? So I put together first of its kind threat model. It was a formal stride threat model on looking at a holistic application that just happened to have LLM components. So the three core areas are how are these application or how are the large language model components of the application built, right? So the supply chain of that. Once it's built, how do attackers get into the application? And then once they're in the application, what can they do? So those were the three key focus areas.
And so we just kind of coordinated and directed the entire OWASP top 10 around that idea. So the two that I contributed to were LLM01 at the time, which was prompt injection, and then LLM02, which was insecure output filtering. Those were the things that we were seeing actively being exploited in the wild. And just my research and working with bug bounty hunters, actually being able to kind of follow and communicate, Hey, this is what we're seeing. This is how people are getting in, and this is where the ML DevOps people are kind of messing up. So it's really important that they know what the security risks are with some of these applications.
Dan McInerney (04:00):
That makes sense. So for the LLM prompt injection, there's like, do you consider jailbreaks to be a prompt injection or do you consider that to be a separate category?
Gavin Klondike (04:13):
So this is where we get into like some weird semantics 'cause I don't even really think prompt injections are a vulnerability, so to speak. And the, I'll answer the question, but the reason why I don't think they're really a vulnerability is because if you have just an API, right? There's no security concern when it comes to prompt injection, right? There's no backend system that it can impact. There's no users that it can impact. It's only once we start adding layers of functionality on top of that is where we start running into issues. Right?
Dan McInerney (04:44):
Yeah because you're the attacker and the victim in that case.
Gavin Klondike (04:46):
Yeah. So the example that I always bring up is if you use an 8-year-old to guard a bank vault, right? Eight year olds are very easy to bribe with cookies. So is the 8-year-old vulnerable because they're doing exactly what an 8-year-old would do, you give 'em a cookie, they say, sure, I'll let you do whatever. I would like that cookie. Or is the problem that you're putting the eight year old in the bank vault in the first place? So that's like a trust boundary issue.
So as far as like jailbreaks and prompt injections, it really kind of comes down to some of that semantics of I'm getting the LLM to change the conversation context, which is exactly what LLMs do based on certain prompts. There's no, there's no real surprise there. Jailbreaking is a way of doing that.
So there's indirect and direct prompt injections, direct prompt injections are the user has direct access to the large language model, and you just get it to change the conversation context. Sometimes that's jailbreaking, sometimes that's just asking the large language model to do something different. So that's kinda where it gets a little fuzzy.
Dan McInerney (05:51):
Yeah. Because it does sometimes feel like the LLM top 10 list, a lot of those could be folded under prompt injection.
Marcello Salvati (06:00):
And a lot of them aren't really, like, so I mean, there's two issues. Like one, definitely a lot of 'em could be folded into prompt injection, like I think information disclosures on there, right? Yeah. Where it's like you just have to get the system prompt like confidential data on the system prompt. Like at the end of the day, you're gonna be doing that through prompt injection, I think most of the time.
I think the other ones are also kind of like, not really real world applicable unless you're like nation state attackers to an extent. Or at least like, they're just infeas... Like you don't really see them all that often right now. Like neural back doors and stuff like that. I forget if that's on it, but I think that's on there too. So I think there's, I mean, like, like what you were saying, Gavin, like I think this is, it's hard to come up with some, any kind of taxonomy on a technology that's like new at all, like that newish to an extent.
Dan McInerney (06:53):
It does feel like those two that you contributed are fundamental too. There is the prompt injection, which includes jailbreaks and even like training data disclosure 'cause you're gonna use a prompt to inject into the LLM to get it to disclose it's training data or something. It's like almost everything can pull these two. There's data security and prompt injection.
Gavin Klondike (07:13):
So, the one thing that I would push back on is that there are distinct components, and it's a little challenging because it feels like, at least on the data science side, we took the last 20 some odd years of cybersecurity lessons and just threw 'em out the window. So a lot of these things actually are distinct. So for example, the sensitive information disclosure. Sensitive information is not getting the system prompt. I know a lot of companies think, oh, our system prompt is our intellectual property. And so if anybody's able to get the system prompt, then that's exposure. It shouldn't be considered that. The system prompt should be considered public just based on where the trust boundary falls.
Gavin Klondike (07:49):
Where the sensitive information disclosure comes from is LLMs can't keep secrets. So a really common example, or at least one that I always bring up is if you have a healthcare bot that I don't know anything about hemoglobins, but I can ask this chat bot. It has access to all of my medical information and a backend database. I asked this chat bot, Hey what does all my medical information mean? Right? Do I have cancer? Do I not have cancer? What's going on here? But if I convince it that I'm you, then I get all of your medical information.
Now you can try and say that that's a prompt injection vulnerability, but I would actually argue that it's not just the prompt injection, it's the fact that you gave essentially me access to a confused deputy. There's no reason that LLM should have more access to information than I have. Because then I can just get it to do something on my behalf. So that's kind of where like we, we really start to separate some of this stuff out.
Marcello Salvati (08:42):
Yeah, and I think a lot of this also comes back down to just like RBAC off Z and off end, right? Because I think in the backend, a lot of the biggest issues that I see right now, at least when it comes to like RAG, like RAG applications, is how to implement that off end and off Z, RBAC, and all those rules.
Dan McInerney (09:03):
And to Gavin's point, there's like, not really, as of today, there's not really a good way of getting an LLM to have RBAC that says, this user is allowed to access this information in my training data. I think there's probably some ways, and correct me if I'm wrong here, that there are ways of, like, you can have multiple RAG databases, and depending on the authorization of the user, maybe the LLM has a way of separating out who can access which RAG database. But I'm actually not sure about that. Do you know anything about if there's like RBAC or authorization for individual rag databases that an LLM is attached to?
Gavin Klondike (09:34):
Yeah. So I look at it as like normal software, right? Ignore the LLM part, but look at it as normal software. If I'm a user, then I give some sort of authentication key, and that authentication key opens the databases that I have access to. So the LLM should still use that key in order to connect to that backend data. It would be like, if it connected to an API, for example now because LLMs are a little finicky, we don't wanna give that key to the LLM itself. We wanna do out of band access. So I log in, my key is somewhere in the application, the LLM is also in the application. My key unlocks an API endpoint, or it unlocks the part of the database that I have access to. And then the LLM accesses or calls that same endpoint.
So if we try to do everything through the LLM, we break the trust boundary. So we need to do all of that out of band and then use the LLM as just kind of a helper component. Again, don't give it more access than the users are supposed to have. Otherwise, you get a confused deputy.
Dan McInerney (10:33):
That makes sense. So we'll circle back to the LLM top 10. But I was actually curious about when you said you were working for the AI village for the last like eight or nine years, or, or however long it was. What was AI security like prior to ChatGPT? Because I feel like most of the industry, it just kind of appeared overnight as soon as 3.5 came out.
Gavin Klondike (10:53):
Yeah, exactly. I think there's some definitions that are kind of up in the air and a little finicky. So we have red teaming, but even in cybersecurity we have red versus blue. And so pen testing, vulnerability scanning, that's considered red teaming. But wait, no, red teaming are the people who are phishing companies and actually getting in, and it's an entire team of people trying to break into an environment. So now we have two definitions of red teaming. Oh, but wait, red teaming is a whole separate thing as well. So when we start adding in, like AI red teaming, we now add another definition where these companies that are doing AI red teams are less concerned about prompt injection, are more concerned about, is my model racist?
Marcello Salvati (11:33):
Yeah, yeah, yeah.
Gavin Klondike (11:35):
Am I breaking the law? Because if I am an employer and I look at somebody's resume and their name is like Jamal, right? Or, or something stereotypical based on race, class, ethnicity, protected classes, basically something stereotypical of that, and I decide not to hire them because of that. That's a federal crime, right? That goes against the Equal Opportunity Act and all this stuff. But if I get a computer to do it, that's still a crime. So we're not allowed to say, Hey, I'm not racist. The AI told me to do this. So a lot of these red teams are trying to mitigate and identify biases, potential pads.
If you're a doctor, for example, you have to use what's called interpretable AI. So that means that you should always be able to follow how the AI came to a conclusion when prescribing medical information or prescription pills. Because if the AI prescribes the wrong pill, people can die. Which is malpractice. So that's what a lot of the AI or the traditional machine learning red teamers were really looking for.
Now that AI machine learning, it's all kind of becoming the same thing. Now that it's getting integrated into more and more applications that are being pushed out, people are trying to still mix AI red teaming with traditional red teaming when there's, it's still not quite apples to apples comparison. So that's where it gets a little fuzzy.
Marcello Salvati (12:56):
Yeah, I agree. Like, I think the overuse of certain like terminology definitely confuses a lot of people. Yeah, red teaming. That's always bothered me from day one. The minute I saw that we were adopting that term for like, this kind of security, security or compliance testing, you know?
Dan McInerney (13:17):
Yeah, it means a completely different thing in the world of pen testing. Which is where all three of us came from or are still at, essentially.
Marcello Salvati (13:22):
It definitely does muddy the waters, and I would like to see another term coined for it, or maybe just like, you know, any, any sort of new terminology adopted for this. But I doubt that's gonna happen, considering now it's pretty much entrenched into the sector. But yeah, no, I agree. The overuse of traditional terms for this is definitely a problem, I think, in my experience.
Dan McInerney (13:46):
Speaking of red teaming and pen testing, do you have any cool examples? Anonymized, I'm sure of recent pen tests against LLM applications that like maybe they had some vulnerabilities that people haven't really thought of before?
Gavin Klondike (14:00):
Yeah. So I didn't say this on my intro, but I actually run a YouTube channel called Netec Explained, where I take complex network security topics and explain them in an easy to understand way. One of the more popular videos that I have on there is actually how to get into AI red teaming. And it talks about the last two years, lessons learned since ChatGPT came onto the scene about hacking LLM applications.
So I followed a lot of bug bounty hunters. I would point out Joseph Thacker, Johann, Kai Greshake. So these are people who were at the forefront of everything. So I have examples of Google Bard before it became Gemini. Prompt injection, indirect prompt injection exploits for that. Some of the OWASP top 10, right? Those examples came from bug bounty researchers and some of the stuff that they were actually able to disclose.
Gavin Klondike (14:52):
So one of the coolest examples I would say would be actually, I guess it would be two things. So the first is indirect prompt injection. So indirect prompt injection would be I add something into my website that a human can't see. But when the LLM is reading the website, it sees essentially a prompt injection payload. And so from human's perspective, I can ask, Hey, here's a website. What is it saying? Can you give me recommendations based on the information that's on here? And then that prompt injection clicks, and now I've taken over as an attacker, I've taken over the conversation context. And so I can ask the user, Hey, what's your phone number? What's your email? What's your credit card number? What's your social security number? Stuff like that. And so they think that they're having a conversation with the LLM, and they are, but I'm now being added into that conversation.
Gavin Klondike (15:41):
So how do we add somebody into that conversation? Well, this is why we need to zoom out and look at the applications that the LLMs are being put into. LLMs can sometimes run JavaScript if it's being interpreted and markdown. So markdown exfiltration is a really popular exfiltration vector. So all you have to do is make a query to an image, say attacker.com/markdownxfill? And then I can set whatever parameter as part of the conversation context.
So one of the examples I have in there was Google Bard. You could actually use a macro to save the entire conversation to a Google sheet that the attacker controls. The user doesn't see anything 'cause it's an invisible image. So I, as an attacker, get full access to your conversation. I can see everything that you're entering in, you have no clue that this is happening. And then if I do the prompt injection correctly, I can ask for sensitive information that you would then type into the...
Dan McInerney (16:41):
Yeah, it's like a key logger. Yeah. It's an automatic key.
Gavin Klondike (16:44):
Essentially. Key logger.
Dan McInerney (16:46):
Was that one that you found, or was that somebody else's?
Gavin Klondike (16:50):
No, that was, that was one that somebody else found. What I found haven't been as exciting. Unfortunately, I don't get to spend as much time as some of the bug bounty hunters do with these applications. And of course some of the ones that I do for my customers, I can't talk about. But this is a common theme. So if you're interested in finding bugs in large language model applications, look for exfiltration through markdown or JavaScript, and then look for opportunities for indirect prompt injection. Those two are...
Dan McInerney (17:17):
I'll call Johann Rehberger and again, I feel like I'm butchering his name all the time, but I remember he did one with a Slack exfiltration too. Where there was a Slack bot that had an LLM that had memory or something, and you could find secret slack channels or something. I honestly, I'm probably just losing myself here because I don't remember all the details, but it was able to exfiltrate Slack data because of an AI that would summarize your Slack chats. And that does feel like maybe the most impactful AI specific vulnerability where we're not talking about RBAC, we're not talking about traditional machine traditional web application security.
Indirect, prompt objection to me probably is, and I'd like your opinion on this too, if you were to just name like one of these that is the most impactful, that is the largest checkbox for a defender to think about, what would that be from the OWASP top 10 or any of list?
Gavin Klondike (18:07):
Oh, you know, again, I like to look at things a little bit more holistically, so it's hard for me to point to any one thing. I think the biggest challenge is the trust boundaries. Like I said, it is like, we took the last 20 years of cybersecurity knowledge and we just threw it out the window. You gotta remember, right?
So the people that are being hired to develop these systems are typically PhDs. PhDs who have a lot of experience in data science, and a lot of experience in research and in PhD land the goal is to get it working no matter what. So they have trouble thinking of themselves as software developers. So they don't follow typical software development practices. They have trouble thinking of themselves as security people, so they don't follow secure software practices. Again, this is not to throw shade at them, it's just the ecosystem that we find ourselves in.
So a lot of these people who are developing the systems don't develop them with modern technologies or modern concerns. In mine, I'll give you an example. So back in January, right? DeepSeek-R1 coming outta China made headlines. It only took a few hours before a cybersecurity research firm hacked the entire database because it had no authentication.
Dan McInerney (19:16):
Yeah.
Gavin Klondike (19:18):
I can't, I can't point to any one thing. It's the entire landscape. And as it is, right? When it comes to ML Ops, how these systems are being developed, there's a bit of a cat and mouse game where the vendors don't think of themselves as security people. So they'll roll entire applications and infrastructure that are being placed in Amazon, Salesforce, you name it. Anybody who's doing AI or machine learning development and they have no authentication by design. There's no authentication.
And, you know, some of the network operators will put it out on the internet. And so now we have a known critical vulnerability that is actively being exploited and was disclosed a year ago. The vendor throws their hands up and says, Hey, I'm not responsible for this. You shouldn't put it on the internet in the first place. And then the customers are saying, Hey, you need to at least allow authentication, like secure authentication into these systems.
So this is where we're seeing a lot of bugs and CVEs being released from ML Ops organizations and companies, because they're, they're just not thinking of themselves as security people. So that's where I see the problem.
Dan McInerney (20:28):
We've seen that a thousand times now.
Marcello Salvati (20:30):
Yeah like on all different kinds of projects. I think a lot of the eco, like a lot of the open source ecosystem suffers from the same issues as well, where it's like that there's a lot of like just ship it like now mentality. And there's no concern for hey, like secure defaults, secure baseline configurations, secure at least documentation on how to, like that let's put out documentation on how to properly set up a secure environment to run this project in, like, a lot of that is completely overlooked.
Dan McInerney (21:05):
And to your point, where we're not throwing shade on the open source developers because they're literally just putting their work up for free. You can't, you can't sit there and say, oh, you're a terrible developer. Take all this stuff down. What ends up happening is exactly to your point, is these PhDs end up doing something that makes their life easier and they share it with other people, and then it becomes suddenly really popular with 15,000 stars. And so it's like, it's not your fault, but it's something that organizations have to think about. And if organizations care, then they can contribute to these open source softwares and make them more secure. Which I think is something that's very much missing from right now...
Marcello Salvati (21:40):
Yeah, I mean...
Dan McInerney (21:41):
In the industry.
Marcello Salvati (21:41):
That's always been like a problem with open source as a whole, right? Where it's like organizations don't really like contribute.
Dan McInerney (21:48):
Yeah, they'll make millions of dollars from these ML Ops tools and get research pushed out, but then they don't even contribute to it.
Marcello Salvati (21:54):
I think a lot of this, like, you know, a lot of like history doesn't repeat itself, but it sort of does rhyme like, applies to a lot of the, a lot of the new tooling and a lot of the new things in the AI sector. Like what Gavin was saying, where it's like we threw out 20 years of security, like lessons learned, and we just ignored all that. I mean, you can argue even like Hugging Face, like Hugging Face is basically rediscovering all of the security problems that GitHub had to go through when it first came out. So like, it's a lot of the same stuff over and over again.
Dan McInerney (22:26):
So what about, back to like prompt injection and AI specific attacks outside of the ML Ops ecosystem, where do you see like the best, what are the best defenses against those AI specific attacks? Like prompt injection...
Gavin Klondike (22:43):
Yeah. So, prompt inject a, a lot of these can actually be mitigated by like good architecture review and threat modeling. I've since become a threat model advocate. Please do more threat modeling. Do more threat modeling. It's not as hard as you think. Please do more threat modeling. It will save you million. If I could save any company a million dollars a year on their cybersecurity budget, it would be through threat modeling.
Dan McInerney (23:05):
Yeah. Prioritize...
Gavin Klondike (23:07):
As far as prompt injection. Yeah, exactly. As far as prompt injection, that's a little tricky. So the reason why prompt injection is still an active area of research and it's non-trivial to solve is because what we're doing is we're sending control traffic and data traffic through the same channels. And so if you remember from like SQL injection when that was really big that was the problem, right? We can't separate control traffic and data traffic. I mean, heck, even old school buffer overflows it was the same problem.
So when it comes to trying to mitigate prompt injection, all we can really do is just make it a pain in the butt to actually try and prompt inject our model so that, you know, any attacker decides to pivot and move to somewhere else that's really the best that we can do.
Gavin Klondike (23:57):
So you get things like AI firewalls. A really good system prompt goes a long way. So, or a developer prompt, I believe Open AI is calling it now. But that really goes a long way in helping mitigate some of these prompt injections. And then of course, on top of it, you're always going to have some risk involved. So account for that.
One of the things that I usually talk about is if you're gonna have a backend API that the LLM component talks to make sure that the API itself is locked down and assume anything coming from the LLM is at the same risk class as anything coming from an untrusted user. And so as long as you put those checks in place, the impact of prompt injection is gonna be very small.
Marcello Salvati (24:41):
Yeah, I agree. I think a lot of this, like, again, with the whole rhyming history doesn't rhyme thing, but like, I think a lot of this, like it's all comes down to user input again, right? So it's like traditionally like user input is treated as, you know, unsafe. A lot of this, you know, basically like LLM firewalls or essentially WAFs, right? They're like basically web application firewalls with, you know, a little bit of, maybe a little bit more complexity sometimes. But a lot of this definitely does boil down to just keep treating user input as malicious.
Dan McInerney (25:11):
And LLM input or LLM output now as much...
Marcello Salvati (25:13):
Yeah, LLM output. Yeah. Correct. Yeah, can potentially result in, you know, injections or whatever classes.
Gavin Klondike (25:21):
And that was, that was number two on that top 10 list that I had put together. 'Cause I can, I don't have to do direct cross site scripting. I can do cross site scripting through an LLM, it reads out JavaScript and now I have cross site scripting or I have...
Dan McInerney (25:36):
A lot of those LMS that were just LM with a web interface that says, Hey, you know, we are your SQL database. It turns out if you just tell it, well write this JavaScript, their web interface that just generates.
[Closing]
Additional tools and resources to check out:
Protect AI Guardian: Zero Trust for ML Model
Recon: Automated Red Teaming for GenAI
Protect AI’s ML Security-Focused Open Source Tools
LLM Guard Open Source Security Toolkit for LLM Interactions
Huntr - The World's First AI/Machine Learning Bug Bounty Platform
Thanks for checking out the MLSecOps Podcast! Get involved with the MLSecOps Community and find more resources at https://community.mlsecops.com.