Breaking and Securing Real-World LLM Apps

Written by Guest | Jul 16, 2025 8:54:50 PM

Audio-only version also available on your favorite podcast streaming service including Apple Podcasts, Spotify, and iHeart Podcasts.

Episode Summary:

Prompt injection isn’t solved and the attack surface is only expanding.

In this episode, OWASP contributors and seasoned AppSec professionals Rico Komenda and Javan Rasokat join Charlie McCarthy to share practical insights from their research and talk at OWASP Global AppSec EU. They dive into how attackers are chaining prompt injection with privilege escalation and tool abuse, why input filtering alone won’t cut it, and how system-level behaviors must be tested at runtime. From shifting security left to building real AI firewalls, this conversation is a must-listen for anyone securing LLM-integrated applications.

Transcript:

[Intro]

Charlie McCarthy (00:08):

All right. Welcome back everyone to today's episode of the MLSecOps Podcast. My name is Charlie McCarthy. I'm one of your MLSecOps Community leaders, and today I am delighted to be diving into some of the concepts from a really cool talk that was given recently at OWASP Global AppSec EU called "Builders and Breakers, A Collaborative Look at Securing LLM Integrated Apps." Today we have the distinct honor of being joined by the two individuals who presented that presentation. Javan Rasokat and Rico Komenda. Gentlemen, welcome to the show.

Rico Komenda (00:44):

Thanks for having us.

Charlie McCarthy (00:45):

Yeah, absolutely. Thank you for being here. Before we dive into the meat and bones of the episode, why don't we just start with a quick round of introductions. Maybe Rico, we can start with you, talk about, you know, your current role and what brought you to the AI security space, and then Javan will pop over to you as well.

Rico Komenda (01:05):

Yeah, gladly. So, as earlier I said, my name is Rico Komenda. Right now I'm working at adesso SE which is a full service IT provider in Germany. And mainly my focus is application cloud and AI security. In the last two and three years, I derived into AI security, more or less from application security and then I got to the MLSecOps Podcast, so I'm a long time hero of this podcast. I always hear it when I go to work or something like that. So it's cool to maybe hear my own voice soon on the podcast. Yeah.

Charlie McCarthy (01:46):

Excellent.

Rico Komenda (01:46):

And that's how I got into AI security.

Javan Rasokat (01:52):

Yeah. Hey, my name is Javan. I'm also based in Germany, the same region as Rico, and I know him, I know Rico like five years ago or sometime. We work together in the same consulting firm and I'm currently working at Sage. Sage is an ERP accounting type of software provider. And I'm part of the application security team and I support our software engineering teams in securing the software development life cycle. So I started myself as a developer, then got some interest in pen testing, found some vulnerabilities, and that's how I ended up in full-time application security.

And now of course doing something with AI. Also very recent because in Sage for example, we also created our LLM integrated application also a type of Copilot, a chat bot, and that's where I took some lessons and that's how we teamed up with Rico, how we teamed up.

Javan Rasokat (03:00):

He is really from the breaker side, and I'm from the builder-defender side. And when Rico, it was like, when did we meet? I think about one year ago at a local OWASP chapter. So like, we have small meet up communities, so come together and have a drink and listen to a presentation, and we came to each other and was like, Hey, we should do a joint talk about his expertise, bringing the breaker side, my expertise in using some defense techniques. And then that's how we kind of created this talk and yeah, that's how I got into AI security basically.

Charlie McCarthy (03:42):

Yeah, that's interesting. And something that's so needed. Rico, you mentioned you've been a member of the community and a listener of the show for a long time, so you've probably heard us talk about the lot, a lot about the concept of like an MLSecOps dream team or an AI security dream team that brings together individuals from these different areas to help inform, you know, what we're building and how we can do that in a secure way, but also advise on ways to break it. Because a lot of times those two communities don't necessarily have visibility into each other's expertise. So kudos on you. Very interesting talk.

Let's dive in maybe to the breaking side first, and we will definitely get in more on the back half of the episode about how to build secure LLMs specifically, but also concepts that can apply to AI systems as a whole. So gentlemen, in your talk, you describe a number of real world ways that large language models or LLMs - as we’ve become accustomed to calling them for short - can be exploited. Things from direct and indirect prompt injection to poisoning training data. Can you talk a little bit first about prompt injection and the distinction between direct and indirect, and maybe explain for some of our newer audience members the difference between those two and some examples of each?

Rico Komenda (05:03):

Yeah, of course. But I can also say just watch all the eps or hear all the episodes, because I think it's gotten a definition of, oh, we could break it into like prompt injection. Of course I can say it, but that's not a problem. A direct prompt injection, also sometimes called jailbreaking, happens when a user fixes the system directly with the input with a hidden instruction, more or less hidden. And this can let hackers access things they're not supposed to, like private data or maybe force the model to like force harmful answer, maybe also hallucination, which could damage the reputation of our company or things like that.

Indirect prompt injection happens when AI reads information from outside the source, like not the input, maybe it's getting the information from a website, from a blog post or something like that. And inside the message, in the blog post, for example, there could be hidden instructions when it's getting searched and then the model takes action on the information it's getting from another source. As long as AI can read them, they can still work, but it could be unseen for us as a user. Javan, do you want to add something?

Javan Rasokat (06:28):

Yeah, I want to add something because I'm coming from the application security side now because now we have prompt injection and new vulnerability type and you know, I've been dealing for years with SQL injection. Which, if someone knows SQL injection, then yeah, it's just like we, again, we started to mix commands like the SQL query with user input and that is where the potential injection happens.

And to give a more concrete example from Rico's, like imagine you have a user input which tells the, in the prompt to the model, tells the model to ignore all previous instructions and do something else instead. So this was like, this is nowadays the injection type of, which is happening.

And previously with SQL injection, it was like adding additional queries, trying to extract database information from the database. And this will be later, we will discuss this a bit when we look at defending, because both concepts are similar. It's both any action.

Charlie McCarthy (07:37):

I would be interested to get the take from both of you related to prompt injection and indirect prompt injection or, you know, jailbreaking specifically. I think the industry is still moving toward trying to standardize our definitions of some of these types of attacks. Like we've had guests, people in the community, refer to prompt injection and jailbreak interchangeably. Some people have said that they are different, whatever we're calling any of this. I think we're still moving toward, like I said, standardized definition.

But in terms of the attacks themselves, including maybe poisoning of training data, what's your comment on how abundant these attacks are out in the wild or how big of a problem they actually are? Would you consider them to be more theoretical or academic, or is this something that you're seeing out in the wild more and more often at this point?

Javan Rasokat (08:31):

I would like to start. So what I think I noticed is like one year ago when this, or two years ago when this became a big topic, the severity of a prompt injection was I think it was a medium type of vulnerability. And what I noticed, like also if I look into the bug bounty programs, or the bug bounty programs for AI red teaming from Hackerone or from, there's one from Mozilla, 0Din. And I look at how they rate it nowadays is it's just a prompt injection, it's just a low severity.

So previously there was a different severity, so it's like we can't really mitigate it. So we kind of started accepting it and what the shift that I see is that we are really looking more for vulnerabilities that are around access, escalation abuse, and not just jailbreaks, because we learned we can do a lot about defending against prompting injects, and there are some, I call them AI firewalls. Some people might not like this name, but we can't fully prevent it at this state, how it is currently designed. So it'll be difficult to be fully proof against prompt injection. So it's like that's, I think that's the shift that I noticed. Yeah, but feel free.

Rico Komenda (10:01):

I just can underline that because, but it's like the lowest hanging fruit from an outside attacker. If you have a custom chat bot directly to the customer, then that's the first point you can attack. And as we all know, that's why we also emphasize in our talk that we should test systems, not models, because a model itself can't be a hundred percent secure and safe because of the underlying scientific approach of that.

And that's why we still see that prompt injection is going to be a big issue, but not solely because of that. But you can use a model or a customer facing generative AI case to get into access controls to leverage that component to other things like pen testing or malicious actions.

Javan Rasokat (10:59):

Yeah. Still the most important entry point. But like the bar, the bar just moved, like, can you, first of all, can you trick the model, like do a jailbreak or prompt injection? But now it's like, can you also abuse it to do something that matters for that context of the duplication? I think that's the shift, that bar that we noticed.

Charlie McCarthy (11:21):

Right, right. There was, I don't know if either of you saw this, it just came to mind. There was a really interesting article that just came out over the last week related to hidden AI prompts in academic papers that are sparking some... You did, yeah. You're nodding your heads. Just really interesting.

I think it's going to be a strange year as we see more and more, you know, what unfolds because I mean - [come on] it's an academic paper.

When you think about prompt injection or as we've been talking about it in the initial stages, it's like, okay, what are, what are attackers going to do? What's this really malicious thing that someone's going to try to do or get away with?

And then you have this wholesome stage of the university setting where folks who are just trying to maybe get ahead in school, and I don't know if the folks that did this you know, if it really is just to move forward their reputation in the academic world, or if it started as an experiment. I would like to do a little bit more digging to see what the agenda was for the folks that tried this particular type of [indirect] prompt injection, but it's just a wild world we're living in. So interesting.

Rico Komenda (12:31):

Definitely. Also, like, one of the scientific funny things which happened in the last weeks was also like the Claude obvious kind of paper where it went in, I can't grasp it really again, but it was like the Claude was the co-writer, of the paper where it, what was it about again, maybe Javan, you know it, I sent it to you because we both had a laugh about that. Then the author got destroyed on LinkedIn and also from other publishers about this paper. It's quite a...

Javan Rasokat (13:11):

But I think he claimed it was a joke. I think he...

Rico Komenda (13:15):

He claimed it was a joke but...

Charlie McCarthy (13:15):

I mean, what are you going to do at that point when somebody calls you out, you know, in a public forum like LinkedIn or something, you have to, you can either confess to it or be like, oh, it was just an experiment. It was just a joke, you know, like -

Rico Komenda (13:28):

But the author wrote a big blog post about it where he said that, like, you just wanted the tool for fun, and then it escalated. And then that's why I wanted to say it, because as you said, what happens if the scientific papers are getting press by LLMs for individuals, and that's what happened to him. It got into LinkedIn, it got into social media, and people just drew them, I think it had like 50 pages. And people just wrote that into ChatGPT or Gemini, made it summarize, made some valid points against it, but somehow something's wrong and then it got escalated and went viral on LinkedIn and other social media pages.

Charlie McCarthy (14:32):

That's so interesting. I hadn't seen that one yet. I will find it though, and include it in the links to the show notes for this episode, because that's one that I'm sure the audience would be interested in looking at as well. Okay. Let's take a peek here.

Back to breaking LLM integrated apps. Kind of just wrapping, wrapping up here. Related to your presentation, there was a question that we wanted to ask about, you know, imagine attackers overturning decisions in an LLM powered, maybe refund system, by tampering with evidence. I think that was one of the examples given. Can you talk about what that says about trust boundaries in these kinds of applications?

Rico Komenda (15:12):

Yes. This was like the demo we showed at this talk. Which for those of the listeners, we will also be at OWASP LASCON later this year. So if you're there, catch Javan and me and we will have a build together in Austin, Texas. I'm kind of hyped for that. But back to the question, yeah.

So like it showcased, if we go back to threat modeling these kinds of applications, then we see that zero trust should still apply. If we give the model itself the autonomy to make a decision, which could be business critical, in this case we found a fun product, for example, if we just check if the model or if the prompt through the model will be fine, then we can be pretty sure that it will be overturned. If we put other security controls in place, it would likely not be happening.

As earlier said, as it's a non-deterministic system, we never can a hundred percent secure it. Also like data poisoning, we can't validate if the model has malicious data points in it, so therefore we need rigorous testing and human in loop approaches to like mitigate that kind of thing. But we should be careful with the autonomy of models which are based on a narrative case.

Javan Rasokat (16:42):

Yeah, I agree with the human in the loop. And what I like about this question, because you ask about trust boundaries, and when I hear this word trust boundaries, I think about threat modeling, which is, in my world, in application security, an important thing. And we draw those threat models as trust boundaries across our data flows. And like Rico said if you give a model or an agent access to certain actions, we should still verify agents have the permission to access that certain data set. So we need our access control as usual as we would design an application. Yeah, that's my take on that.

Charlie McCarthy (17:22):

Right. Excellent. Another highlight that was called out within the presentation that I thought was a fantastic point and that maybe a lot of folks don't grasp the concept of yet, but starting to is that the attack surface for AI isn't just the model. It might not just be the LLM, it is the entire system. So data pipelines, embedding stores, APIs, potentially even browser output. Can you talk a little bit with this audience about an example of where vulnerabilities lie outside of the actual model itself?

Rico Komenda (18:00):

The question is we are not so yes, that's true. I said all the time in regard to security testing AI systems and applications, as I said, I think twice now and always my first time, test systems not models. If we look at how these generative AI models are implemented right now, then we can see that it's not solely in our own application. We have surrounding components which are working with the model itself.

So the business case should be relevant and we should test all surrounding or all components inside of this application. The model is interactive with our software parts. We could say we use an API, we use a backend, which is based on maybe Python, maybe Java, maybe on .net. But also like if you look from the other perspective, what can I see? There's still a webpage, a website, this could have some flaws in it.

Rico Komenda (18:59):

We could look into maybe we have an integration to an inference. What can we do there? We could also look into the internal perspective in regards like software development lifecycle and architectures of software. We have a vector database. What is happening there? We have data pipelines. If we like fine tune or have a reinforcement learning type of system, we have third party dependencies. So as I said, where could not be hidden, some kind of vulnerabilities. So we need to test the whole system. It's most effective if we surround it with a business case and look at the whole software application architecture.

Charlie McCarthy (19:43):

Yeah. And this becomes even more critical as we move into the age of agentic AI, right? And when you think about agents and all of the different systems that, I mean, parts of your system that they'll be connected to, so much more exposure and the blast radius expands almost exponentially. So that'll be interesting to look into as well.

Javan Rasokat (20:03):

It's getting very complex now. Like previously we had OWASP Top 10, now we have LLM Top 10, and I was like, should we have another LLM Top 10 for agents and now MCP? Another top 10 for MCPs? Who knows?

Charlie McCarthy (20:19):

Yeah. There's a lot of, well, they're not, I don't want to call them buzzwords because they have weight and meaning, but yeah. Lots of different terminology to consider and like all of these different components that keep cropping up. So I'll toss another one out there, RAG. Retrieval augmented generation [RAG], something that a lot of folks have probably been seeing and hearing about over the last year specifically.

Can you touch on, and maybe for some early learners within this community or less technical members of the community - like say you're trying to explain it to a CEO or something - what RAG is and how it might play a role in some of these types of attacks against LLMs in their systems?

Rico Komenda (21:08):

Do you want to do that or should I? For me, it doesn't matter.

Javan Rasokat (21:14):

No, it's fine.

Rico Komenda (21:17):

Okay. Like RAG stands for retrieval augmented generations. You could say it's like taking some data from somewhere other than the model, or information, and interpolate it within your prompt. For example, if you are using a vector database, okay, that's a little bit technical. If you're taking a database or like a pouch and you want to get some information out of this pouch and then it's getting indexed on this pouch.

Like example, okay, I want to get my pen out of this pouch, then you can achieve that information and interpolate it with some good sounding text around it. RAG is, to get back to the question, it's also like a part of the attack surface. If we are, if we're being honest, it wasn't part of the first OWASP Top 10 LLM, it got integrated in the second one.

Rico Komenda (22:15):

And the newest one for 2025, it's part of I think the eighth one, LLM08, called Vector and Embedding Weaknesses. Because as said, you want to embed some kind of information, which is then the augmented generation, and now we could like swerve back the last five, six minutes.

Also we need to check that trust boundaries. It's another component beside the model. And then we can get that maybe a more customer or like more technical phased example would be if you want to like, get information from your company SharePoint or some knowledge base and want to get that into your custom GPT for example, this is also called, could be a use case of which will augmented generation.

Charlie McCarthy (23:15):

Got it. Thank you. Okay, last question around breaking LLM integrated apps, and then we'll move over to more of the like building and how does securely build them. Something else you both do a really good job of emphasizing in your content is that traditional AppSec tools struggle with non-deterministic systems like LLMs. Can you talk a bit about why this is such a big problem for security teams and what does it break in the way that we currently test systems?

Javan Rasokat (23:50):

So the secure software development lifecycle is a lot about shifting left, doing stuff early in the IDE, maybe early in the, in some CICD pipelines where code is written that you run a static code analysis on the code and then it'll pop up some findings. And we have those different tab steps that we do during our traditional SDLC.

But what we noticed, because of LLMs, how can we test LLMs? How can we test their behavior, their output? And what we see is that a lot of this testing actually needs to happen in the runtime environment. So we need a deployed application, we need a model, we need and it's so usually we have this shift left and we have static code analysis and so on. We have this existing tooling, which is all about code vulnerabilities, but now how can we integrate testing for LLM typical vulnerabilities in our traditional tool stack.

Javan Rasokat (24:50):

And that's, we have one category for that in application security called Dynamic Application Security Testing called short DAST. And that's it. Now I think everything which covers that needs to go into DAST because of course we have the traditional activities like the threat modeling, which happens early in the design phase. We have the activities that we write down, misuse and abuse cases, like the worst things that we can imagine, what could happen with our application, what could happen wrong, and when we think about threat modeling, those threats.

But this stuff can still be applied with thinking about LLM integrated apps but a lot of testing around the actual vulnerabilities this needs to happen on the runtime. So there's this huge, huge focus on fussing different payloads, which needs to happen on the runtime aspect.

Rico Komenda (25:49):

Just to underline that, I really liked a sentence from Javan at Barcelona, and he said like, for a long time it was thought that DAST was dead, but exactly with that. And now with generative AI components, it's getting alive again because now we can use this kind of runtime application security testing and use the kind of tools we already have and use it for now really, really good use case.

Javan Rasokat (26:21):

That's true. There's a lot of articles about the DAST. So this type of scanning is a dead control. No one really uses it, but with this new thing it got a new importance. And now there are also like free open source projects available where you can do some end-to-end testing. There's, for example, Gavrog, I think Cheese Guard is another one which is all about testing or scanning the LLM for those scanning in terms of sending different payloads to see the behavior.

And then there's another problem, which we have in testing those applications is, as we said, it's non-deterministic. So we never have the same response, the same reply twice. So, it's really difficult. So we need to repeat our test because the prompt injection might fail once, but maybe another test, it'll work. So it's difficult of course to test.

Rico Komenda (27:22):

One thing I would love to see is a kind of cascading testing suit with DAST, because like if we're getting into complex architectures with agent based applications it would be kind of interesting to see the cascading effect of like the fussing and the rigorous testing for this fussing kind of kind of fussing tests. So this will be the next topic. And I think with that I have my next thing I will deep dive into.

Charlie McCarthy (27:57):

I was going to say it sounds like you have a project now.

Rico Komenda (28:00):

Yeah, I have a project now. So if any of the listeners has any interest in that just bring me on LinkedIn or MLSecOps Slack.

Charlie McCarthy (28:06):

Reach out to Rico!

Rico Komenda (28:06):

Yeah, I would love that. Sounds really, really cool.

Charlie McCarthy (28:13):

And excellent transition into our next topic. Well done, Rico. Yeah, if we talk a little bit more about building secure LLM apps and specifically diving into like developer responsibilities and AppSec mindset. I know Javan, you talked about it brings you back to thinking about threat modeling. Some of the things that we do when we shift left on the front side as these systems are being built. You both have emphasized shared responsibility when it comes to building secure LLM systems. So that could include developers, platform engineers and even thinking about the end users. AppSec basics obviously still really apply in this situation, but can you talk more about what it looks like in context? So maybe start with talking about documentation how you see the roles of developers evolving when it comes to securing these systems?

Javan Rasokat (29:08):

Yeah, happy to take this one first. So, there's a big document from Meta actually about a shared responsibility. So they call this document Meta's Developer Responsibilities. It's like a guideline, a use guideline for building apps and element integrated apps, especially if you use one of their free models. And so they called, actually what we noticed, what we already know from cloud providers, this terminology of a shared responsibility.

So they, like, now have a document which says, okay, if you're using our models, you should do AI red teaming. You should have some kind of firewall trying to block lists, typical stuff like prompting checks. So they really declare some activities that you should do. And this is interesting because now, let's take a look at OpenAI, for example.

Javan Rasokat (30:08):

Because OpenAI has what they call this in their documentation, OpenAI Safety Best Practices. And that's also similar. They also mentioned you should do AI red teaming on your application. You should do moderation and filtration. And OpenAI provides also this moderation API, where you can add an additional check, like if there's some unsafe prompts in my application, like is this a safety mechanism?

And similar to what Meta released a couple of weeks, months ago with the Meta firewall LlamaFirewall. It was called LlamaFirewall, but so really similar but just an API based from OpenAI. So both of them provide you with practice. They tell you to use them. And so if you build an application, of course, you should apply them. And one interesting take, which I saw in the OpenAI best practices, this is kind of a funny one, is they say you should use validated inputs, you should use secure dropdowns and trusted sources.

Javan Rasokat (31:22):

So if you think about the application, if you have a chat bot and everyone can't type in the question and chat with the chat bot, this doesn't make really sense, but it's in the end, it's the ultimate way to get rid of input prompt objection, because you don't have user input. Even, not really, because you still have data from other sources maybe that you pull into. So yeah, then you still have indirect prompt injection. So it's yeah, what are trusted sources where there could be maybe a hidden instruction somewhere, like we had this example from the academic paper where they place hidden instructions. Yes, I think there are lots of shared responsibilities now. It's like, this is interesting, similar to what we knew with cloud providers.

Rico Komenda (32:12):

But that's also the thing, why I really liked the MLSecOps approach, Diana [Kelley], like with the MLSecOps dream team I quite like, because you're not just looking at the developers. You're not just looking at like the people who are providing the system components, but you're looking at the ethical implications. You're looking at the end user. You're looking into how all of those people inside of a company and the people who are working with the system or for the system have a take on that. It's not stopping with just the developing case in that part of the software we are working right now with.

Charlie McCarthy (33:00):

Yeah, absolutely. And you're bridging some of those knowledge gaps, like we talked about before, between the different teams and providing the information or context to other teams that they'll need to help determine whether or not a system is going to be the most secure possible. Javan, I wanted to pop back to a phrase that you used, AI firewalls.

There's been a little bit of debate over the last year within the community about what even is that? Like, what is the definition of an AI firewall? And then also can you talk about what, by your definition, what an AI firewall does under the hood? I know you mentioned before things like prompt injection, you know, it's not a solvable problem. So our firewall is just patching around that. Dig into that a little bit for us.

Javan Rasokat (33:47):

No, you are right with terminology of course and the digging around it. So AI firewalls, for me, it's like having a web application firewall. And when we look at web application firewalls, you can see how they behave. They have some rules, some regular expressions that are to check for typical strings like cross site scripting attacks. So, I see those AI firewalls, how I call them, or we can maybe mention a product or another product or a project which is out there.

So there's for example, the LlamaFirewall, prompt guard from Meta, but there's also a LLM Guard from Protect AI, which is scanning both output and input. And in the end, they work like a web application firewall type of stuff. So you place it like a gateway in front of your LLM and then they try to scan the input, which comes into that firewall and try to take some misuse cases.

Javan Rasokat (34:50):

Like someone's asking the model how to build a bomb. So it's not only scanning for jailbreak type of attacks, but also safety issues, concerns. There are a lot of other aspects about safety and how a model should respond to certain chats. And what we know from web applications firewalls from us, in our space, which we had for many, many years. So this is a very old concept, is that they are not bulletproof. They are easy to bypass. You just need to check on Twitter for bypass for, yeah, so it's very easy to bypass. And I think that's similar to what's happening with those LLM based firewalls. If you also check on Twitter, you can find some jailbreaks sometimes. And the community shares each with each other, those types of jailbreaks.

Javan Rasokat (35:48):

And of course, they're not as simple as a web application firewall that just checks for for regular expression, but they have some more, they have folks like capabilities, for example, the LLM firewall might be trained on like, might be its own machine learning model trained on malicious inputs to detect those inputs. So it could be a fine tuned model, just better costs maybe. And yeah, it's a defense, it's an additional line of defense. And what I like about it is that we have this, so I'm not saying it's worthless. I think it's still important to measure. I'm also deploying the buff even if I know it's bypass possible, it just makes it more difficult. It's a, and when I think about one year ago where we did not have those projects, like LLM Guard from Protect AI or Meta, the LlamaFirewall, it was like we had to build this ourselves.

Javan Rasokat (36:49):

So it was really challenging keeping up with this fast evolving development. But now I can just say, oh, I did enough. Basically I'm using this firewall, same as I say, for example, about more securities, for example, a famous valve provider. So I'm saying like, I have more security with this paranoia level deployed, and then I'm like, I'm good.

So it is like now we have benchmarks basically for, and it's great to have this competition about defending against those types of attacks or safety issues because now we have multiple projects which are benchmarking with each other against. So I like this development for me as a developer can, it's easier, it's maybe even cheaper than trying to build something my own. And finally we have some benchmarks because I was always asking ourselves, are we good enough?

Javan Rasokat (37:48):

Like did we do enough to, when we try to do block listing and so on, because it is difficult to keep up. So it's interesting paper being published was and this, remember me about the, the stuff when we discussed about SQL injection was, you know, there used to be like the defense number one, when we talk about SQL injection is used to, you have to use prepared statements because this gives you, this makes sure that you never mix business logic with user input. This cost is separated, so you should use prepared statements when you build your SQL query. And now there's this research, for example, one from stanfordnlp. They have a project called DSPY. It's just also in GitHub and is a very interesting approach because they started separating logic and input.

Javan Rasokat (38:46):

So with your prompt, basically. So this is a very interesting approach. It's not production ready. But it's nice to see that finally, someone started thinking about how you can separate both because then you get rid of maybe some of the prompt injection cases. Like we solved in, you know, we solved SQL injection because we now have prepared statements and maybe we can do the similar by redefining how we interact with our LLM by yeah, maybe splitting both a bit better. And that's also another very interesting article from the Google DeepMind team. So, this was called "Defeating Prompt Injections by Design," which is also discussing, yeah, basically how you can design your application or your input model, your input to separate the logic and not mix this with instructions. It's very interesting.

Rico Komenda (39:50):

If we're talking about standardized terminology, I would say we call it prepared prompts.

Javan Rasokat (39:58):

Yes.

Charlie McCarthy (39:59):

Prepared prompts.

Charlie McCarthy (40:02):

That's a little bit hard to say. Prepared, prepared prompts. Yeah, I can't even say it right now, but cool. Yeah, and thank you for...

Javan Rasokat (40:10):

You're making history maybe now.

Charlie McCarthy (40:13):

Yeah, we said it here first. Remember this! That's super cool. I will, to our listeners, include links to those resources and articles in the show notes as well. Thank you so much for sharing that. Okay. So your presentation gentlemen also ends with a revamped secure SDLC that I think would be cool for us to dive into for a moment. And for those who are not familiar, although I'm sure if you're listening to the show, you probably are, SDLC stands for Software Development Life Cycle and in the presentation it's tailored for AI. So what would you say is the biggest security shift or shifts that teams need to make in order to adapt to a secure SDLC tailored for AI?

Javan Rasokat (40:57):

I think the most important shift is that we have a lot of testing on the runtime in a dynamic environment. One more thing would be also the big focus on what we previously called the penetration testing step in this phase, because now this is AI red teaming but again, with a focus on the whole system.

Another important aspect is the runtime monitoring. I think this didn't happen that much or the quality is not that good. And I mean in traditional applications, but now runtime monitoring is very important. Yeah, you need maybe to even check the output of your systems and have some, some red flags that alert you kind of, and also like other threshold that you might reach, maybe also costs because it's very expensive sometimes, so rate limiting. And yeah, I think that there's a runtime aspect and monitoring is a very big shift. Rico, do you think anything else?

Rico Komenda (42:15):

No, I think you said it pretty well. I think AppSec as earlier said by Javan said in the last year's shift left, I think we are moving now to shift up or shift wide because we are looking into the whole system again, and not just like the developer accuracy or the developer responsibilities, which would be the first case in traditional application development. But right now, as we have reinforcement learning types of systems and like dynamic fields for these applications, we need to shift more about the whole system and interact with it in real time. That's what Javan, you said it really, really good. Run time monitoring, run time scans, run time tests. I think just some summarized it pretty well.

Charlie McCarthy (43:10):

Yeah. You also alluded to the idea, Javan, that the industry has moved toward the adoption of the phrase AI red teaming in substitution of what we may have called traditional penetration testing in the past. And I am curious, that's another hotly debated term that has been over the last year within the community. Do you agree with many of the definitions you've been hearing for what AI red teaming is, or would you like to add any fresh perspective? I mean, talk a little bit about that. Like what is AI red teaming to you?

Javan Rasokat (43:42):

Yeah, it's interesting because it's already a standardized term because basically OpenAI is using this term in their safety practices. Meta is using this in their developer's guideline. So it's like yeah. Now I would finish this debate and say we already agreed there is a new activity called AI red teaming.

Rico Komenda (44:04):

I think it was first introduced by some bill in the US right? I think there was like one bill in the US which said, okay, it's called AI red teaming. Then every company jumped on that and called it AI teaming.

Javan Rasokat (44:21):

But, when you asked me about the difference between the traditional pen testing I'm struggling because for me it's, when I look at the results that we see from bug bounty reports, for example they are really similar to any other findings that we usually get from any other type of testing. Like in the end, you know for example, thinking about one example, which I saw in the news, it was GitLab Duo, also like a chat bot for your code repository by GitLab.

So they had a vulnerability, of course, prompt injection was the entry point but in the end, the exploit happened with a HTML injection type of injection. And you can see this mix of our traditional vulnerabilities that we have in our checklist for penetration tests. And so there's, it's, for me it's maybe sometimes interchangeable.

Javan Rasokat (45:22):

Also it's just maybe if I have an application where I know it's specific about how it does integrate with LLM, then I'm just asking for experts, which has some experience with testing AI systems, which know about jailbreaks and so on. So it's maybe getting the right testers with the knowledge of how AI systems behave, but not about just testing methodology. So AI red teaming is, maybe the para does include also all the pen testing checklists type of work. You know, that the checking for cross scripting type of attacks is still relevant for, because we test the whole system and not just the LLM model. It's, yeah.

Charlie McCarthy (46:07):

Fair.

Javan Rasokat (46:09):

Fair.

Charlie McCarthy (46:10):

Yeah. Okay. In the interest of time, I hate to do this because I could sit here with y'all and wax poetic for a while longer, but I'm going to wrap up with maybe two questions for each of you. And it's the same question. First one is, what in the future are you most concerned about related to AI security? What's top of mind for you? Is it a particular threat? Is it a pattern you're noticing? So what keeps you up at night in the land of AI security? Or what do you think about the future?

And then the follow up piece I just wanted to call out that you're both contributors within the OWASP community, thank you for your work. And then maybe you can talk about advice for teams that are trying to operationalize some of these OWASP frameworks that are coming out or get involved with the various groups, Javan, that you mentioned, like LLM top 10 now I think they've augmented that to just GenAI top 10 and then the agent work. So future and OWASP, and I think we'll call it a day.

Rico Komenda (47:15):

In terms of how the future would look like. I think we are at a good point at grasping the terminology and how the baseline of attacks are working. So I think, and this is the pattern I see in the last few weeks and the few months last few months, is that we are getting more on an autonomy based application architecture, and I think we will see more and more cascading attack styles where we are just not talking about prompt injection, but also about all the other data points it could grasp.

One thing I'm really interested in is the memory poisoning kind of attacks where you have now memory based attacks or you can manipulate the memory and this will be a fun thing to see how this will work with short-term memory and long-term memories in these kinds of agent based applications.

Rico Komenda (48:17):

So this is what I will think is the next big issue besides prompt injection. Like in the past it was like, just how are we being persistent with applications. Now how are we being persistent with AI applications? So this I think will be the next things which will be drastically explored and exfiltrated. Regarding the OWASP question, there are many things happening in the OWASP community. I was part of the GenAI Red Teaming Guide 1.0, which was published in early or mid of January. We are now working on a testing, or GenAI red teaming testing implementers handbook, which will be in Q3 or something like that. But they are also happening in the GenAI community. But also besides that, there's like the AI security verification standard [and] the application security verification standard.

Rico Komenda (49:36):

Also there is happening an AI maturity assessment, which is rooted in the principle of OWASP SAMM, which is based or is called software assurance maturity model. I think this will be really, really helpful for companies to see where we stand with the AI maturity in our own company. Therefore, you can do a questionnaire and then you can see the stand where you're at. Like maybe I forgot the English word. Is it maturity? Yes, I think so.

Also pretty published or in the works of being published is the AI Vulnerability Scoring System [AIVSS], which is also led by one of the people who were in the MLSecOps Podcast, Ken Huang, and which will also have great impact into the AI security landscape. So I'm looking forward to what's still happening there, but it'll be pretty heavily in Q3 and Q4, which is going to be published and can be used for personal or company based users.

Javan Rasokat (50:52):

I think you covered all of the contributions and developments in anything that is happening. I will just do my concerns. My concerns right now is what we spoke about previously is also, like previously in the early days, everything was kind of a security issue, you know, prompt injection, extraction, some jailbreaks. But now we've always think about is there actually exploitable? Is there harm? Is there, you know, what's the impact on us? So we kind of accepted some trade-offs because we haven't really solved the issue, the prompt leaks, the lead system problems. We haven't solved this type of issue. We just reclassified them, maybe change the severity type of them.

And so this is my concern that we are spending our focus, I wish we would focus a bit around that research, which is happening right now. And it's the difficult, more difficult part and obviously this other stuff was, like the LLM firewall type of application, was easier to make, to build before solving the root cause. But I hope that in the future we finally find a way to solve those issues by design. And then yeah, that we do not have to reclassify stuff instead.

Charlie McCarthy (52:24):

Makes sense. Okay. Well I can't thank both of you enough for being here today. Really, it was an honor and a pleasure to dive into some of your research tip of the iceberg. I'm sure hopefully we get to have you back in the future to talk more. As I mentioned, we will have show notes for the show. Is there, what's the best contact means of contact for folks that would like to get in touch with you to learn more? Is it LinkedIn?

Javan Rasokat (52:50):

Yeah, for me, LinkedIn.

Rico Komenda (52:51):

Yeah. Both of us are LinkedIn, but also as we are in the MLSecOps Community, you can also message me on the MLSecOps Slack channel.

Charlie McCarthy (53:02):

Fabulous. Yes. Alright. To all of our audience members, thank you so much for being here. You make this show and you make it a thriving community. Thank you again to our guests and our show sponsor, Protect AI, and we will see you all next time.

[Closing]

Additional tools and resources to check out:

Protect AI Guardian: Zero Trust for ML Model

Recon: Automated Red Teaming for GenAI

Protect AI’s ML Security-Focused Open Source Tools

LLM Guard Open Source Security Toolkit for LLM Interactions

Huntr - The World's First AI/Machine Learning Bug Bounty Platform

Thanks for checking out the MLSecOps Podcast! Get involved with the MLSecOps Community and find more resources at https://community.mlsecops.com.

View full post