How Red Teamers Are Exposing Flaws in AI Pipelines

Jul 9, 2025 By Guest

Audio-only version also available on your favorite podcast streaming service including Apple Podcasts, Spotify, and iHeart Podcasts.

Episode Summary:

Prolific bug bounty hunter and Offensive Security Lead at Toreon, Robbe Van Roey (PinkDraconian), joins the MLSecOps Podcast to break down how he discovered RCEs in BentoML and LangChain, the risks of unsafe model serialization, and his approach to red teaming AI systems.

Transcript:

[Intro]

Madi Vorbrich (00:08):

Welcome to the MLSecOps Podcast. I'm your host Madi Vorbrich and one of the MLSecOps Community Managers. And today I have with me Robbe Van Roey [Robbe]. He's also better known as PinkDraconian online. He works for Toreon as the Offensive Security Lead. Robbe is also a veteran CTF champion, a prolific bug bounty hunter. He has his own YouTube channel, so he's a content creator as well. The list goes on and on, truly. Robbe, thank you so much for joining us on the show.

Robbe Van Roey (00:40):

It's my pleasure. And thanks for the great introduction.

Madi Vorbrich (00:43):

So to kick things off, I wanna go ahead and just dive in and give our listeners insight as to what you do. How did you start this journey? I know that you kind of dabbled in CTFs at the beginning, did bug bounty hunting, then pen testing. So can you guide us through that journey?

Robbe Van Roey (01:02):

Yeah, I started about eight years ago when I was 17. But I still remember when I was 15, 16, I used to watch videos on YouTube of people hacking. I used to think, whoa, this is magic. You know? And then I started to dabbling in coding a little bit, building my own websites. And I, pretty quickly, learned that building stuff is really hard. Building stuff that doesn't break is even harder.

And I was getting so good at breaking stuff that I figured, well, maybe I should continue, quit building stuff and actually start breaking stuff for real. And so that's how I kind of dabbled into security. It started out purely with CTFs. Like for the first two years when I started, I played CTFs every single weekend, every day of the week.

Robbe Van Roey (01:50):

I remember getting home from school, like running home from school, just so I could do some more hacking on CTFs. And that was all super cool. I learned a lot of things, but then I had to, of course, go into the real world, and I had to find vulnerabilities in real applications. And I was incredibly scared of this because, well, CTFs, they're built to be vulnerable. Real applications are built to be secure.

But then I had a company who gave me a shot and then gave me one month to pen test their network. And this was my first real engagement. And so the first day I walked into this office of the company, I opened my laptop, and two hours later, I'm the domain administrator. And, you know, I've found the entire company and I go to the CSOs office with a full list of all the password hashes of every employee. And that adrenaline rush, that was just so, so, like, I needed more of that adrenaline. And yeah, from there, I kept going. And I pretty quickly learned that hacking real applications is actually much easier than playing CTFs. So that was a great thing to learn.

Madi Vorbrich (02:59):

Yeah. And now too, you also have your own YouTube channel where you can teach others about discoveries that you've found, which I'm definitely a fan of too.

Robbe Van Roey (03:08):

Yeah. Thank you very much. Yeah I, like in the hacking space, being a great hacker is of course not an easy thing to do. But also being able to explain what you've done to maybe non-technical people or others is very important. And so I figured, why don't I make a YouTube channel that I let the community benefit from? Because the community can learn. And I also benefit from it myself 'cause I, well, I teach myself how to talk about vulnerabilities that are very technical.

Because the first time I ever had to go up and explain what I did, like in that same company to that CSO, I had to explain how I became the main administrator. I was struggling 'cause technically I knew what I was talking about. But how do you explain something like that to somebody who doesn't know the technical details?

Madi Vorbrich (03:55):

Right, exactly. And like you said, the more that you talk about it openly too, like even in conversation, the better it sticks and the better that you learn. So, Robbe, I wanna go ahead and dive into some of your most recent vulnerabilities, some of your recent biggest discoveries that you've had. And then with that, that'll kind of guide the rest of the episode. So I wanna just dive into those, kind of see why these keep popping up since they seem like relatively common, which is kind of scary. The techniques that you also use to uncover them as well.

And then maybe flip it and kind of talk about any steps that teams can take to also secure their pipeline as it relates to stuff that you've uncovered. I mean, like you said, you just walked into this company and you became like the domain you know, administrator, and then that was it, which is crazy.

Robbe Van Roey (04:46):

Yeah, exactly. One of, like about a year ago, I wanted to do some research into a new field. And, you know AI was really popping off. It was everywhere. And so I wanted to look into AI systems, frameworks, orchestration frameworks, and so on to see how duty systems work. And over the course of last year, I found 30 CVEs in various different AI libraries. And that really opened my eyes. And I quickly learned that all of these new, like these new technologies that come out, they're built with incredible speeds because we, you know, we want to please investors have something really quick that's working. Oftentimes in the beginning, they're also built by very smart AI researchers, but not software developers. And AI researchers are very great at, you know, designing these incredibly complex systems that I could not even begin to comprehend.

Robbe Van Roey (05:43):

But they don't have a history of building secure applications. And I really quickly learned that very simple security mistakes can often just come back. And so, a lot of my focus was to look at these systems and to see the mistakes that we have learned from in the past, the issues that we already knew for a long time. How do these occur with these new systems and these new ways that this system will talk with the underlying operating system and the file system and so on.

Because the thing with these AI frameworks, orchestration frameworks, inference servers, they all require a lot of interaction with the operating system, with the file system, in interesting ways. And one very interesting way is in the way that models are actually stored, transferred across the network to somebody, and then eventually loaded into a system.

Robbe Van Roey (06:42):

Almost everything in AI is built in Python. And in Python, if you wanna send an object or something to somebody else for them to use, then you will pickle it. And pickling is serialization. So what does that mean? Of course, this model is trained on some super great computer with lots of GPUs. That is great at training the AI, but then that model all the data, all the weights, that whole object needs to be sent to the clients who, or the server who's gonna actually use that model and interact with it. And we use serialization to kind of send that model over a network.

Now, pickle is actually not very good, not a good fit for this purpose. I mean, it's created as a tool for serialization and deserialization, but it's also incredibly vulnerable by design. If you are able to send somebody a model that's poisoned, then you can pretty much just execute arbitrary codes on that system.

Robbe Van Roey (07:43):

So if I send somebody a model and they import it in their system, then if I had to reduce methods, would that automatically get called and my codes will be executed when they try to import my AI model. And that's kind of the way that for the last six years, it's how we interacted with models. It was a risk that that just was there, but of course not everybody is aware of that. And I started looking in a library called BentoML.

BentoML is a very interesting library. And I started looking at it. At first I was actually looking for a CSRF vulnerability in it 'cause I was, I really liked those vulnerabilities. So I started looking at the different ways that you can actually send data to this Bento, BentoML library. Like you have JSON that you can send data across, you have maybe HTML forms or different ways of sending data.

Robbe Van Roey (08:37):

And then in the source code, I found that they accepted a content type that was titled something like BentoML/pickle. And I thought that's very interesting to see here, because if that is what I think it is, then that means that I can just send pickle data to the server and the server will unpickle it and use it. Me knowing that pickling is very bad thing or unpickling arbitrary data is a very bad thing. Of course, I know that I can just send any object I want to that server and that server is then gonna unpickle it and execute my code. So the proof of concept for this vulnerability is literally just if you have network access to this BentoML instance, then you can send a request and that request will automatically cause remote code execution on that machine through this pickle.

Robbe Van Roey (09:31):

And it is so interesting because I imagine that the person who built this thought it was, you know, a great way of interacting with the system because, you know, pickle is used everywhere in AI. So it's used for transferring super large models by big companies. So why shouldn't we use it, you know, to accept data into our server? But in the end, well, it turns out to be not the greatest decision and led to a pretty big bug that luckily was discovered before, any knowledge of bad guys abusing it.

Madi Vorbrich (10:03):

Right. And with instances like these two, is this pretty common? I mean, case studies or bugs like this, is it becoming increasingly more common to find, just like with your experience over the years?

Robbe Van Roey (10:18):

Something as simple as this where the server literally accepts the data? I've never found, so luckily that is not too common.

Madi Vorbrich (10:26):

Good!

Robbe Van Roey (10:26):

But if you look at, yeah, it's very, very good because it's really dangerous. But look at AI frameworks as a whole, a lot of the bugs definitely in the beginning were almost always related to pickling data. Like any AI framework that would allow somebody to upload a model and then interact with it would be vulnerable by default, just because the community hadn't really made a global fix for the way that models are transferred. And in the beginning, that was kind of all the book reports you would see on these big servers, on these big libraries, these big frameworks. They would all just be the use of the pickle format for some arbitrary data.

Robbe Van Roey (11:12):

And then that was very dangerous. And luckily nowadays we can kind of talk in the past tense about this because slowly we are getting to a point where the solutions exist and are massively being implemented. So that is great to see, like one of the efforts is Safetensors, for example. Which is also just a serialization format or an incredibly fast one at that. But it doesn't have this built in remote code execution vulnerability. So you can actually if you import some malicious model, it's not gonna immediately cause remote code execution on your system. Of course, importing malicious models can still have some other issues, but it's at least not as big of an issue as literally having remote code execution on your system without you being able to do anything about it.

Madi Vorbrich (12:03):

So, Robbe, for your next writeup that I wanna kind of dive into you actually covered this recently on your YouTube channel. I also saw it on some of your socials as well, but it was the LangChain path traversal RCE. Can you kind of just walk us through that, like what made this bug possible and then also too, what does it teach us about hardening these LLM dev frameworks?

Robbe Van Roey (12:32):

Yeah, so these LLM dev frameworks, they have a lot of features and nobody really knows what they need from an LLM dev framework. Like what does it need to do? So they're stuffed, filled with features that, you know, anybody could imagine, maybe somebody would find this useful, so we'll implement it.

And I was looking in the way that LangChain imports models and chains and, and how it kind of interacts with them. And I found that they had this LangChain hub, which was a GitHub repository that stored JSON files. And these JSON files contained information about how an LLM should behave, like, should it be using the OpenAI APIs and what kind of prompts should it be sending, what's the API key, what's the the URLs that we should be sending the queries from the users too?

Robbe Van Roey (13:26):

And that's of course very interesting because perhaps I could make my own GitHub repository that would allow me to host JSON files that I could have get imported somewhere and it could then do malicious things. So that was my first goal. But I noticed that in the backend, I had a lot of checks that was really trying to confirm that these JSON files were only loaded from the official LangChain hub. They had some Regex checks, some really complex Regex that eventually would get the file from that specific GitHub repository.

I tracked down exactly how it worked and what it checked, and I found that it was not really in use anymore. The LangChain hub was kind of deprecated, hadn't been updated in a while, but this was still implemented in the main function for loading chains.

Robbe Van Roey (14:20):

So if somebody was allowed to load a chain and they could still get to this code and execute it, even though it was, didn't really seem in use anymore. But then I found a path reversal issue that allowed me to make a payload that fully, that the Regex was fully okay with. So all of the checks were fully okay with it, and that was because my payload would actually start with a fully normal expected behavior, but would then just add a lot of dot dot slashes at the end. And what that means is that if the GitHub PRL is then made to actually fetch this JSON file, it's gonna be github.com/langchain/langchainhub, and then ../../../ it's gonna traverse a path all the way back to github.com. And then I add /pinkdraconian/proofofconcept and then my JSON file that I want to load in.

Robbe Van Roey (15:11):

And that's allowed me on servers that allow me to, you know, load a custom model it would allow me to load some JSON file. Now, this JSON file, it couldn't do a lot necessarily, but it had some impact 'cause one of the biggest impact out of it initially was that I could override a server that it's going to call to make requests. So it would, if you type something in the chat, it would go to OpenAI, make an API request with the OpenAI API key and that way it would get an answer from OpenAI. But with my JSON file, I could actually override the URL, it was fetching, so it would actually fetch from attacker.com or my own website, and I would then in the logs see the OpenAI API key. So that was some nice impact.

Robbe Van Roey (16:01):

I liked that, but I knew there should be something more in this system, like I can control exactly what is happening here. And then I found that LangChain also has experimental features, and they are kept in a separate repository, which is a great move. It's great to put those in a separate repository so that they're not fully imported and in use. And for LangChain, it was actually a really good move because the experimental features allowed a lot of weird stuff.

For example, they had a specific preset for an AI that would allow code execution, like it was literally called AI underscore Bash or something like that. So it would take whatever the output from ChatGPT was or whatever AI you're using, and it would execute them as a bash command. Of course, that's a terrible thing. Nobody should ever want to do that with user inputs, but that's why it was in the experimental repository.

Robbe Van Roey (16:55):

But then I found that actually if you try to load this malicious AI preset without the experimental features ever being imported, but just being installed on the system, so maybe the developer played around with the experimental features, but never actually decided to use them, but they're still installed on the system, then LangChain is actually going to automatically import these these experimental features just because my JSON says, Hey, go ahead and import it. And that way the experimental features actually get imported. And then I can just call execute any code on the system because whatever the AI responds with is going to be executed. And that's kind of the bug on LangChain there. It was a pretty cool chain of bugs.

Maybe that's a joke with LangChain, chain of the bug. But I think it all came down to the fact that these frameworks that tried to do a lot of things, they had some features that were a little bit deprecated and whatnot.

Robbe Van Roey (17:56):

And through combining all of those things together, it still led to the possibility of an attack. So I think that's an issue that we have right now where we have frameworks that are doing a lot of stuff, complex stuff as well, requesting data from the internet, loading it back then importing a config file that is going to then import some other data. It's all very complex stuff, complex flows, and right now companies are already struggling with actually making simple flows secure.

So imagine if you start to complicate the process, there's just gonna be hidden bugs everywhere. And I think we've kind of come to that point where the basic security for all of these libraries and modules and frameworks is getting pretty good. But if you look deeper under the surface, there's a lot of bugs that are still hiding there because they're kind of business logic errors, perhaps, that they're not easy to spot. And they will come up a lot in the future.

Madi Vorbrich (18:56):

Yeah. Yeah. I can only imagine. And with that finding too, was that through, was that for fun, or was that for a bug bounty program? Like what, and with the other one too that you found, what kind of drove you to find these? Was there a cash prize at the end, or what made you find these?

Robbe Van Roey (19:15):

Yeah, so initially I started hunting just because I wanted to gather some CVEs. I was interested in that. I also just wanted to make the world a safer space, but then I found that the huntr platform was actually paying bounties for these vulnerabilities, and it was kind of perfectly in line with what I was already doing. I was doing some research on AI frameworks, and then I noticed that huntr was actually paying for these AI vulnerabilities to be reported and to be fixed, and well, those things aligned perfectly. So I was able to get some nice profits out of these vulnerabilities.

That was definitely never the main reason for doing it. And I don't think anybody should be trying to find bugs in open source software just for profits. I think there are better ways to make profit, but if you wanna make the world a safer space and then also earn some nice little casual aside to play around with or have fun with, then it's a great thing that it exists. And it's awesome that also for me, in Europe, of course, the money is maybe not always very, very big, but for people in third world countries, like this could be life changing money as well. So that's something to take into account as well. And even in Europe, it's not badly paid, let's put it at that.

Madi Vorbrich (20:35):

And did those also give you really good credentials even when trying to find a job, stuff like that? Like, was this a nice little add to your resume and the list, the slew of other bugs that you've found?

Robbe Van Roey (20:49):

Yeah, luckily, I had already built out my resume nicely, so these never made a big impact. But one of the things that's, if you want to become an ethical hacker then there's a thousand certifications out there, and all of these certifications are super expensive, and everybody always says you have to get them. And from the beginning, I was really dedicated to be able to get a job in ethical hacking without having any certificates. And there's so many ways that you can do that. And, but finding a CVE, having a CVE in a popular framework like LangChain, like Gradio, like the Triton inference server, having a CVE like that on your resume is awesome. It works very, very well.

Even for the CSO who doesn't know any technical details at a company, he's probably the decision maker on whether you get hired or not, they will probably still have flashbacks from the CVE of Log4J, for example.

Robbe Van Roey (21:50):

So they will definitely know what a CVE is and they will see that you have it, and it will hold a lot of value. It's also proof that you have real world experience. It's a way of getting real world experience without having to first get a job. And I think a lot of people would have a lot, much easier time finding jobs if they could just spot a couple of CVEs. And if you go to my LinkedIn and look at my list of CVEs, you will find some insanely silly ones.

I have one CVE that was in a framework that kind of blew up out of nowhere. The developer didn't really want it to blow up, but all of a sudden it got thousands of downloads every day, and it literally had an endpoint slash execute Python. And this endpoint we take in Python codes and just execute the Python codes. And it was right there all over the internet, thousands of instances just because nobody had ever looked at this. But of course you can report it to make the world a safer space, get a CVE assigned to it and have that to your name. So I would highly recommend anybody wanting to jump into the space to just get a couple of CVEs and build up some real life experience and then go from there.

Madi Vorbrich (23:05):

Yeah. I couldn't agree more. So going back to, with all these discoveries in mind right, that we just discussed, I wanna also pivot to, you know, from what you found and how you found them to then diving into red teaming as well. And I know that you and I have talked offline as to what your perspective is on red teaming, because people in and out of the industry have their own definitions of it. So I'm very curious to hear what your definition is specifically as it relates to red teaming.

Robbe Van Roey (23:39):

Yes. So I think that, I see red teaming right now is a new big buzzword, right? Pen testing, it's not cool enough anymore. Red teaming is a new thing that everybody wants to do.

Madi Vorbrich (23:48):

Yeah, pen testing is out, red teaming is in.

Robbe Van Roey (23:50):

Yeah, exactly. Exactly. But in essence, and this is how I would define red teaming, I may be incorrect. Some people may not agree with me, but for me, red teaming is an offensive engagement where there is an end goal and there is a starting point. And the end goal will be some crown jewels that a company has. For example, all the financial data from the CFO's laptop, that can be the end goal of a red team. And then a starting point will be the compromise of an employee, an employee gets compromised. And from there, we will try to escalate to having access to the financial data of the CFO. And in essence, that is what a red team is. You start from somewhere and you get to an endpoint in any way possible. What is the big difference with a penetration test?

Robbe Van Roey (24:41):

A penetration test, it's going to target one specific system. We wanna check the security of our web application, and so we're gonna focus solely on the web application. We're not gonna take into account that we can social engineer people and get into the server room and just steal the web server that way, or that we can phish somebody, get domain admin and then control the web server, know we're gonna focus on this one system in specific and find as many holes in this one system as possible.

Whereas the red team is gonna zoom out and look at a whole organization perhaps, and try to find, try to see or validate if there is a path between this compromise and this end result and try to figure out if that path exists. And I think that that's kind of the big difference between a red team and a pen test. Yeah, that's how I would define it, I think.

Madi Vorbrich (25:31):

So then what would red teaming a machine learning system entail for you specifically with that definition in mind?

Robbe Van Roey (25:39):

Yeah, with that definition in mind, I would say that if you want to red team an AI or an ML system, you would say, let's say it's a chat bot that has access to the user's orders on a website on a shop. Then you would say, well, the end goal is we have this one user who has made a purchase or who has a credit card on their accounts, can you access what that purchase data was? Can you access the credit card details that are saved off this one user? And then the starting point is you're another user on the website and you're gonna try to get to that end goal. That's kind of how I would see it. And then of course, in the attacks, you would try maybe some jailbreaking attacks on that system try to get access to it, see if there's excessive agency issues where the AI system just has too many permissions, is able to make API calls, for example stuff like that.

Robbe Van Roey (26:33):

And we try to get to that end result. I also perform AI penetration tests and then AI penetration tests. We're just gonna look at the AI and try to find as many holes in it as possible. So we're gonna also kind of check a lot of these paths, but we're not only gonna focus on that one end result, we're just gonna try to identify as many issues as possible and maybe not exploit them as far. For example, we will try to prove that we can read other people's data, but we won't go as far as to see can we find that specific data entry from that specific user that we wanted to find, for example. So, that's kind of where I would draw the distinction.

Madi Vorbrich (27:13):

Gotcha. And then what tools or methods do you use specifically, or what do you find most helpful? And you don't have to, you know, expose like your secret sauce per se here, but what would you recommend?

Robbe Van Roey (27:28):

Yeah, I actually try to use as little tools as possible. For me, my greatest strength is my creative thinking and connecting the dots on the data I see going through an application, going through the APIs. And so I try to have tools that help me identify places where interesting stuff may be happening. But I don't try to have tools that automate too much for me because once I am only looking at an application or a system at a too high level, then I don't get that feeling with the bits and the bytes anymore. And then I start to miss vulnerabilities that may exist there.

So usually what my setup looks like is I have, of course, the browser that's gonna proxy all traffic through to Burp Suite or Caido or a tool like that. That tool is gonna do some automatic recognition on types of vulnerabilities already.

Robbe Van Roey (28:26):

It's gonna do some passive checks and already report stuff to me. And I have a lot of custom plugins written for that so that I get already a lot of information back from it without any human inputs. But that way I get really the details of, Hey, we spotted this pattern that maybe a human is not really good at spotting. But then I look manually, but of course, looking manually only goes so far. You have a limited amount of time. So there's also a lot of, I have a lot of word lists with, for example, jailbreak scenarios and whatnot that I can just run on these systems, run them all through this through the systems. One thing that I also like to do, because AI systems are not deterministic, is I like to set up a way where I can send one prompt and it will automatically distribute it to 10 different instances of the same LLM system that I'm targeting and get all the responses.

Robbe Van Roey (29:21):

And that way I kind of eliminate that non-deterministic nature a little bit because I see 10 different replies coming back, and I can check what the differences between them. And if my attack didn't work in one of them or didn't work in nine of them, then maybe the 10th one may give back a working result. There's actually once been a case where we were playing around with an LLM system and all of a sudden it gave back some really weird data and we couldn't reproduce it for hours. So we actually wrote a bot that was just going to try that same prompt again, 10,000 times, didn't work.

Madi Vorbrich (30:02):

Oh my gosh.

Robbe Van Roey (30:02):

And then we left and run like an entire night. And one of the replies then came back in that same way. And that way we could actually hunt down what the bug was there, just because it was such an edge case that that only popped up once in so many thousands of requests. And now having whenever I have a payload that I think might work, I just run it 10,000 times against the system. Of course, if the company allows, because that can sometimes also introduce some costs, and yeah, for some reason sometimes and 9,000 times it actually works, whereas the other 10,000 attempts didn't work.

Madi Vorbrich (30:38):

Yeah, that's actually a really interesting approach. For me, I haven't heard that before and it actually makes complete sense to approach it that way. So if we wanted to look at the scope of red team operations, when you specifically, when you plan a red team exercise in an AI driven application, what's in scope for you?

I know you kind of gave us a taste already, but which areas do you focus on when you're attacking, you know, like a machine learning pipeline anywhere in the pipeline? Like what do you focus on directly?

Robbe Van Roey (31:18):

Yeah, so we always sit down with the client and we talk about what are the risks, the scenarios that, that you see the most feasible. Is this an external entity, just anybody from the internet. Is this somebody a user in your system? Is this a company you're working with? Is this a third party provider that has access to somewhere in the pipeline? And then from there we can, you know, mount the attacks based on that.

If the risk is we're scared that we'll get a malicious model imported in our system, well then we'll have to build malicious models and approach it that way. But right now, most of the engagements are really focused on that end user perspective because these systems are maybe public or public to the companies that are clients of them.

Robbe Van Roey (32:10):

And usually it's really that end-to-end testing. The input comes in the system and the input exits the system. But sometimes it's indeed also the literal model that could be malicious or somebody who has accessed somewhere in the pipeline, what can they see from the traffic that goes through the system? Let's say we have a third party that has access to our systems in some kind of way because they control some component in it's what can they see, what that type of data could they exfiltrate from the system? So that's, there's a lot of different areas to look at, but right now the main things that we test are really those end-to-end cases.

Madi Vorbrich (32:51):

Gotcha. And then for organizations that wanna start red teaming their AI systems, for listeners that follow you, that are inspired by your work, what advice would you give them?

Robbe Van Roey (33:05):

So if you want to start attacking AI systems?

Madi Vorbrich (33:09):

Yeah. Or even for organizations that want to start testing their systems.

Robbe Van Roey (33:12):

I would say first of all, have somebody internally play around with the systems and have them do some preliminary testing 'cause a lot of times you will find that just natural people without any expertise in the topic are actually quite good at getting an AI to do things that it shouldn't do. I heard from a colleague who gave his mom one of those challenges where you have to get the password out of the AI system. He sent her the link, and two days later she got back with a message, Hey, look, I'm in the top hundred on the leaderboard right here. And it's just my colleague's mom who has no IT knowledge, no knowledge whatsoever. So if you're starting from zero, you have no security built into your product, start with that, start with it internally.

Robbe Van Roey (34:11):

Also start kind of integrating security as a stakeholder in your organization. So at any decision you make related to the system, always have a stakeholder, AI stakeholder security. So you would have a stakeholder, the user wants this to happen, the QA team wants, or the sales team wants this to be a new feature. Well, from now on, security will also want things in your decision process when you're designing new features and new stuff. And then you will find that you can already mitigate a lot of these issues before they ever get to the point where they need to be tested.

But at some point, of course, you're gonna wanna run a kind of external engagement on your systems. And right now, there's a lot of partners out there that are, I think, very happy to perform these tests.

Robbe Van Roey (35:02):

'Cause they are just so fun to do. From my perspective, as a hacker, I always light up whenever the sales guy comes over to me and says, Hey, we got a new AI pen test that you're gonna be able to do. That's always lovely because they're such unique systems. They all perform very interesting applications. They all have their own interactions that they can make. So some of these AIs have very weird agency where they are able to create tickets or they're able to fetch data from here. And then the attacks that you find are so creative and are so different than the ones you usually find in application penetration testing. So it's a lot of fun to do.

Madi Vorbrich (35:48):

Yeah. And I know that sometimes it can be challenging, I guess it depends on the person, but kind of to what you've nodded to a bit earlier, like sometimes if you're new to the space, it can be really intimidating. It can be kind of like, oh my gosh, there's just a lot happening all the time. Some people get it right away and some people don't.

But even if we want to look at the more beginner friendly side too, what would you recommend for people like just kind of dipping their toes and starting off in the field as well?

Robbe Van Roey (36:17):

Yeah, I would read a lot of the public bug reports that you can find. I know that on huntr, a lot of this stuff is public. So you can already get a great idea of what other people are finding. And that way you can maybe also start to spot some patterns of vulnerabilities that happen often in the space.

Besides that, there's also a ton of fun games out there where you have to do prompt engineering. Those are also a great way to kind of dip your toes in and play around with stuff. I wouldn't start with deciding that you wanna now go and jailbreak ChatGPT and it's probably not the right place to start. But who knows? Perhaps maybe that is the right place to start and somebody who's watching this will go ahead and prove me wrong.

Robbe Van Roey (37:07):

I'd love to see that, that'd be awesome. But like with learning anything in offensive security, it's all about building those mappings in your brain that when you see something, you immediately draw these connections between what you're seeing and what's happening in the backend. And it's, and you can only get this, it's kind of muscle memory for your brain. You can only get that by looking at a lot of systems and doing it a lot and being passionate by it. And always having that little devil on your shoulder that's saying, what is this behavior? How is it happening? What's the application doing?

And then you just keep on digging deeper and it's 2:00 AM and you wanna go to sleep. But you're so close to figuring it out. And if you have those kind of moments and have fun with it and enjoy it and just keep doing it for a full year, then you're gonna be a great hacker, in my opinion. I think that anybody who devotes like a year to it will become a great hacker.

Madi Vorbrich (38:06):

Yeah. Yeah. I completely agree. Yeah. You definitely have to have that mindset of like, man, I really like breaking shit but I also wanna learn how to build it too. It's yeah, a lot of hackers that I know definitely have that kind of a mindset, which is great 'cause they're great in their fields.

Okay, so before we start to close this out, I just wanna quickly dive into the defensive side of things and say, you know, like let's say that I'm an ML or I'm a developer, right? And I'm creating an ML pipeline, but I wanna secure it day one right out the gate, make sure that it's safe. One lesson that I kind of gathered from your BentoML discovery was that you need to ensure they're using safe model formats, right? So what should teams do to safely store and deploy models, and how do you ensure that your pipeline is safe out the gate?

Robbe Van Roey (39:06):

Yeah, that's a great question. I definitely haven't, I do not have a lot of experience, but looking at it from that side. What I mainly think is important is that you should see your AI system as something that's fully open. And with that, I mean that your system should still be secure if you fully remove the AI from it. And every action that it can perform is public. So any data the AI has access to is public. Any API endpoint that the AI can call should be considered public. Like you should be just able to plug out the AI system and it should still hold all its security.

And I think that right now there's a lot of efforts on integrating LLM systems where people make assumptions that nobody is going to get through the AI, but my experience has shown that right now, it's just not a good assumption to make because eventually somebody is going to find that prompt that is gonna bypass all of the security and then and leak the system prompt and is gonna be able to enter a debug mode where you can perform any action that the AI can.

Robbe Van Roey (40:21):

And so if your security doesn't hold at that point, they're already starting out of the gates in kind of a faulty state.

Madi Vorbrich (40:29):

Right. Awesome. Well, you dropped a lot of really good gems in this episode. Robbe, this was such a great conversation. Thank you for sharing all of your insights.

Robbe Van Roey (40:42):

Thank you for having me. It's been fun.

Madi Vorbrich (40:43):

Yeah! And before we sign off, I know that you've dropped so many tidbits on like where to start, what you should be looking out for. But did you have any final last words of wisdom that our audience can take away?

Robbe Van Roey (40:56):

Final last words of wisdom. You are insecure, you will always be insecure but you can try to make as much improvements as possible so that you're only vulnerable to a text that nobody in their right mind would ever figure out.

Madi Vorbrich (41:16):

Awesome. Well, again, thank you so much, Robbe. This has been such a pleasure. And thank you everyone for tuning in. That's all for this episode of the MLSecOps Podcast. So stay tuned for our next discussion and we'll see you soon. Thanks.

Robbe Van Roey (41:31):

Thank you. Bye.

[Closing]

Additional tools and resources to check out:

Protect AI Guardian: Zero Trust for ML Model

Recon: Automated Red Teaming for GenAI

Protect AI’s ML Security-Focused Open Source Tools

LLM Guard Open Source Security Toolkit for LLM Interactions

Huntr - The World's First AI/Machine Learning Bug Bounty Platform

Thanks for checking out the MLSecOps Podcast! Get involved with the MLSecOps Community and find more resources at https://community.mlsecops.com.

Guest

SUBSCRIBE TO THE MLSECOPS PODCAST