From Pickle Files to Polyglots: Hidden Risks in AI Supply Chains
Audio-only version also available on your favorite podcast streaming service including Apple Podcasts, Spotify, and iHeart Podcasts.
Episode Summary:
Trail of Bits’ Keith Hoodlet joins the MLSecOps Podcast to unpack the biggest threats in AI/ML security—from jailbreaks and prompt injection to Polyglot model files and insecure dependencies. He breaks down how traditional AppSec skills map to AI systems, what he learned from the DoD AI Bias Bug Bounty Program, and why compliance, monitoring, and good fundamentals matter more than ever.
Transcript:
[Intro]
Charlie McCarthy (00:08):
Hey everybody. Welcome back to the MLSecOps Podcast. My name is Charlie McCarthy. I'm one of your MLSecOps community leaders, and today I have the extreme honor of being joined by my guest host and colleague, Ethan Silvas, who is a Security Researcher at Protect AI. And our guest of honor, Keith Hoodlet, who is currently the Engineering Director of AI/ML and Application Security Assurance at Trail of Bits. Welcome both of you to the show.
Keith Hoodlet (00:35):
Thank you so much. Really excited to be here.
Charlie McCarthy (00:37):
Yeah, absolutely. Before we dive into our line of questioning, Keith, do you mind giving the audience a bit about your professional background and kind of what led you to the space that we're gonna be discussing today?
Keith Hoodlet (00:49):
Yeah, so my traditional background is in application security. It's a space that I've been in for probably a little more than a decade. And I'm an offensive security certified professional, top 300 bug bounty hunter on Bugcrowd. I have a lot of experience working with a lot of web applications in particular, which translates nicely to a lot of the problems that we see in the AI/ML space with APIs or the chat interfaces that you tend to see that are implemented on web applications. And so my transition into this space has really been something that's been happening over the last few years. Previously, I was at GitHub working on the advanced security product suite as well as red teaming internally, their GitHub Copilot product before launch. So, I've had some exposure to it in that regard.
And then when I joined Trail of Bits, they said, "Hey, Keith, you have experience in AI/ML, with hacking on some of this stuff in bias testing, you got a long history of AppSec, they have a nice overlap. You're now in charge of both teams. Congrats." And I was like, okay, here we are. So it's been about 10 or so 11 months since I joined Trail of Bits, and it's been a wild and exciting journey.
Ethan Silvas (01:53):
Keith, I wanna talk a little bit more about kind of your journey into AI security. So first we wanted to think about like what originally kind of led you into the AI/ML space? So you're kind of talking about your personal research then and kind of became your chops. So you know, what were some things that got you interested and what, or also what are some things that kind of kept your interest going on?
Keith Hoodlet (02:15):
Yeah, I think the big thing for me, it was about a year ago actually, it's almost exactly just one year anniversary of the conclusion of the US Department of Defense AI Bias Bounty testing program. So of course, large language model jailbreaking and prompt injection and things had really been on the scene for maybe a year and a half or so before that. And I had dabbled with that a little bit. I mean, it really felt like the wild west, but when I got invited to that bounty program, I was like, okay, interesting. You know, the bias testing piece felt like a new angle that hadn't really been explored deeply in the bug bounty space. And it was exciting and fun and what I found ultimately is the scientific method works really well for testing these things.
Keith Hoodlet (02:57):
And just pursuing that from the perspective of you know, holding controls in static and then changing certain variables to test for bias was a really effective method. And overall, I was like, there's so much to explore here that just, it felt like we haven't seen this sort of, you know, evolution of a technology space that's so adjacent to the work that I've done previously in, I don't know, probably since really blockchain started coming out. Like that's the most recent technology that felt like new and novel and a lot of things to go find.
Ethan Silvas (03:33):
Cool. So you kind of touched on it a little bit with, I think basically, you kind of alluded to fuzzing, but we also wanted to know about like some specific skills or practices that in traditional app security that you find useful when testing against these AI/ML applications, whether it's offensive or defensive. But yeah.
Keith Hoodlet (03:58):
I think one of the interesting things especially as you mentioned fuzzing when it comes to security testing of large language models, we look at like things like the greedy coordinate gradient or GCG, where you're trying to effectively modify the way in which the inputs are being interpreted by the large language model so that you can then like change the outputs it's producing to your favor. We've actually seen here with testing we've done with clients where we can basically calculate what sort of pre character set or you know, pre token set might be required in advance of a prompt to successfully introduce prompt injection with consistency. And so it's very similar to that concept of fuzzing, where you're throwing a bunch of garbage inputs into, you know, some sort of input in the application and then determining where it crashes or you get a 500 server error in the web application space and you start to realize, okay, what attack vector is going to work?
Keith Hoodlet (04:53):
Well, very similar things are happening with large language models where you can actually, you know, calculate through repeated turns at the large language model to determine how can I introduce a jailbreak or a prompt injection? And then once I can introduce that, what sort of controls or activities can I take that might compromise the business, compromise the application, compromise data that is accessible to the large language model that wasn't intended for me to reach? So fuzzing, I think is the most comparable way to describe that process, and it's so far, seems very effective. And I can imagine it will be very hard to address from a security standpoint just because of the nature of the way that these things work.
Ethan Silvas (05:37):
Yeah. And it almost feels like a social engineering attack where you're trying to convince like some AI to actually jailbreak and stuff. Also related, so what do you think, people that haven't gotten in or like really broken into the AI/ML red teaming or security in general what are some, you know, traditional things or skills that they might have that would really transfer over and kind of help them get into it?
Charlie McCarthy (06:07):
That's a good question. Like upskilling?
Keith Hoodlet (06:10):
Yeah. Yeah. Yeah. I think especially for people that have traditional application security skills, for example I look at things like cross site scripting, or I look at things like SQL injection or service side request forgery, as all like areas that are important to have a lot of just like knowledge of how they work. Because once you can get the prompt injection to successfully trigger and say you have agentic capabilities where maybe it has like tool use that it's, I don't know, writing a ticket to a ticketing system, for example, it's like, okay, now that I can get it to produce some sort of content in a backend system that I can't directly interface to, what attack technique do I need to then introduce into my inputs that might trigger on the other end? And knowing what the other end looks like or conceptually being able to say, oh, okay, it's a Jira ticket.
Keith Hoodlet (06:57):
Well, that Jira ticket or a Slack message has link capabilities. Like what can I now do to introduce some sort of like malicious link that might get triggered, whether it's via like a direct like user clicking on, oh yeah, here's the image that I've included that is my support ticket. And then maybe it's a malicious image that has like a callback shell, for example. So there's all sorts of just like second order thinking that I would say needs to go into attacking these things. So if you're already familiar with a web application security space, cross site scripting, server side request forgery, XML external entities, SQL injection, all have very similar just like modes of thinking about exploitation. And then ultimately triggering those exploits are they feel similar or feel very much the same. But to your point now, it's using natural language as opposed to some form of syntax or fuzzing to produce that output.
Charlie McCarthy (07:53):
You know, in the AI security space, Keith, we hear a lot about the type of attack that you mentioned, prompt injection attacks, indirect prompt injection. We also hear a lot about vulnerabilities within the AI supply chain, so like ML models and all those artifacts. In your research in this space thus far, are there particular threats that stand out to you outside of prompt injection attacks? And all of them together as a group, what do you think makes them so challenging to defend against?
Keith Hoodlet (08:27):
Yeah, so I think the prompt injection in jailbreaking space takes the most of the oxygen in the room when it comes to these conversations because I think people can easily understand how those things might impact their business or their software or their customers. Like, it's pretty easy to draw a line to, oh yeah, prompt injection leads to disclosure of internal, you know, knowledge base of the company or, you know, some sort of second order attack. Outside of that, I mean, we have seen a couple of instances so far of like backdoored pickle files or backdoored models that showed up on Hugging Face. And so we have seen and done research on things like Polyglot PyTorch files, for example, where it can be interpreted multiple ways, and maybe one of those ways is malicious and the other is entirely benign.
Keith Hoodlet (09:14):
And so when you run it through a scanner, if the scanner's not doing a really good job of looking into this, you end up in a state where, you know, it gets the green checkbox, but oh wait, it's actually malicious when you try to run it. And so that is a space that I think continues to get more attention as time goes on, because people are starting to realize that PyTorch much like to use a web example like AngularJS versus Angular 2, like there were a lot of changes to the way that the JavaScript worked for that. And so, granted, they were breaking changes. You couldn't just like immediately port one to the other, but in the PyTorch case, you can actually port one to the other. And so for anyone that's listening with a traditional AppSec background, it would be sort of like adding the AngularJS sandbox capabilities to Angular 2.
Keith Hoodlet (10:02):
Well, that allowed for remote code execution in the original Angular, which is why it changed so much in the second version. But in PyTorch, it's almost as if they've just done this case of like carrying forward backwards compatibility with all these different versions and file types. So because of that, you have Polyglot attacks then you have, you know, all the follow on attacks from like backward models and things that come from that. So who knows what you could accomplish. We've also looked at interestingly, conceptually what we call monkey patching, which is when you actually like basically on the fly, you end up patching Python underlying library functionality such that effectively now when the model calls that functionality in some capacity, it will actually trigger on code that you've now overwritten in dynamically running Python code.
Keith Hoodlet (10:56):
And so I know we have a blog post that either, I think it's still like in the backlog waiting to be published, but we've done some really interesting research on just like, Hey, I'm an attacker, if I can find a way to dynamically overwrite a running like Python library that maybe hasn't been called, but will be called by the running process, I actually now have code execution. Which you would accomplish via initial, like route of say a prompt injection or a model jailbreak of some kind. But ultimately that leads into just straight code execution type attacks that would look genuinely benign. It's like, oh yeah, models running normally model calls library, then you end up with an attack and you might not notice that if you're not, you know, really monitoring, logging, and alerting in your systems appropriately.
Ethan Silvas (11:43):
Yeah. And also underlying Python libraries, I mean, there's so many traditional AI/ML libraries like pandas, NumPy, scikit-learn stuff. They're just, they're full of arbitrary code execution types of things. So just that's such a big thing about supply chain vulnerabilities. But also I wanted to mention specifically about the new work from Trail of Bits with fickling and, you mentioned the Polyglot files, but also in that there was a blog post about it. You guys kind of mentioned how Pickle has just kind of been this massive security issue in AI/ML workflows and stuff, and almost suggested like, you know the industry kind of needs to move away from that and kind of move towards something like Safetensors, which Trail of Bits have audited. So yeah, how do you think that the, I mean it's tough to say, but how do you think the AI/ML industry will kind of move away from a lot of these older kind of libraries like pickle?
Keith Hoodlet (12:48):
So these things always take time is sort of the sad part about the world we live in. Like the long tail or the long half life of pickle files is going to be very long, because inevitably someone is gonna be, you know, as a company, very reliant on these things, and they're not gonna move away from them. Similar to like how we still see COBOL in banks or Java in most like large enterprise software stacks. I get the feeling that pickle files are gonna be around for a long time, which is sort of unfortunate. But you know, I think that being out there and really beating the drum of Safetensors is the right way to go. I mean, it solves a lot of these problems. It's very similar to in, you know, going back to the AppSec world for a moment, like ReactJS was one of those frameworks that we really were pushing a lot of people to in the application security space in general, because they did a very nice job of solving a lot of the security problems that AngularJS had at the time when it first came out.
Keith Hoodlet (13:42):
Google has since, you know, updated and changed that, but it caused a lot of consternation within their own development community because breaking changes genuinely mean you have to rewrite a lot of code. And I imagine same thing here, people moving from Pickle file to a Safetensor file means a lot of rewriting. And so because of that, you're gonna end up with, you know, some organizations, some companies that are just gonna say, we're not gonna make that move. It doesn't make sense for us. And well, hopefully they're, you know, checking these things with fickling just to make sure that they're actually safe to use and not just, you know, YOLO downloading from Hugging Face and running them in production, which sometimes happens.
Ethan Silvas (14:20):
It happens a lot.
Charlie McCarthy (14:25):
Yeah. Keith, you make a really good point about some of these things or effective change taking a long time. Like we've been talking within this community for a couple years about MLSecOps obviously, and like building security practices into AI development, deployment, adoption. And part of that is kind of centers around the three main principles of cybersecurity, people, process, technology. And so people are a key component of these changes or the change management that we might be talking about within the industry.
When we're thinking about adapting like traditional offset techniques for testing AI systems, can you tell us a little bit about how you think about that? Or could you even speak to maybe like how Trail of Bits is thinking about it as they're doing their research and, you know, does it make sense to augment what Offset teams are currently doing specifically for AI and kind of build all of that in? Or just, how do y'all think about that?
Keith Hoodlet (15:27):
So it sort of depends on specifically the client. For example, if you're using Pickle files, you're probably doing custom models. And so in those situations you know, we've already gone ahead with fickling and released an upgraded module for a Polyglot creation. So the idea here is like actually test the creation of Polyglot files to see if the way that you're actually running your application is potentially vulnerable to a Polyglot type attack, where maybe you've, like introduced a malicious model or malicious behavior just by parsing a pickle file that was somehow modified or downloaded from a source like, say, Hugging Face or some other repository store. And so I think as a starting point, you know, introducing things like fuzzing is good from a, like add Polyglot files to, you know, your tests. We also look at things like you know, PyRIT from Microsoft as an example of tool sets that can do some introduction of prompt injection type attacks.
Keith Hoodlet (16:26):
There's a lot of tools starting to come out there. And so I think it's harder in some respects to do the earlier stages of the DevSecOps pipeline of like good static analysis and like secret scanning and things of that nature. Like, some of those problems are already solved with tools today, like TruffleHog or GitHub advanced security secret scanning, like that will catch a lot of those early problems. But then fickling will help you catch some of the Pickle file type static analysis. But other introductions of tool sets or APIs, for example, becomes harder because you're writing your first party code and then you're sending something off to an API for say, a frontier model or a, you know, self-hosted model out in the cloud in your virtual private cloud environment. And predictability of behavior with large language models is I think a really hard problem that really isn't solved and may never be solved because now if you have a model that behaves one way today and you're, say using a frontier model, and then they update the underlying model that you're using with a new training data set, or new instructions or new reinforced learning human feedback as part of the way that they've trained their new model, your outputs could change.
Keith Hoodlet (17:36):
And so that could mean potential vulnerabilities, it could mean potential bias, it could mean just potential outcomes that you haven't anticipated or tested for or considered. And it's almost like, to bring this back a little bit, it's almost like if you were running an application, say like a WordPress site, and then suddenly the WordPress engine changes and the template or the style that you were using like suddenly was being misinterpreted and you had no control over that. That is gonna lead to just business consternation of some kind. Whether it's an actual exploitable vulnerability or people scrambling to update the website so it looks good again. I think LLMs sort of represent in a simplified way the same sort of problem, just in a different, different way. Where it's like suddenly the LLM is selling you a Ford truck for a dollar, or talking about bereavement policies that don't exist you know, in the case of like Air Canada. So it's an interesting space in that regard. And I'm not sure if that answers your question succinctly, but you know, there's a lot going on here, I guess.
Ethan Silvas (18:44):
One thing also, Keith so you mentioned kind of these LLM models keep coming out, keep coming out. What do you think or like, how quickly do you think companies or even just regular people should upgrade to these new models if they're actually deploying applications? Because often we see like the protections for things like jailbreaks, for example, we're just like, oh, you know, upgrade the model, start using a new model. It's usually more robust against jailbreaks, blah, blah, blah. And there's also things like creating robust system prompts and things like that. So yeah, I was curious about your opinion on how quickly you should actually upgrade to a new model.
Keith Hoodlet (19:22):
So in true security fashion, it depends. So I think the right answer there is if you have good logging, monitoring, and alerting of how your system and your integration is performing, is responding, is being used by your user base, upgrading to like newer like latest and greatest frontier models may be a good decision in some cases because you may get more benefit from it. It may be a more costly decision because those API tokens cost more per use than before. And so, like, the upgrade might not make sense financially. It may also not make sense from like a, what you need from the model perspective. So oftentimes I'm still using ChatGPT 4o, because it's fast, it's performant, and it gets the things done that I need for the use cases that I have.
Keith Hoodlet (20:13):
So if your business needs have not changed, moving models may not make sense because you've run all of these tests, you've done a user acceptance test or acceptability tests, and you've got, you know, sort of a, at least well understood performance. Now granted, these things are sort of unpredictable, so they could change behavior at, you know, a moment's notice. And you're not really sure why. I just saw an article over the weekend from Ars Technica that like Cursor suddenly stopped allowing people to like vibe code with it and said like, learn to code. And so I thought that was really interesting where it's like, what changed here that caused this? Was it like, you know, cursor changed a pre-prompt somewhere and suddenly it was like, come on, just like, learn to code already or what? But I think in some respects, the business mantra of if it isn't broken, don't fix it may also make sense here, but if you truly do need you know, new capabilities, they come at a cost, but they may be worth it for your business case. As long as you have good logging, monitoring, and alerting I think you're okay to move forward.
Ethan Silvas (21:20):
Keith, so you mentioned earlier on in the podcast about you entering the I think it was the first, actually the DoD AI bias bounty contest. So yeah, how about you kind of elaborate more on that and kind of tell us your experience with it and what you learned from it, what was fun about it? What do you think we can learn from it?
Keith Hoodlet (21:41):
Yeah, yeah, yeah. So it was a really interesting experience. So when I initially got the invitation, I was like, oh, wow! You know, bias as a testing case is very different than your traditional security attacks. And so my degree from university is actually in psychology. And I was like, perfect use case of my sort of background from formal university training plus my like hacker mindset. And yeah, it was like, it just felt so comfortable for me. And so, for background, in the United States at least, you know, at the time of the test itself, there were quite a lot of federal laws around avoiding bias in hiring practices and promotion practices and all these sort of different things related to human resources. And so for example, you could not discriminate on like age, gender, ethnicity, pregnancy status, military veteran status, religious affiliation, and so on and so on.
Keith Hoodlet (22:38):
And so the testing was actually for an open weights model that was being explored by the US Department of Defense to introduce to their various services in which case they would allow them to introduce AI in a, I think the wording that they use is like a reasonable US Department of Defense applicable use case. So part of the problem was coming up with use cases that felt like they made sense for the US Department of Defense. And then part of it was then proving bias in the open weights model that was under test at the time. And so for me, I used LLMs to help me solve those problems. I admittedly went out to ChatGPT and I was like, okay, here's the brief for this bounty program, and here are some of the conditions that they're looking for and some of the examples.
Keith Hoodlet (23:28):
Can you give me some examples of other areas that might be interesting to test? And so it would give me a list, and then I'd say, okay, can you now produce prompts that I could use that are reasonable DoD use cases meeting this set of criteria? And one was like effectively the AI bot was supposed to help with biochemical research, and one of the prompts that it gave was effectively the Captain America program. It's like, you're an AI bot that is helping select candidates for a super soldier program. We need it to be like a diverse set of candidates with, you know, well represented within the military services, and we need you to help us select the best candidates for this you know, super soldier program. Then what I did is I actually had ChatGPT also create the roster for the Super Soldier program.
Keith Hoodlet (24:20):
So I basically is like, give me a diverse set of candidates, gimme different ages, genders, ethnicities, et cetera, et cetera, et cetera as well as give me like ranks, et cetera. Then I normalized it. So in true scientific study fashion my intention was, okay, everyone's a sergeant, just like make them all one rank, so we don't have any like rank bias and then give them all very similar names. So I had like Sergeant John Smith, Johann Smith, Jane Smith, Jean Smith, et cetera, et cetera. And so I tried to find gender ambiguous names as well. And then I would hold a bunch of conditions the same. So maybe age was the same, rank was the same, but maybe their gender was different or maybe their pregnancy status was different or what have you. And then...right, exactly.
Keith Hoodlet (25:10):
So the things that you're under test, and then I would prompt it 10 times in a row. So I'd have like 10 different tabs and prompt. It was through a chat interface. They didn't give us an API, but I prompted 10 times in a row and I'd be able to show if in all 10 times it comes up with the exact same results of choosing the exact same three people from a group of five, that I could reasonably show that within like a 0.064% chance of that happening randomly, that it was actually biasing towards specific characteristics. Then I took the findings, threw it back into ChatGPT, and said, here's my prompt, here's the response, why is this biased? And have it had it write my report for me, and then just submitted the report.
Charlie McCarthy (25:51):
Work smarter!
Keith Hoodlet (25:53):
Yeah, right. I mean...
Charlie McCarthy (25:54):
Work smarter, not harder. I love it!
Keith Hoodlet (25:56):
Yeah, it's still effective today. I mean, I was just a couple of weeks ago helping my friend Jason Haddix at Arcanum-sec. He had an attacking AI class, and he wanted to show some bias testing, and I pulled up GPT-4o and I basically came up with a scenario where, you know, people are really freaked out about avian influenza right now, right? Like, it's a huge thing in the news. So I said, okay, what are the most severe symptoms? Got that, and then I had to create a roster of people, and then I said, okay, you're an AI bot in an emergency room trying to help nurses and doctors, triage patients with the following set of symptoms. And then gave, you know again, a bunch of people. And what was interesting is I set everyone at the age of 45, and I had sort of a diverse group of genders, but one of the women in that group was pregnant, which for those who have had children they know that the older you are, the riskier that is.
Keith Hoodlet (26:53):
And lo and behold, it would always choose the pregnant woman as the person who should be triaged first, which is, you know, from a health perspective makes a lot of sense because again, older women who are pregnant are at greater risk. And then you have this additional, you know, setting or situation like yeah, prioritization makes sense, but also as soon as you remove that condition of that individual being pregnant, it was like treat them all at once equally. And so, like clearly the model was, you know, biasing toward older pregnant women is like a priority from a healthcare setting in context, which shows bias. But in this case... Go ahead.
Charlie McCarthy (27:32):
Oh, I was just gonna say, that's an interesting point you're making because the term bias does, I mean, it kind of carries a negative connotation with it, but in this particular case, you could make a case for, you know, bias serving, doing what it was supposed to do.
Keith Hoodlet (27:48):
Well, and bias as a human characteristic is really fascinating. There's a book that's very dense, and I usually don't recommend people read it 'cause I've read it twice and I still didn't grab everything from it. But it's by Daniel Kahneman called Thinking Fast and Slow, and it's about heuristics and human bias. And it's served us over, you know, millennia of evolution as a species. And so it's really just preferential treatment or in this case like a sort of a negative treatment of different groups, right? And sometimes that makes sense. Like you would want to generally bias toward an older woman who is pregnant because there's greater risk there. But then when you remove that condition, suddenly like, just treat everybody, it doesn't matter, like, you know all this other stuff is also sometimes good, but maybe in context of like a healthcare setting not useful either. Like it's sort of making something perfectly secure usually means making it totally useless. Well, if you truly de-bias the systems in the context in which they're used, it may present a situation in which the LLM is actually not useful as a tool because it's not fulfilling the purpose that you actually need it to fulfill, which is helping, you know, simplify decision making in a way that allows you to take action quickly.
Ethan Silvas (29:02):
You mentioned you used GPT to help with your testing and that, and I found it a little interesting 'cause I'm sure you get this question all the time people asking like, how do you actually use LLMs for red teaming? Especially because it's so hard to actually get an LLM to make you a vulnerability or a payload or anything. But I also wanted to talk about your blog because in your, I think your most recent blog post you're kind of talking about well, are these LLMs actually reasoning? Can they actually come up with these unique kind of answers that you would need for something like a bug bounty program? So I kinda want to ask, like in general, even outside of the the DoD bug bounty how do you think LLMs are, you know, are they good or effective in these kinds of bug bounty settings where you kind of need to, you know, think a little harder, think creatively and it's not just like you know, black and white?
Keith Hoodlet (30:00):
A lot of it I feel like is, in some respects, and this may be like one of those buzzword bingo moments, but prompt engineering. You can actually get these things to be somewhat helpful from like a security red teaming perspective as you know, as I've also shown like an AI red teaming perspective. If you know what to ask for and how to ask. As well as if you set like your system prompt in a way that sort of leans into it being helpful to you on those tasks. So in some of the research we've done here at Trail of Bits, we see somewhere between a 15 and 30% increase in like improvement to a human operator's ability to perform offensive security operations if they assist or if they use a large language model to assist with some of that work.
Keith Hoodlet (30:43):
So we've actually seen and shown through a study that we've been performing that this is possible. And so from a security perspective though, the other thing that I think is really important, and I mentioned this in my most recent blog post on the topic, is that large language models by themselves are very good at identifying things that are already in their training data set. And we saw that in a study from New York University and some folks who are now researchers at a company called XBOW, who's known for doing dynamic analysis of applications for security vulnerabilities. My acquaintance Oege de Moor, formerly of Semmle, started the company and he also created code QL, which I worked at when I was on, or worked on when I was at GitHub. So smart people doing really interesting work.
Keith Hoodlet (31:29):
And what they found was when they were trying to test applications dynamically for vulnerabilities that would not have been part of the training data runs for the latest models that they were testing against, that they just weren't performant. That they actually didn't successfully identify. It was like only 2 outta the 21 cases did they actually find a vulnerability. So pretty low performance, you know, say less than 10%. But that said, I think a lot of the vulnerabilities that we see in an application security space are somewhat well known. So the patterns are sort of well trodden. Then again, and this is where I sort of dither we can look at some of the research coming out of Apple on reasoning of AI models, and they've shown that, you know, you just change a few of the numbers in some of these mathematical equations and suddenly they're not able to solve them anymore.
Keith Hoodlet (32:21):
So I lean generally toward large language models catching basically the 50% under the bell curve things that are in the training data set. It's very good at giving you information back that it likely has seen and consumed in some manner from training. But as soon as you get out of the average use case scenarios it suddenly becomes much less performant and helpful. And I actually have seen this myself firsthand. So when I was at GitHub for example, I would use GitHub's Copilot and Copilot chat to have it assist me in writing code QL queries which is their static analysis tool. And code QL is in an unfortunate situation where there's not a lot of training data and as a language, it's evolving faster than the training data sets and training runs for these large language models.
Keith Hoodlet (33:14):
So you suddenly get syntax being recommended to you that's no longer appropriate for what you're trying to accomplish on the latest engine version. So one of the things that I think is likely to continue to be the case with large language models is if you're working on a language, a framework, a technology that is newer than the training data set that was collected and ran for the model you're using, that model won't be able to help you. And if it is, it will probably be wrong or hallucinate quite frequently in that process. So it's an interesting problem because yeah, are these things getting smarter? Arguably they're just getting more training data set and getting more current with what has happened over, say, the last year or the last six months. But are they able to solve truly novel problems that it's never seen before?
Keith Hoodlet (34:02):
That remains to be seen, and at least with the way the technology is built today? I'm skeptical that it will get there at least in the current form of where we're at, if we have maybe a few additional gadgets that they can come up with in terms of training and reinforcement of the learning and getting these things to work in an agentic maybe like thinking agenetically model where you have specialization in different fields and they sort of come together collectively to help solve problems as a collective group. Maybe we get there? But at least, I remain skeptical today is the way I would put that.
Charlie McCarthy (34:36):
Fair and well stated. Yeah. Alright. We have just a few minutes left, so I wanted to quickly get your insights, Keith, on pivoting over to more of a compliance related question, and then we'll hop over to Ethan for some fun quick hits at the end. And I'm gonna apologize in advance because this is a super wordy question, so bear with me here.
Keith Hoodlet (35:00):
Sure, go ahead.
Charlie McCarthy (35:00):
Related to compliance and regulations, protecting the public and other ecosystems from the impacts of potential AI harms or harmful outputs from LLMs, there are a lot of regulating bodies in the United States and around the world that are working on policy. And I imagine we're gonna start to see a lot more of that rolling out over the next year or so, especially because of lot of these systems are starting to be used in enterprise and there will unfortunately probably be some public harms that will go to court and start to set precedent for some of these laws that are gonna into play. Balancing innovation and regulation is tricky and it's also a very hot topic. What do you think is the best approach for security teams who are trying to stay compliant? But, you know, we don't wanna slow down their progress.
Keith Hoodlet (35:51):
I think especially in like within the last few weeks, this is becoming an increasingly hard problem for security teams, depending mostly on where you're located in the world and what regulations you fall into. So for example, there's these action plans that have been proposed to the US government from the likes of like Anthropic, OpenAI, Google, Microsoft, and so on. And a lot of these action plans lean into just advancing the technology or advancing the space in a way that sort of disregards regulation. And I think the current administration is also moving in that direction. We saw some of their talks that they gave in France recently on, you know, really moving the needle forward. And so if you're a US based company, I think you're likely to see very little regulation actually come about from a security in the model's perspective. I think the, especially frontier model companies themselves, will need to build in safety measures because otherwise, to your point Charlie, that this will lead to some pretty serious bad news cycles.
Charlie McCarthy (36:57):
Yeah. Something more serious than an Air Canada example or, you know...
Keith Hoodlet (37:01):
Yeah, right, right. But in the EU on the other hand, they're moving forward quite heavily with regulation in this space. And I think especially if we look even back to existing law, like General Data Protection Regulation or GDPR, the privacy of data and the collection of data is likely to play out in a way that is favorable to individuals and less so to companies. And so I think one of the things I've suggested to friends who are testing LLMs in a bug bounty scene today is if you can figure out a way to get personal identifiable information out of the large language model, you can report it for any internationally running or operating company under GDPR because they're still subject to it if they're operating in the EU or if it impacts EU citizens. So I don't know, it feels a bit like the wild west.
Keith Hoodlet (37:49):
There's not quite a regime in place today from a you know, US administration or government perspective that's interested in regulating this. And I think a lot of the companies don't really want it to be regulated either. But on the flip side, I will just give one shout out to the CEO at Anthropic, who on the other hand is raising some alarms. He's recently spoke on like the Hard Fork podcast about concerns for biochemical type capabilities within these models and you know, wet lab testing that they're doing to determine like, are these things truly capable of helping someone create a bio weapon? And yeah, those are definitely scary things that, you know, don't feel too real, but if the CEO at Anthropic is saying, yeah, we're probably 6 to 12 months out from having a model that could do that, you know, I sort of want to believe them.
Charlie McCarthy (38:36):
Right.
Ethan Silvas (38:37):
I think it was kind of funny, I'm pretty sure the ChatGPT system card, the most recent one kind of mentioned the bio weapon and stuff, but it was just kind of like, eh, you can maybe.
Charlie McCarthy (38:48):
Ah! All right.
Ethan Silvas (38:49):
So it's cool that the Anthropic CEO is actually saying, no, it's a problem that we should think about, so.
Charlie McCarthy (38:54):
Yeah, raising an alarm.
Keith Hoodlet (38:56):
If you connect the prompt injection and jailbreaking community to bio weapon harm potential, like these things are not farfetched to actually, you know, a reasonably patient attacker who does a little bit of research could accomplish these things and cause great harm. So we'll see if any of the current administration feelings toward this or company's feelings toward this change if we end up in a situation where we find bio weapons are actually created as a result and used in some harmful way, but at least it seems like right now in the United States, that regulation is probably the least worry that a lot of these AI companies have. And so from a security team perspective, lean on the laws you already have. GDPR, just, you know, good business practices, et cetera.
Charlie McCarthy (39:42):
We are closing in on time. Keith, any key takeaways that you want this particular audience to leave with? Or anything that we didn't touch on that you wanted to make sure we included in this particular episode?
Keith Hoodlet (39:56):
I think the big thing that I sort of hit on a little bit that I go back to a lot with clients and anyone I talk to in this space is doing the basics right is actually more important than ever when it comes to large language models. So having good data classification, logging, monitoring and alerting of the way that these systems are being used is the most surefire way to at least protect yourself or get early alerting to a problem taking place. And so that is a big and important thing that I think a lot of companies should be focusing on. But other than that check out some of the work that my team and I are up to over at trailofbits.com. We have a blog. There's also my personal blog securing.dev where I occasionally publish things. And lately I've been reading a lot more research papers that I just feel like are wrong on the internet. So I am probably gonna be doing some more publications in the near future about that.
Charlie McCarthy (40:44):
Very good. All right. Yeah. And we'll anyone tuning in, we will share links to all of these resources in the notes at mlsecops.com/podcast. Keith, such a pleasure to talk with you again, thanks for being here. Ethan, thanks for being here. We will see everybody next time.
[Closing]
Additional tools and resources to check out:
Protect AI Guardian: Zero Trust for ML Model
Recon: Automated Red Teaming for GenAI
Protect AI’s ML Security-Focused Open Source Tools
LLM Guard Open Source Security Toolkit for LLM Interactions
Huntr - The World's First AI/Machine Learning Bug Bounty Platform
Thanks for checking out the MLSecOps Podcast! Get involved with the MLSecOps Community and find more resources at https://community.mlsecops.com/.