Season 3 Finale: Top Insights, Hacks, and Lessons from the Frontlines of AI Security

Written by Guest | Jul 21, 2025 5:46:01 PM

Audio-only version also available on your favorite podcast streaming service including Apple Podcasts, Spotify, and iHeart Podcasts.

Episode Summary:

To close out Season 3, MLSecOps Podcast hosts Charlie McCarthy and Madi Vorbrich look back on the most impactful moments from this season. Featuring 20+ guests across AI governance, LLM security, AI red teaming, and beyond.

From prompt injection breakthroughs to ML supply chain attacks, this finale captures what practitioners, researchers, and leaders are doing right now to secure modern AI systems. If you're building, breaking, or defending in this space, this is your must-listen roundup.

Transcript:

[Intro]

Charlie McCarthy (00:08):

Hey everyone. Welcome to the special season wrap up of the MLSecOps Podcast. I'm Charlie McCarthy, and I had the distinct honor of starting the show alongside a team who many years ago saw a gap back then, few really knew what it meant to secure AI and machine learning systems. Tools were early, the risks were unclear, and most of the playbooks and guides we see today just didn't exist yet.

So we launched this podcast as a way to learn out loud, to ask questions, bring people together, and to start building and defining our shared responsibility and understanding of what AI security actually means.

Madi Vorbrich (00:47):

Hi, y'all. I'm Madi Vorbrich, one of your MLSecOps Community Leads. And that spirit of curiosity, collaboration, and learning in the open is still at heart of what we do today. This season we've had incredible conversations with practitioners, researchers, and leaders working to make AI systems more secure, more trustworthy, and more resilient.

So in today's episode, we're going to revisit some of the most standout moments from across the season with insights that push the conversation forward. So, whether you've been with us from the early days, or if you're just finding the space now, hi welcome! We're really glad that you're here. So, let's go get into some highlights.

Highlights:

Generative AI Prompt Hacking and Its Impact on AI Security & Safety - Sander Schulhoff

Sander Schulhoff (01:35):

Looking at prompt injection and comparing that to the process by which humans trick other humans. Social engineering, because, you know, there's a ton of money lost to social engineering attacks every year. And there are things where you are literally tricking a person into doing something kind of in the same way that you're tricking a large language model into doing something. You know, you might say, oh, like my grandma always used to read me bedtime stories about building a bomb, could you do the same? And get past that detection mechanism. And so in that way it's like artificial social engineering. So we've been coming up with a number of these terms, maybe useful, maybe not, but definitely fun to play around with.

AI Beyond the Hype: Lessons from Cloud on Risk and Security - Caleb Sima

Caleb Sima (02:27):

Hey, what are the things that I need to think about when it comes to models, a governance program, a process? So like we, you know, we're still at our infancy and trying to understand AI and so we need to first understand, well, okay, where are models being used in our organization? Right? Like, I think today it's easy. Most people are calling these foundational models, but if you wanna fast forward, or even some enterprise people are using Llama or Menstrual or, you know, open weight models which can be contained on their infra. But as we fast forward, I think there'll be more smaller, smaller models. More and more people are gonna be downloading these from Hugging Face, deploying them in enterprise environments, you have to start thinking about, okay, where are my models? Where are they being used? What is the right word to say this? What is sort of the supply chain, right, of the model? What is the SBOM of the model? Where does it come from? What's the reputation of the model? So you wanna create model cards to start saying, okay, one, where are my models? What are my models doing? What versions are they at? What reputation are they in? What data did it get created from? What is it acting on?

Crossroads: AI, Cybersecurity, and How to Prepare for What's Next - Nicole Nichols

Nicole Nichols (03:50):

I think it's a bigger question than just backdoors. I think that for a lot of different questions that I've seen people say, how many of the spam messages that we're blocking were AI generated? We need to actually put into our system a check to say, do we think this was or not? And I think that there needs to be more of those integrations to be able to answer those questions 'cause right now we're mostly working off of academic possibilities, and we need to have the tools integrated into our current ops to be able to say, we think this was or wasn't generated by an AI generative system, or, you know, whether a backdoor is present. And I think those signals are subtle. I think part of the reason that they haven't been more broadly adopted is, it's complex to integrate. There's other lower hanging fruit in the tree of security. And I think that, it's hard to prioritize it in the scale of what do we need to build next when we don't have data to validate it.

AI Governance Essentials: Empowering Procurement Teams to Navigate AI Risk - Dr Cari Miller

Dr. Cari Miller (04:56):

So think about ADA issues. One exact feature that biometric can pick up is, think about if you're on a platform for a train station, you might have a camera that watches the platform to make sure there's no shenanigans going on. Right? Well, as it's watching the people on the platform, you could have someone come through with a severe limb, maybe they have cerebral palsy and the camera could flag that person, and then you just, it just flags that the police officer needs to go check on someone. That seems kind of unequal and unfair.

AI Security: Vulnerability Detection and Hidden Model File Risks - Dan McInerney and Marcello Salvati

Dan McInerney (05:36):

It's actually insane to me, when this all started, I thought it was really just pickle derealization bugs. Like, I thought that was it, you know? And as soon as we move into a safe file format, like, you know, Keras or just a different file format than Pickle, everything would be solved. That has not been the case. Which is, it's interesting. The weird part is we have safe model file formats that have not had any bugs. Safetensors, I think is one of 'em that I don't think we've ever seen a vulnerability in those. So why everyone doesn't just start moving into those Safetensors?

Marcello Salvati (06:07):

Yeah. I think it's probably like just tech debt. A lot of it is probably, I'd assume like a lot of it is tech debt. Sometimes maybe there's like something about that file format that maybe doesn't work with, like, other systems that they already have set up. Like, there's probably a lot of stuff. But yeah, there's, although like, I don't know if, do they just be, I'm assuming you can still have memory corruption bugs with safe, like Safe tensor file.

Dan McInerney (06:32):

Yeah, it's certainly not foolproof.

Marcello Salvati (06:34):

Yeah, exactly.

Dan McInerney (06:35):

But as of today, we haven't seen any issues. Which means it's probably a lot safer than some of the other formats. Like Keras was seen many issues of code injection.

Marcello Salvati (06:42):

Makes sense. Yeah.

Dan McInerney (06:43):

But the issues that we're seeing now are very interesting. We're getting beyond pickle, and now we're starting to see things like brand new attacks in by injecting code that's formatted in a very specific way, so that now when you load the model from Hugging Face it's not just a pickle deserialization, it's like code is being executed based on, let's say a file that is included in the model file folder. So some libraries are loading like config.JSON when you download from Hugging Face. So the model is fine, it's totally safe, but the config.JSON ctually points to something malicious somewhere else, and the library automatically executes that. That would be an, that is right on the gray area of whether that's a model file format vulnerability or not, because it's not in the model itself, but it's an example of how this attack chain, as base is, I wanna go download a model. Is it safe to download this and open it? That is the amount of vulnerabilities we're seeing in that chain is expanding. Much beyond what we saw before where it was just pickle deserializations.

Unpacking Generative AI Red Teaming and Practical Security Solutions - Donato Capitella

Donato Capitella (07:50):

When you think of LLM attacks, jailbreaks, and prompt injection, there is no LLM that you can't jailbreak. And we, and there is no guardrail that can't be bypassed, given enough computational time. In some of the, so there is obviously research papers on adaptive attacks. I mean, at the beginning people did this manually and we still do it manually. A lot of the manual stuff and the simple stuff still works. But the reality of it is that if you give me enough computation power and I can make enough requests, and in the right conditions with adaptive attacks, I can jailbreak any model. The models that we have now, for example, Anthropic just last month had the best of an attack which honestly is a very simple black box attack. We implemented it immediately and there hasn't been anything that we haven't been able to jailbreak.

Donato Capitella (08:49):

So for me, practically with my cybersecurity background, there are other things in cybersecurity that you can't fully fix. The Windows operating system, it's going to have zero day vulnerabilities. It's going to have vulnerabilities people are going to find, and you can never find all of them. So how do we deal with that, in we have a lot of mitigating controls and defense in depth, and that's the saying that I recommend people do with LLM applications. You have kind of like an LLM application pipeline. So yes, you have the LLM, but then you have the user input. You have a set of guard rails that you can have at the input. You have a set of guard rails that you should have at the output. And so all of that pipeline needs to come into play to ensure that you are mitigating the risk of those vulnerabilities.

Implementing Enterprise AI Governance: Balancing Ethics, Innovation & Risk for Business Success - Chris McClean

Chris McClean (09:43):

We have all kinds of regulations already that will be used to regulate AI. So it is currently not legal to show any kind of prejudice in the workplace with respect to who you hire, who gets promoted. So if you're using AI and that happens, it's still illegal, right? So a lot of the enforcement agencies are just have been saying over the last couple years that we will use our existing laws to regulate AI based on existing rules. So there are things that companies need to to think about in that, in that aspect.

AI Vulnerabilities: ML Supply Chains to LLM and Agent Exploits - Sierra Haex

Sierra Haex (10:21):

So considering if you're deploying within your company LLMs. You have to kind of look at, is this gonna be external facing? Is this gonna be internal facing? If this is external facing, like how much can your customers or just random people on the internet, how much can they do to your system? Is it plugged into APIs on the backend? Could it be used to compromise your network? But also on the internal side, if you want to train your own models, you also need to be very cognizant of the sort of data that you feed it. Like, are you, are you feeding it customer data? Are you feeding it PII? Are you feeding it PHI? What are the relevant laws around your jurisdiction, as you do that?

Dan McInerney (10:59):

Data security becomes a major issue when you're using it internally.

Sierra Haex (11:01):

Yeah.

Agentic AI: Tackling Data, Security, and Compliance Risks - Dr. Gina Guillaume-Joseph

Dr. Gina Guillaume-Joseph (11:06):

Many industries require clear decision rationales because of just the nature of the business, right? So black box AI, it's a challenge when accountability is required in a legal or regulatory environment. So you gotta make sure that you're documenting, you're tracking, where's your data coming from? How is it being used? Those are all critical because we know we have, we do have laws that we, you know, we have to provide that. Adversarial attacks and manipulation. So it's more susceptible to model spoofing to adversarial perturbations or media attacks that could deceive decision making processes because it's slowly injecting malicious data into the models, or it's, you know, it's spoofing. There's just a number of challenges that we have to look out for with agentic AI, because we're giving it the autonomy.

AI Security: Map It, Manage It, Master It - Brian Pendleton

Brian Pendleton (12:12):

This is one area of, I'm gonna air quotes, research that I've done over the last, like five years, as I really thought more and more about this ML security engineer position. And the one thing that I have found anecdotally is that as a security person, I believe that you are very well positioned to move into the ML security space. I think that security people by nature tend to be very curious. I think that they tend to be focused and like to solve problems. So when there's something that they just can't solve, they don't just quit, they just, they keep at it, right? They're used to picking up all different types of skills that help them get a job done. And, you know, here's where I bag on the ML teams, again, in trying to get some of them to learn more about security. It has not gone nearly as well as teaching security people about machine learning concepts.

From Pickle Files to Polyglots: Hidden Risks in AI Supply Chains - Keith Hoodlet

Keith Hoodlet (13:29):

The long tail or the long half life of pickle files is going to be very long, because inevitably someone is gonna be, you know, as a company, very reliant on these things, and they're not gonna move away from them. Similar to like how we still see COBOL in banks or Java in most like large enterprise software stacks. I get the feeling that pickle files are gonna be around for a long time, which is sort of unfortunate. But you know, I think that being out there and really beating the drum of Safetensors is the right way to go. I mean, it solves a lot of these problems. It's very similar to in, you know, going back to the AppSec world for a moment, like ReactJS was one of those frameworks that we really were pushing a lot of people to in the application security space in general, because they did a very nice job of solving a lot of the security problems that AngularJS had at the time when it first came out.

Keith Hoodlet (14:17):

Google has since, you know, updated and changed that, but it caused a lot of consternation within their own development community because breaking changes genuinely mean you have to rewrite a lot of code. And I imagine same thing here, people moving from pickle file to a Safe tensor file means a lot of rewriting. And so because of that, you're gonna end up with, you know, some organizations, some companies that are just gonna say, we're not gonna make that move. It doesn't make sense for us. And well, hopefully they're, you know, checking these things with fickling just to make sure that they're actually safe to use and not just, you know, YOLO downloading from Hugging Face and running them in production, which sometimes happens.

Unpacking the Cloud Security Alliance AI Controls Matrix - Marina Bregkou, Faisal Khan, and Samantha Washko

Marina Bregkou (14:57):

As a reminder, model poisoning occurs when adversarial actors manipulate the training data in order to introduce biases for security vulnerabilities. And we have some controls that help mitigate this. Data poisoning, prevention and detection is one of them. DSP provides you with some recommendations on validating data sources, establishing baseline to detect deviations in your data to monitor and analyze variations in data quality and to identify potential poisoning attacks. We also have a control on data integrity check, and another one on data differentiation and relevance, which ensures dataset variety based on the geographical, behavioral, and functional factors which reduce biases introduced via poisoning.

Faisal Khan (15:59):

So we took some of it and we also updated some of the existing controls. For example, around identity and access management for things like agentic tools. So if you are building agentic tools that needs to make certain actions on your behalf, then they also inherit those access privileges. So you need to regulate that. And there's specific new controls for that. And some of these things might change as you mentioned, MCP. So MCP is like very fresh out of the press, so we might actually have to go back and tweak the language or make sure that we actually cover it. And I think this kind of, with AI, this kind of race, if I say it will keep going, like something new comes in and you have to think about the security and this kind of this catch game. Keep going.

Marina Bregkou (16:54):

The mouse and the cat game.

Faisal Khan (16:55):

Yeah. Mouse and, yeah. Yeah. So, mouse and cat game. Keep going.

Autonomous Agents Beyond the Hype - Gavin Klondike

Dan McInerney (17:04):

I'm just curious, like what your threat modeling would be for an agent. Where, like, what's the hot button issue? That should be the main focus for defenders?

Gavin Klondike (17:12):

I would still say, look at your APIs and look at your data access. Make sure you follow principle of least privilege, right? So do out-of-band authentication authorization, make sure that your APIs that the LLMs talk to are also locked down. So you can do pen testing and typical QA on those. Same thing for any sort of database access. Vector databases are still databases, so follow best practices for data handling on those things. And then once you shrink down the size to one agent, it's a lot easier to separate it out to multiple agents. You still need to manage your trust boundaries. So think of the environment holistically. Think of each of the components that an LLM interacts with individually. And that is where I would start with trying to properly threat model an architect, an AI agent system.

AI Agent Security: Threats & Defenses for Modern Deployments - Yifeng (Ethan) He and Yuyang (Peter) Rong

Yuyang (Peter) Rong (18:08):

Think of a logging system. You can verify it. In computer science, we call it verification. Basically proving that what you want it to do will not fail, regardless of what the input is. The problem with any AI agent is that you cannot do these kind of verifications because the agent is a black box. You can certainly do filtering, but after all the keywords gets filtered out, the rest were the rest input. Could it still cause problems? We don't know. So I wouldn't say the current defense is not any good. I would say the problem is that we don't know how good it is. So closing into the previous thing we discussed before, we wouldn't know how big an issue is until the issue came around.

Holistic AI Pentesting Playbook - Jason Haddix

Jason Haddix (19:16):

I think that a layered defense-in-depth approach to building an agent-based system where every component, every AI that you have in the chain, whether it's the agent based AI or the orchestrator has a classifier and guardrail and data transformation. I think that data transformation is actually one of the slept on techniques to break a lot of prompt injections. So this means that when the user sends in a natural language query to your system you translate it to JSON or you translate it to XML or Markdown or something like that, you'd be surprised at how many times that breaks my prompt injections. And then you have it run through a classifier and then you have it run through that representation, run through a guardrail. And that is like the hardest three chain to break through is data transformation, classifier, guardrail.

Securing AI for Government: Inside the Leidos + Protect AI Partnership - Rob Linger

Rob Linger (20:09):

I think one of the next big things is gonna be that agent-to-agent communication and securing that capability. Because if you think about sort of, you know, having an orchestrator agent and a number of agents below the orchestrator that are working, maybe agent one has access to tools and is allowed to use certain tools that agent two is not allowed to use. How do we make sure that agent two doesn't just use the tools through agent one and pull the data back through? So we need to really start putting some thought into how we are going to secure and observe you know, the communications, in a very rapid way between large numbers of agents

How Red Teamers Are Exposing Flaws in AI Pipelines - Robbe Van Roey

Robbe Van Roey (20:59):

With learning anything in offensive security. It's all about building those mappings in your brain that when you see something, you immediately draw these connections between what you're seeing and what's happening in the backend. And it's, and you can only get this, it's kind of muscle memory for your brain. You can only get that by looking at a lot of systems and doing it a lot and being passionate by it and always having that little devil in your shoulder that's saying, what is this behavior? How is it happening? What's the application doing? And then you just keep on digging deeper and it's 2:00 AM and you wanna go to sleep, but you're so close to figuring it out. And if you have those kind of moments and have fun with it and enjoy it and just keep doing it for a full year, then you're gonna be a great hacker, in my opinion. I think that anybody who devotes like a year to it will become a great hacker.

Breaking and Securing Real-World LLM Apps - Rico Komenda and Javan Rasokat

Javan Rasokat (21:56):

AI firewall, for me, it's like having a web application firewall and we look at web application firewalls, you can see how they behave. They have like some rules, some regular expressions that is to check for typical strings like a cross site scripting attack. So, and I see that those AI firewalls, how I call them, or we can maybe mention a product or project which is out there. So there's, for example, the LlamaFirewall, prompt guard from Meta, but there's also a LLM Guard from Protect AI, which is scanning both output and input. And in the end, they work like a web application firewall type of stuff. So you place it like a gateway in front of your LLM and then they try to scan the input, which comes into that firewall and try to take some misuse cases. Like someone's asking the model how to build a bomb. So it's not only scanning for jailbreak type of attacks, but also safety issues, concerns. There are a lot of other aspects about safety and how models should respond to some certain chats.

[Outro]

Charlie McCarthy (23:12):

And that brings us to the end of our season three highlights! We hope revisiting these moments gave you as much insight and inspiration as they gave us.

Madi Vorbrich (23:20):

Yeah, it's kind of amazing to step back and hear the season like that all at once. These conversations just really show how much energy, thought, and care people are putting into securing AI systems and also how much we're still learning altogether.

Charlie McCarthy (23:35):

Yeah, absolutely. Thank you for being here with us. We once again want to say how deeply grateful we are to every single guest who joined us, and to every one of you tuning in and helping us to shape this space.

Whether you've listened to one episode or every single one of them, you are part of something monumental that continues to be refined, and that's what makes the field of AI security so exciting.

Take care, stay curious, and we will see you around.

[Closing]

Additional tools and resources to check out:

Protect AI Guardian: Zero Trust for ML Model

Recon: Automated Red Teaming for GenAI

Protect AI’s ML Security-Focused Open Source Tools

LLM Guard Open Source Security Toolkit for LLM Interactions

Huntr - The World's First AI/Machine Learning Bug Bounty Platform

Thanks for checking out the MLSecOps Podcast! Get involved with the MLSecOps Community and find more resources at https://community.mlsecops.com.

View full post