
Practical Offensive and Adversarial ML for Red Teams



Audio-only version also available on Apple Podcasts, Google Podcasts, Spotify, iHeart Podcasts, and many more.

Episode Summary:

Next on the MLSecOps Podcast, we have the honor of highlighting a member of our MLSecOps Community and of the Dropbox Red Team, Adrian Wood.

Adrian joined Protect AI threat researchers Dan McInerney and Marcello Salvati in the studio to share an array of insights, including what inspired him to create the Offensive ML (aka OffSec ML) Playbook, and to dive into categories like adversarial machine learning (ML), offensive/defensive ML, and supply chain attacks.

The group also discusses dual uses for "traditional" ML and LLMs in the realm of security, the rise of agentic LLMs, and the potential for crown jewel data leakage via model malware (i.e. highly valuable and sensitive data being leaked out of an organization due to malicious software embedded within machine learning models or AI systems).


[Intro] 00:00

Dan McInerney 00:07

Welcome to this episode of the MLSecOps Podcast. We have a guest here, Adrian Wood, the writer of the OffSec ML Playbook. Hosting this episode is going to be me, Dan McInerney, Threat Researcher at Protect AI, and Marcello Salvati, Threat Researcher at Protect AI.

Marcello Salvati 00:24

Hello, everyone.

Dan McInerney 00:24

Adrian, if you'd like to give yourself a little introduction.

Adrian Wood 00:27

G'day. It's nice to meet you both and to be here. I'm Adrian Wood, some people call me threlfall. I work on the Red Team at Dropbox. Prior to that, I was an engineer at Wells Fargo, and before that I worked for eight years as a founder and lead engineer at a company in Australia called White Hack, doing primarily red team assessments.

Dan McInerney 00:52

So, as our guest today, I think one of the things we wanted to talk about was a really cool project that you did, called the OffSec ML Playbook wiki. So can you explain a little bit about what this is and who the audience is for it?

Adrian Wood 01:07

Yeah, so the OffSec ML Playbook is designed primarily with red team operators in mind, folks like myself who sometimes find themselves in a position on a network with access to a thing, and you want to know what to do next. But in the case of machine learning and machine learning pipelines, sometimes that's not super obvious. And there are a number of reasons for that, right, such as adversarial ML. The world of adversarial ML is full of academic papers, some of which have code, some of which do not. Many use words that red team operators use, but mean a different thing by them. And I'm just trying to deal with that problem first and foremost.

Dan McInerney 02:01

There are so many academic papers. I think we've spoken about this before; sometimes they drive me insane.

Marcello Salvati 02:07

Yeah. The definition of red teaming itself is kind of all over the place now. There's a big difference between what we would consider a red team and what an academic paper would consider an AI red team.

Dan McInerney 02:22

A lot of times in the security world, when we say we'd like to red team something, that means, you know, I get to pick locks, I get to break in, I get to phish and stuff. And then these academic papers are like, I red teamed the model. I'm like, I don't think I picked a lock to get into this model.

Marcello Salvati 02:34

Yeah. Which is funny because, I mean, it's sort of a joke in the industry. Even prior to this, the definition of a red team changed between consultancy companies, and each person at each company has their own definition. Now the waters are even muddier. So it's very interesting.

Dan McInerney 02:53

Yeah. And I love this playbook, because it does take a lot of these academic papers and concepts that should have just been a blog post and a GitHub release instead of 17 pages of dense academic stuff. So it's really nice to see this kind of condensed down.

Marcello Salvati 03:11

Yeah, especially from a practical standpoint. I haven't done red teaming in a couple of years at this point, but from a practical standpoint, this is really awesome.

Dan McInerney 03:19

Yeah. So why'd you like kick this off? I mean, what inspired you to make this to begin with?

Adrian Wood 03:24

So I've been doing ML things whilst red teaming or doing other things in, in that sphere for probably more than eight years now, but last year I went to a conference called CAMLIS in [Washington] DC; the Conference for Applied Machine Learning in Information Security, which is an amazing conference. If you ever get a chance to go, go. Like, go buy a ticket right now, it's in October. I hope to see you there.

And I sat there for two days listening to all of these presentations from people in industry or in academia, releasing applied practical things for InfoSec. But I think across the two days there was only one talk on an offensive use case of machine learning. The rest of it was for, like, hunting hackers, finding bad people. And I was just like, that's interesting. And I sort of sat there feeling like there was this big wave of machine learning coming down on, you know, red teamers who had, for the most part, been doing the same thing they'd always been doing, right.

Dan McInerney 04:33

What year was this?

Adrian Wood 04:35

Last year.

Dan McInerney 04:36

Last year. Okay.

Adrian Wood 04:37

2023. Yep.

Adrian Wood 04:40

And so I'm sitting there, I'm feeling kind of like overwhelmed. And then I started thinking about the way that my colleagues were talking about success in operations and how that had changed from, say like 2018 until now. Back in 2018, you'd hear a lot of red teamers say things like, I got in and out, totally clean, hit all the objectives, no worries.

And more recently I was hearing things even from the same people working at the same companies, you know, their familiarization with the organization that they work for has gone up, but they're getting caught more than they used to be, which is usually the opposite of what happens. And it was coming down to basically like the mean time to detection was falling very quickly and they were often being caught by machine learning. So I was like, we gotta use more machine learning, adversarially offensively. We gotta get more people tooled up in this space. So that's kind of how it got started.

Dan McInerney 05:40

What was the, what was your machine learning background? Because you said you were like dabbling in the machine learning world before you were even doing security stuff.

Adrian Wood 05:48

Oh, yeah. So it was more dabbling in machine learning in security. Trying to do behavioral analysis, in supply chain security in particular; one of the earlier projects I did was trying to uncover anomalous commits in public Git repos of things that I cared about. I applied similar logic, with sentiment analysis, to detect trading bots trying to force swings in the crypto market and things like that as well, just as a side hobby.

And then more recently I've been doing anomaly detection of EDRs (Endpoint Detection and Response), so benchmarking an EDR's behavior on a system, using even simple unsupervised learning to see what system calls it's making and to identify any changes in network traffic. So when you're in the middle of an op doing things, you've got this little canary in the coal mine that tells you when the EDR has, like, woken up and started to care, just as a way to know when things are not going so great.
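The EDR-canary idea Adrian describes can be sketched in a few lines. This is a toy illustration, not his tooling: the feature names, counts, and threshold below are all made up, and a real version would collect live syscall and network telemetry rather than hard-coded numbers. Baseline the EDR's behavior while it is quiet, then flag windows whose z-score spikes.

```python
import statistics

def build_baseline(samples):
    """Per-feature (mean, stdev) from benign observation windows.

    samples: list of dicts, e.g. {"syscalls": 120, "dns_queries": 3}
    """
    features = samples[0].keys()
    return {
        f: (statistics.mean([s[f] for s in samples]),
            statistics.pstdev([s[f] for s in samples]) or 1.0)  # avoid /0
        for f in features
    }

def canary_score(baseline, window):
    """Largest absolute z-score across features; high means the EDR woke up."""
    return max(abs(window[f] - mu) / sigma for f, (mu, sigma) in baseline.items())

# Quiet baseline: the EDR makes ~100 syscalls and ~2 DNS queries per window.
quiet = [{"syscalls": 100 + i, "dns_queries": 2} for i in range(10)]
baseline = build_baseline(quiet)

calm = canary_score(baseline, {"syscalls": 104, "dns_queries": 2})    # business as usual
alert = canary_score(baseline, {"syscalls": 900, "dns_queries": 40})  # EDR started to care
```

In practice you would pick the alert threshold from the baseline distribution itself; anything above a handful of standard deviations is a reasonable "stop what you're doing" signal.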

Dan McInerney 06:57

So I'm a, I'm a big ML nerd. What were some of the models you used, just as a sidetrack here?

Adrian Wood 07:03

Yeah, yeah. So for sentiment analysis I was doing a lot with an open source project called VADER, which at the time was state of the art for sentiment analysis because it had some understanding of, like, emojis and their meaning in current culture back in the day. And that was the benchmark I was using for that. And then for bot detection, I actually can't remember the name of the project, but it was another Git repo thing. You know, generally, as you can kind of tell from the playbook, I glue a lot of stuff together.
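VADER is a lexicon-and-rule sentiment model (a few thousand human-rated tokens, including emoticons and slang); the real package is `vaderSentiment`. The stdlib-only toy below, with a completely made-up crypto-flavored lexicon, only illustrates the lexicon-plus-booster idea behind it:

```python
# Toy lexicon with made-up valences. VADER's real lexicon is far larger
# and human-rated; this just shows the mechanism.
LEXICON = {"moon": 2.1, "pump": 1.4, "rug": -2.5, "dump": -1.8, "scam": -3.0}
BOOSTERS = {"very": 0.3, "extremely": 0.5}  # amplify the next sentiment word

def polarity(text):
    """Sum lexicon valences, letting a booster word intensify its successor."""
    score, boost = 0.0, 0.0
    for word in text.lower().split():
        if word in BOOSTERS:
            boost = BOOSTERS[word]
            continue
        if word in LEXICON:
            v = LEXICON[word]
            score += v + (boost if v > 0 else -boost)
        boost = 0.0  # boosters only affect the immediately following word
    return score

bullish = polarity("very pump to the moon")
bearish = polarity("extremely scam rug dump")
```

A bot-detection pipeline like the one Adrian describes would compute scores like these over message streams and look for coordinated sentiment swings, rather than trusting any single score.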

Dan McInerney 07:42

You're speaking my language. My whole career. Oh, I like this tool and I like this tool. Let's just make these two tools work together asynchronously in Python.

Marcello Salvati 07:50

Yeah. I feel like that's a story of most offensive tooling anyway. It's just like, hey, I just glued this stuff together just to get it working.

Dan McInerney 07:57

Now it works in one command instead of seven.

Adrian Wood 08:01

Yeah, because normally you need a thing and you need it right now, so.

Marcello Salvati 08:03

Yeah, exactly, yeah. On-the-job tooling is pretty much where most of the stuff is made, for sure. Yeah, it's funny that you mention that, because I feel like the defenders, and defensive products to that point, have been using at least forms of ML for quite a while in a lot of products. And the offensive community has actually been slow to sort of catch up on this.

Marcello Salvati 08:28

To an extent. Which I kind of feel is sort of a trend; things go in waves in the offensive community, I feel like, so.

Dan McInerney 08:38

Yeah, I feel like the more you see it on engagements, the more familiar you are. So like, yeah. You know, I don't know anything about mainframes, and yet when we go do bank jobs and stuff, there's oftentimes a lot of mainframes there, and suddenly you become a mainframe expert because of that two week engagement. And I think that's where we're at now: people are starting to see these web applications and tools on their engagements, and they're like, what does this do? Oh, it's ML.

Marcello Salvati 08:59

Yeah. I guess what I was referring to is that it's been around for a while in EDRs and a bunch of network analysis tools, right? Like, there's a bunch of products; you can censor this out later, but not to plug this company, I know for a fact Darktrace and a bunch of these products use this kind of stuff. So I guess it's not in your face. It's not like, hey, this is an ML thing. So I guess people haven't really dug deep into it in the offensive community.

But yeah, to your point, the minute you start seeing MLflow or H2O, you know, rolling around on engagements in networks, I feel like you pick up on it real fast.

Adrian Wood 09:41

I think ML also has an image problem within the offensive security space.

Marcello Salvati 09:48

For sure.

Adrian Wood 09:48

I think because many, many companies that were very early adopters of ML for security probably oversold the capabilities of the projects in their early days.

Marcello Salvati 10:01

Oh, for sure.

Adrian Wood 10:02

And those things, when they faced real world encounters with red teamers and others, just kind of didn't do so great. And I think a lot of people formed quite firm opinions at that point about the efficacy of ML.

Dan McInerney 10:16

Hundred percent.

Marcello Salvati 10:16

Oh yeah.

Dan McInerney 10:17

Hundred percent.

Adrian Wood 10:17

Opinions that they now need to walk back, and they needed to walk them back, like, last year or earlier.

Marcello Salvati 10:22

Yeah. Correct. Yeah. Yeah. It was definitely, yeah. Yeah. Go ahead.

Dan McInerney 10:24

We see that a lot with LLMs too. Like LLMs first came out and people were like, well, it doesn't even do basic math. And then, you know, the next generation gets a little, like, quite a bit better at it. Suddenly people are like, oh, oh, okay. Well maybe this isn't as bad as I thought. Like, I feel like everything has to be black and white when you first see it. And there's just so much gray area.

Marcello Salvati 10:40

Yeah, there's a lot of opinions that change very slowly over time <laugh>. Sometimes we as the defensive community, I feel like, don't like to face the fact that things have changed. Yeah. So, can you dive a bit into each category in your OffSec ML Playbook wiki here? Just to give us an idea.

Adrian Wood 11:07

Yeah. The adversarial category right now is primarily looking at things to do with LLMs, but I'm adding more things all the time there. And what we're talking about there is attacks against ML systems, and that category is broken down basically by the level of access that you have to have to perform that attack, whether that's API access or, you know, direct model access and so on. And I've noticed that people are starting to use this category in threat modeling too, because a team will come to them and say, hey, we want to stick embeddings in this location for, like, LLM RAG systems; is that okay? And people will say, sure, because they don't realize that embeddings can be fully reversed, basically back to text, with a project called vec2text from a guy called Jack Morris, I believe. So you can use the category to either attack something or figure out how someone could attack it for those purposes.
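vec2text itself trains a corrector model that iteratively inverts embeddings back to text. A much weaker but stdlib-only way to see why embeddings should be threat-modeled like plaintext is a nearest-neighbor guessing attack: if an attacker who steals a vector store can guess candidate texts, cosine similarity tells them which guess is right. Everything below is made up for illustration; `embed` is a toy stand-in for a real embedding model, and the candidate corpus is hypothetical.

```python
import math

def embed(text, dim=32):
    """Toy deterministic stand-in for a real embedding model:
    buckets each word by the sum of its character codes."""
    v = [0.0] * dim
    for word in text.lower().split():
        v[sum(map(ord, word)) % dim] += 1.0
    return v

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Attacker's guesses about what the RAG store might contain.
candidates = [
    "quarterly revenue projections",
    "employee salary spreadsheet",
    "public marketing copy",
]

# The "leaked" embedding pulled out of the vector store.
leaked = embed("employee salary spreadsheet")

# Pick the candidate whose embedding is closest to the leaked vector.
recovered = max(candidates, key=lambda c: cosine(embed(c), leaked))
```

The real attack is stronger: vec2text does not need a candidate list at all, it reconstructs novel text directly from the vector, which is exactly why "it's just embeddings" is not a safe answer in a threat model.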

The next category is Offensive ML, which is where we're talking about using or leveraging ML to attack something else. As an example, there's a great project in there, one of my favorites, by Biagio Montaruli, which is a phishing webpage generator. It deploys a local phishing webpage detection model and then generates HTML attributes into your phishing webpage until the phishing confidence goes close to zero, and then you use that version. And I'm getting feedback from people all over the world that are using that project that Biagio made and saying, it works, it's good. I use it myself. It works, it's good. Same thing goes for, say, droppers: using an ML enabled dropper to make decisions about whether to drop, right? Because you don't wanna drop into a sandbox.
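The feedback loop Adrian describes can be sketched in a few lines. To be clear, this is not the actual project's code: `phishing_confidence` here is a deliberately crude stand-in stub for the real locally deployed detection model, and the "benign" snippets are invented. The point is only the shape of the loop: mutate, re-score, stop when the local model's confidence drops.

```python
import random

random.seed(7)  # deterministic for the demo

# Invented benign-looking HTML to pad the page with.
BENIGN_SNIPPETS = [
    '<meta name="description" content="Official support portal">',
    '<link rel="stylesheet" href="styles.css">',
    '<footer>Copyright 2024. All rights reserved.</footer>',
    '<nav><a href="/about">About</a><a href="/contact">Contact</a></nav>',
]

def phishing_confidence(html):
    """Stand-in for a local phishing-detection model: the ratio of
    'suspicious' markers to total tags. A real loop would call an
    actual classifier's predict_proba here instead."""
    suspicious = html.count("password") + html.count("login")
    total_tags = html.count("<") or 1
    return min(1.0, 2 * suspicious / total_tags)

def degrade(page, threshold=0.1, max_rounds=50):
    """Append benign-looking markup until confidence drops below threshold."""
    for _ in range(max_rounds):
        if phishing_confidence(page) < threshold:
            break
        page += "\n" + random.choice(BENIGN_SNIPPETS)
    return page

page = '<form><input name="login"><input name="password"></form>'
before = phishing_confidence(page)
after = phishing_confidence(degrade(page))
```

Against a real detector the mutations would be model-guided rather than random, but the stopping condition is the same: ship the variant the local surrogate scores as clean.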

Supply chain attacks, that's got a lot of stuff in there from both of you, actually from Protect AI.

Dan McInerney 13:20

Yeah, let's go!

Marcello Salvati 13:20

Pat ourselves on the back there. <Laughs>

Adrian Wood 13:23

There you go, there you go. So, the ML supply chain: you mentioned things before like H2O Flow, MLflow; all those things are in there, but also ways of using the models themselves to attack ML pipelines and more, as well as data based attacks and so on.

I figured I'd better add a defensive category at that point, because I was like, this site's just getting a little rude <laugh>. So I've added a defensive category. It's a work in progress, mostly because I don't like adding things to the site that I haven't personally used, so there could be a bit of a backlog there while I get around to seeing if a thing is worth [it], basically. And the defensive section is tracking a lot of benchmarks; I'm very interested in benchmarks, especially of LLMs. And a lot of other use cases, like tools for putting the data inside a binary into an ML-ready state. There's an approach called bin2ml by a guy called Josh Collyer in the UK, which is an amazing tool if you're trying to build out malware data sets and things. Yeah. So that's sort of how all of that's coming together.

Dan McInerney 14:36

Yeah, I see LLMs as being really, really good at taking huge amounts of data and then distilling it down to a reproducible form, you know, like a JSON file or something.

Dan McInerney 14:47

And I feel like log analysis is just one of the best places ever for LLMs to start doing this work. But I think there's still a lot of room for predictive machine learning, or traditional machine learning, which I think people kind of forget. Once LLMs came around, they were like, well, forget about XGBoost and PyTorch and neural networks and stuff, because we'll just use an LLM for everything. But I think that's a mistake right now. For things like log analysis, I suspect predictive ML might be a little bit better at analyzing those logs than an LLM.

Marcello Salvati 15:20

I can see that.

Adrian Wood 15:20

There is a great contribution made recently by Dr. X, who is based in Charleston, South Carolina; she's added a benchmark comparing, for log analysis of network traffic, traditional predictive ML versus LLMs on the same data sets.

Adrian Wood 15:46

Which is not something that you see done often enough in a benchmark.

Dan McInerney 15:50

No, you don't.

Adrian Wood 15:50

Should I use this or should I use that? I'm like, this is great. Like this is, this is my new favorite benchmark, so thank you Dr. X, who has also been contributing heavily.

Dan McInerney 15:59

What was the conclusion?

Adrian Wood 16:02

That traditional ML was still significantly better at like network traffic <inaudible>.

Dan McInerney 16:09

That's what I expected.

Marcello Salvati 16:09

That makes sense, yeah. I mean, I think right now we're just hammers, and everything to us is nails with LLMs. It's this new shiny thing and we're just trying to apply it a little bit everywhere. But yeah, it makes perfect sense that traditional ML would be better at this.

Dan McInerney 16:29

And to defend that position, I do feel like LLMs will overtake predictive ML in almost all realms within a couple of years. But right now I think people are sleeping on just using traditional PyTorch and traditional predictive ML for the use cases they're really good at. They're much better at those narrow use cases, finding anomalies and that kind of thing, than LLMs are.

Marcello Salvati 16:54

There's a lower barrier to entry though with LLMs, which is partially why I think everybody's just like, oh, I can use LLMs here, there, you know, everywhere.

Dan McInerney 17:03

And some advice from someone who's done a lot of predictive ML stuff: the AutoML tools that are out there right now are really, really good. So if you're choosing an LLM and your LLM's just not performing for your PCAP analysis or your log analysis, go check out some of the AutoML stuff like AutoGluon, H2O, Vertex AI, Azure AI. You literally just upload your data set, and it does everything for you and spits out a really good model.

Marcello Salvati 17:29

Yeah. In terms of defensive capabilities, what do you think are some of the best open source projects you've come across that are really good at, you know, doing traffic analysis or malware detection or anything along those lines?

Adrian Wood 17:46

Hmm. Yeah. The malware detection stuff in defensive open source models is something I've been diving into a lot recently, using EMBER and SoReL, which are starting to get a little long in the tooth. People have stopped releasing models that fulfill those capabilities, probably for very good reason. But those are really great defensive tools to help you get an understanding of how things like EDRs, and even VirusTotal to a degree, make decisions about files. Because apart from being good for running detections, a problem we run up against a lot in incident response is that we have these black box tools that find problems, and they find them on a Friday night when there's no one to talk to. And then you've gotta figure out: why is it mad about this file, and can I go to bed or not? But they don't give you any information, because they're scared that someone's gonna flip the model around on them. So you get these hyper generic statements, like "advanced ML detection" on this file, but no reason given. Even with a YARA rule, you at least get little tidbits that tell you, oh, okay, it's mad about an LD_PRELOAD, or it's mad about X, Y, Z.

Dan McInerney 19:12

Sounds like a <inaudible> error message.

Adrian Wood 19:13

With these ML systems, you get nothing. So I like these defensive models for a few reasons: I lean on the transferability properties of these systems to help me figure out why these other black box systems are mad. And you can do this for defensive purposes and offensive purposes, obviously.

Dan McInerney 19:35

That's a really good idea. This saves a lot of time without using like anything super advanced either.

Marcello Salvati 19:40

Yeah. You brought up a good point there. I think one of the biggest issues with doing this kind of stuff is, like you said, that the models can sort of be flipped around on them. If there's an endpoint you can hit that just tells you, hey, this is bad or good, you can potentially use that to evade all of the detections too. So it's the whole dual use thing again. And I feel like with ML particularly it's almost a different realm, because it's a lot more powerful than traditional tooling, both for offense and defense. So it'll definitely be interesting to see how we're gonna handle that, both defensively and offensively.

Dan McInerney 20:34

So, you do a lot of work with model files now, and I feel like this is very underrepresented. I saw that you did a talk at Black Hat Asia [2024] on, like, model artifacts and things. And here's something that I think our audience probably knows at this point, but I don't think the broader audience understands: models can run code <laugh>. They're essentially viruses. So what was the main point of the talk that you gave at Black Hat Asia?

Adrian Wood 20:59

Yeah, so main point number one was that models, for the most part, are full-blown programs. And that's bad <laugh>, as you said. The second component of our main takeaway was that the type of access you give an attacker when they use models as a malware vector is crazy good.

Adrian Wood 21:28

Like, it'll be one of the shortest assessments you've ever run to get to crown jewel data if you use model-based malware. And in general, a lot of the tooling that we can stick in a pipeline to make automated decisions about incoming files, or even sandboxing during IR, is not super ready for model malware.
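The "models are full-blown programs" point comes down to serialization: pickle-based model formats (the historical default for `torch.save`, and common in scikit-learn workflows) execute arbitrary callables at load time via `__reduce__`. Here is a stdlib-only demo with a deliberately harmless payload; a real attack would return something like `os.system` with a reverse shell instead, wrapped inside an otherwise working model file.

```python
import pickle

class MaliciousLayer:
    """Anything pickled can nominate a callable to run at load time."""
    def __reduce__(self):
        # A real payload would be e.g. (os.system, ("curl ...|sh",));
        # here the "attack" just evaluates a harmless expression.
        return (eval, ("1 + 1",))

# "Publish the model" -- this blob is what lands on a model hub or in a pipeline.
blob = pickle.dumps(MaliciousLayer())

# "Load the model" -- the payload runs before any weights are ever used.
result = pickle.loads(blob)  # eval("1 + 1") executes here
```

This is why scanning model files as opaque binaries misses the point: the payload is a legitimate feature of the serialization format, not a malformed file.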

Dan McInerney 21:55

Yeah, yeah, I found that too. I did this research on antiviruses: put a really obvious payload in a whole bunch of model files, and none of the antiviruses caught the base64-encoded Python exploit.

Marcello Salvati 22:07

Yeah. It was like Meterpreter, right? Or something. Just a basic reverse Meterpreter shell or something.

Dan McInerney 22:11

It was just a reverse shell, it was ridiculous. It's like, this seems really obvious.

Dan McInerney 22:15

But they don't find it.

Adrian Wood 22:18

Yeah. When you dig into it, a lot of the time they're not even capable of looking at the file type at all, full stop. So a lot of them are not even getting off the ground. Then you've got other issues: the file size is a massive complicator of doing an AV check or sandboxing a thing, you know? Because an old school technique is to just make your malware really, really big.

Marcello Salvati 22:46

Yeah. <Laugh>.

Adrian Wood 22:47

Well, the models are just big by default, and it's not even weird.

Marcello Salvati 22:50

<Laugh>. Yeah. I remember there were multiple times where I, on purpose, created really, really big malware payloads, and the AVs just don't scan them because they don't wanna affect performance for the end user, which is wild.

Dan McInerney 23:05

You said that you get really, really huge access when you use a model-based attack. Like, why?

Adrian Wood 23:13

So, as I understand it. I did this attack in a red team but also in bug bounties, because I wanted to know: can I make a generic statement here? Turns out I can. And that generic statement is that at most companies, ML engineers have to work a little differently from everyone else. In their development environments, they need production data to do their job. They can often supplement with synthetic data, or fake data that is meant to look like real data, but at the end of the day, for some tasks, they just need the real deal. So they need that in special development environments that exist inside your business, in the ML pipeline. And what that means is that these ML pipelines have direct access, through say an S3 store or some sort of API call, to something like Snowflake or Elastic or whatever. They just have access to the crown jewel data, by necessity.

So when an attacker gets into this environment and has, like, a shell, you know, remote administration type access, they can just use all the tools for their intended purposes to carry out an attack. They just look like another ML engineer in the pipeline doing things. And given that it's kind of treated like a development environment, you see that standards are not the same as you may expect in other elements of a production environment, even at quite mature companies.

Marcello Salvati 24:51

Yeah. It's just the Living-off-the-Land sort of technique applied to ML, which, again, kind of makes sense, for sure.

Dan McInerney 24:59

Yeah, because I mean, especially since the ML engineers, they need to use the cutting edge tools and the cutting edge tools often are cutting edge because they save time.

Marcello Salvati 25:06

They're experimental.

Dan McInerney 25:08

They're experimental, and they save time on testing, security testing and stuff. And it's like, well, my manager needs me to get this project done; I need you to let me install, you know, Triton or something and expose it to our colleagues over in, you know, Georgia or whatever.

Marcello Salvati 25:23

Whatever. Yeah. Yeah exactly.

Dan McInerney 25:24

I found the exact same thing to be true, that people give ML engineers a lot of access. There's actually a really good talk by Will (Schroeder) from SpecterOps, "I Hunt Sys Admins," from many years ago. And that was a real turning point for hackers to start using PowerShell and Living-off-the-Land and those kinds of things.

Marcello Salvati 25:45

Yeah, that's true.

Dan McInerney 25:46

And that's 'cause he attacked sysadmins (system administrators), and sysadmins essentially have almost less power than the ML engineers today. Because the sysadmins might not have access to all the databases or whatever, but the ML engineers, they need access to all of these databases and all of those data sets and they have to put them in one place. It has to be one data set that you feed to this model.

Marcello Salvati 26:05

Yeah. And there's a direct monetary value tied to all of that which is a little bit different from sysadmins where it's like, yeah. I mean sure.

Dan McInerney 26:13

That's a good point. That's a really good point.

Marcello Salvati 26:14

You get credentials, you can ransomware stuff, yeah, sure. I mean, yeah, okay. But, you know, we're not talking about millions of dollars worth of compute time to actually train models, or engineer time to clean data, all the ETL pipelines and stuff.

Dan McInerney 26:29

We're getting to the point where these models are gonna cost a billion dollars to train. The highest end foundational models are gonna be a billion dollars, if they're not already. I think they might already be a billion, if not more.

Marcello Salvati 26:38

Yeah. Well, I mean, yeah, a hundred percent.

Dan McInerney 26:40

It's crazy. Yeah. So that's definitely interesting.

Adrian Wood 26:44

Another aspect of all of these kinds of attacks in ML pipelines that encouraged me to make the playbook: a lot of adversarial ML tools, even the ones that call themselves a post exploitation toolkit or whatever, try running one through your C2 (command and control) implant and see how you get on, right. The playbook is also an amalgam of things that I find work through C2 implants, because there are a lot of limitations to using interactive type tooling via a C2 implant, especially on Linux, where things just break a lot. If you have a Linux shell, it's very hard to go interactive again within one; it can be quite flaky, and it's one of the riskier things that you'll do. So finding the things that actually just work and won't burn the amazing new access you just found is really important. <Laugh>

Marcello Salvati 27:53

Yeah. Yeah, that's true. Most C2 tooling, or just red team capabilities, are geared towards Windows environments, so yeah, a hundred percent. That's my experience too; especially Cobalt Strike and stuff is a lot less mature on Linux for all the post exploitation capabilities. That's another area which is gonna be interesting to watch evolve. I think right now the red teaming community is a little bit averse to LLMs in C2s, just because of the whole data retention problem with client data, at least in the consulting field. If you're an internal red team, maybe you don't really have to care that much. But it'll be interesting to see how that evolves too. I feel like not a lot of people are exploring the potential of LLMs in C2s right now, which is something I've been thinking about a lot. It'll be very interesting to see what you can do with that. And scary, too.

Adrian Wood 28:49

Yeah. I have so many words to say about that.

Dan McInerney 28:52

Please go on.

Marcello Salvati 28:53

Yeah, yeah. Like same. Yeah. This has been, yeah, [go on, please] feel free.

Adrian Wood 28:57

Right. So if we start with some sort of philosophical basis, back to the original part of the conversation, I think that might be helpful. We spoke about a blue team wave of ML, right? We're talking about a flywheel spinning with a lot of energy, a lot of momentum, and it's gaining speed all the time. Why? Because blue team tooling is getting telemetry constantly from all the customer environments, not just about bad things but, more importantly, about good things, about normal things. Which is a recurring problem in InfoSec when launching a new product, as I understand it: having a good baseline of benign data is not something you see very commonly. People like to release malware datasets, not a dataset of nothing. <laugh>

Marcello Salvati 29:47

Yeah, yeah.

Dan McInerney 29:47

Here's some numbers. <laugh>

Adrian Wood 29:48

Yes. If we extrapolate that problem to red team ML - you mentioned red teams - consulting red teams especially have a heck of a time putting together red team datasets to steer LLMs, or to steer any kind of ML hack bots. Because what are most red teams told to do at the end of any engagement? Delete everything, keep nothing, burn all your telemetry down. And if you do work somewhere a little more forward thinking, or whatever it might be, where you can keep some of that, now you've got the problem that you have to clean this data somehow, so that you keep the good telemetry but none of the customer information, and that is quite a headache.

But if you can tackle that particular problem of red team telemetry, red team data - there's been a project released recently which I think philosophically fills this space, at least for my needs, called Nemesis, which is a SpecterOps project by Will and Co. It's working on this idea that if you want to build a good hack bot, you need good telemetry. So you have to keep it, you have to save it, you have to nurture it if you want to get anywhere. And you can really only beat ML with your own ML, because it's the only way to understand -

Dan McInerney 31:18

It's Terminator vs. Terminator. <Laugh>

Adrian Wood 31:20

Yeah, yeah. But the advantage, philosophically speaking, is that a red team flywheel is always going to be smaller than a blue team flywheel. Small flywheels change axes quickly; they can navigate a complex landscape more quickly. So that asymmetric nature of attack and defense is still going to exist in the ML future, but only for those who nurture their offensive data, basically.

Marcello Salvati 31:50

Yeah, that's a very interesting problem, because honestly there might be only two or three consultancies in the world that have the bandwidth to actually do that, in my opinion. The problem is that really good red teaming firms, or red teamers generally, work at boutique consultancies where it's like five people, <laugh> and you're not going to have the bandwidth to bring in income and at the same time be that forward thinking, where you have to have a whole ETL and anonymization pipeline for all your customer data.
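[Editor's note] As a concrete illustration of the anonymization step discussed above, here is a minimal sketch, assuming a simple line-oriented log format. The regex patterns, placeholder tokens, and internal-domain suffixes are hypothetical, not taken from Nemesis or any real pipeline:

```python
import re

# Hypothetical redaction pass for red team telemetry: scrub
# customer-identifying values (emails, IPs, internal hostnames) so the
# technique data can be retained after an engagement. The patterns and
# the ".corp/.internal/.local" suffix list are illustrative only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "HOSTNAME": re.compile(r"\b[\w-]+\.(?:corp|internal|local)\b"),
}

def redact(line: str) -> str:
    """Replace each customer identifier with a stable placeholder token."""
    for label, pattern in PATTERNS.items():
        line = pattern.sub(f"<{label}>", line)
    return line

log = "lateral move from 10.2.3.4 to fileserver.corp as admin@acme.com"
print(redact(log))  # lateral move from <IPV4> to <HOSTNAME> as <EMAIL>
```

A real pipeline would also need consistent pseudonyms (so one host stays one token across logs) and review before retention, but the shape of the problem is the same.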

Dan McInerney 32:31

Yeah. So was there anything that you missed that we, that you wanted to talk about that we didn't get a chance to, to discuss yet?

Adrian Wood 32:37

Hmm. Yeah. I suppose we could talk a little bit about agentic LLM stuff if you wanted to, or maybe we could talk about things in the threat landscape like what you're seeing out there in terms of the activities of whoever you want to talk about.

Dan McInerney 32:53

The agentic LLMs, I think, are about to shift the entire economy in the coming years. So what do you know, and what interesting tidbits have you explored with agentic LLMs?

Adrian Wood 33:08

I've looked at a lot of the LLM hack bots and regular hack bots over the years, and generally they're kind of tricky to use, because as an operator, most of the time that I'm looking to use something like that, it's on a box that has a whole bunch of enterprise security on it. So you only get so many, call them, requests that you can make from, say, shell access before you've blown it up, which makes using a hack bot - in the capacity that I think people are envisioning - a little hard. They do work okay as a web app security tool, because you don't really get punished for making a thousand requests to a website. But I would stress that people keep very soft opinions on these kinds of things, because the hack bots of last year and the hack bots of now are quite different.

Dan McInerney 34:05

A hundred percent. A hundred percent.

Adrian Wood 34:06

There's even one that came out this week that is - I haven't tried it yet, but I'm sort of waiting for some of the dust to settle around that - but it looks quite appealing. I think it's from Cornell University researchers, I want to say.

Dan McInerney 34:19

Was this the vulnerability discoverer?

Adrian Wood 34:21

Yeah. Have you, have you looked at that one?

Dan McInerney 34:23

Yeah. Yeah. I read this paper. So I've explored this, this topic quite a bit and so has Marcello.

Marcello Salvati 34:29


Dan McInerney 34:31

Yeah, you can essentially create a corporation out of LLM agents. You've got one that hunts XSS, one that hunts SQL injection, and you've got a manager, and you just tell them: hey, look at this code base, or look at this web application, and focus on your specialty. And through few-shot prompt engineering with ReAct, chain-of-thought, and tree-of-thought, you can get really surprising accuracy in vulnerability detection, even for zero days, which is really cool.
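[Editor's note] The "corporation of agents" pattern described above can be sketched as follows. This is an illustrative skeleton only: a real system would back each specialist with an LLM using few-shot ReAct-style prompting, whereas here the specialists are stubbed with trivial pattern checks so the orchestration shape is visible. All names are hypothetical.

```python
from typing import Callable

# Stub specialists. In a real hack bot, each of these would be an LLM
# agent prompted to hunt one vulnerability class; here they are simple
# pattern checks standing in for that logic.
def xss_specialist(code: str) -> list[str]:
    return ["possible XSS: raw innerHTML sink"] if "innerHTML" in code else []

def sqli_specialist(code: str) -> list[str]:
    return ["possible SQLi: string-concatenated query"] if '" +' in code else []

class Manager:
    """Fans the same artifact out to every specialist and collects findings."""

    def __init__(self) -> None:
        self.specialists: dict[str, Callable[[str], list[str]]] = {
            "xss": xss_specialist,
            "sqli": sqli_specialist,
        }

    def review(self, code: str) -> dict[str, list[str]]:
        return {name: agent(code) for name, agent in self.specialists.items()}

snippet = 'query = "SELECT * FROM users WHERE id = " + user_id'
findings = Manager().review(snippet)
print(findings)
```

The design point is the division of labor: each agent sees the whole artifact but is steered toward its specialty, and the manager aggregates, which is what makes the few-shot prompting tractable per agent.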

Marcello Salvati 34:55

Mm-hmmm <affirmative> yeah.

Dan McInerney 34:57

But I think we have to start wrapping up. I could talk to you about this stuff all day - I mean, there's a thousand more topics I want to chat with you about <laugh> - but there's only so much time in the world, and I want to respect your time, too. So thank you so much, Adrian, for coming and joining us.

Adrian Wood 35:08

My pleasure.

Dan McInerney 35:09

That was a really fun discussion.

Marcello Salvati 35:10

Yeah, thank you so much.

Adrian Wood 35:11

You too.

Dan McInerney 35:12

Have a great evening.


Additional tools and resources to check out:

Protect AI Radar: End-to-End AI Risk Management

Protect AI’s ML Security-Focused Open Source Tools

LLM Guard - The Security Toolkit for LLM Interactions

Huntr - The World's First AI/Machine Learning Bug Bounty Platform

Thanks for listening! Find more episodes and transcripts at https://mlsecops.com/podcast.