The Evolved Adversarial Machine Learning Landscape

Apostol Vassilev, Research Team Supervisor, National Institute of Standards & Technology (NIST)

YouTube:

Audio Only:

In this episode, we explore the National Institute of Standards and Technology (NIST) white paper, Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations. The report is co-authored by our guest for this conversation; Apostol Vassilev, NIST Research Team Supervisor. Apostol provides insights into the motivations behind this initiative and the collaborative research methodology employed by the NIST team.

Apostol shares with us that this taxonomy and terminology report is part of the Trustworthy & Responsible AI Resource Center that NIST is developing.

Additional tools in the resource center include NIST’s AI Risk Management Framework (RMF), the OECD-NIST Catalogue of AI Tools and Metrics, and another crucial publication that Apostol co-authored called Towards a Standard for Identifying and Managing Bias in Artificial Intelligence.

The conversation then focuses on the evolution of adversarial ML (AdvML) attacks, including prominent techniques like prompt injection attacks, as well as other emerging threats amidst the rise of large language model applications. Apostol discusses the changing AI and computing infrastructure and the scale of defenses required as a result of these changes.

Concluding the episode, Apostol shares thoughts on enhancing ML security practices and invites stakeholders to contribute to the ongoing development of the AdvML taxonomy and terminology white paper.

Join us now for a thought-provoking discussion that sheds light on NIST's efforts to further define the terminology of adversarial ML and develop a comprehensive taxonomy of concepts that will aid industry leaders in creating additional standards and guides.

Transcription:

[Intro] 0:00

Charlie McCarthy 0:30

Welcome back to The MLSecOps Podcast! I am your co-host, Charlie McCarthy, along with D Dehghanpisheh and in this episode we explore the National Institute of Standards and Technology whitepaper, Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations.

This report is co-authored by our guest for this conversation, Apostol Vassilev. Apostol is a Research Team Supervisor at NIST and he provides insights into the motivations behind this initiative, and the collaborative research methodology employed by the NIST team. Apostol also shares with us that this taxonomy and terminology report is part of the Trustworthy & Responsible AI Resource Center that NIST is developing.

Later in the conversation, the group focuses on the evolution of adversarial machine learning attacks, including prominent techniques like prompt injection attacks, as well as other emerging threats amidst the rise of large language model applications. Apostol also discusses with us the changing AI and computing infrastructure, and the scale of defenses required as a result of these changes.

Closing out the episode, we talk with Apostol about enhancing ML security practices, and inviting stakeholders to contribute to the ongoing development of the NIST, AdvML taxonomy and terminology report.

Join us now for a thought-provoking discussion that sheds light on NIST’s efforts to further define the terminology of adversarial machine learning and develop a comprehensive taxonomy of concepts that will aid industry leaders in creating additional standards and guides.

This is The MLSecOps Podcast.

Apostol Vassilev 2:16

I had actually a long and meandering path before I joined NIST. After grad school, I went to industrial research, then wanted to try different things in my life, so I went into business, ran a laboratory, created it from scratch, and then was a lab director for a while.

And I was working with NIST, in fact, to verify the security and correctness of cryptographic implementations that companies want to embed in their products and sell to the federal government. By law, all this technology has to be verified for correctness and robustness before it can be sold to the government. I was running one of these laboratories and established the connection.

And so after a while then, I wanted to do something different, and an opportunity came along. NIST approached me and offered me a position. And I was thinking whether I should go back to doing something more research [-like]. And I had always a soft spot for doing fundamental research of some kind.

And so what was interesting is that I was contemplating this decision. I found out that some of the buddies that I was doing research with back in graduate school - I was part of a large Grand Challenge project sponsored by the Department of Energy at the time - and there was a consortium of my university, Texas A&M; University of Texas in Austin; Princeton; and a few national labs like Oak Ridge and Brookhaven Lab.

So we're collaborating on this project, and some of these researchers had found their ways into NIST as well. So I felt a bit like going to a reunion, and so that made another positive factor for my decision. So that's how I ended up there.

I joined the security division, and for the last five or six years, we've been looking at the astonishing developments around AI and how it, not just an emerging technology, but a technology that is on the cusp of affecting our lives in big ways. And so as such, we have a tendency at the security division to look at this technology and establish some kind of understanding about them, and inform the public in terms of potential problems and risks they may be encountering in using [the technology].

We've done this kind of thing for other prior emerging technologies such as IoT devices and all these things. And so this was kind of in the same vein. Last year I co-authored another report about bias in AI [Towards a Standard for Identifying and Managing Bias in Artificial Intelligence].

And so that made me a little target for my management to reach out and say, “Why don't you take over the development of this taxonomy of adversarial machine learning?”

D Dehghanpisheh 4:41

That kind of flows into what prompted NIST to create it beyond just, hey, it's a new emerging technical surface or a new emerging technical domain similar to IoT in the past?

Was there some real world activity or action that prompted NIST to say, hey, we really need to think through how we get this framework up and running?

Apostol Vassilev 5:01

Well, it was just the fact that AI technology, as I said, is now in its prime. It becomes much more than just an academic exercise of people tinkering with ideas, and developing prototypes, and publishing conference papers.

We see technology deployed already in certain sectors of the economies. In insurance claim processing, in finance, loan allocation, employment. There is a well understood situation today in the industry that most resumes that are submitted today for jobs are first processed by an AI or machine learning system before they then reselect it and hand it off to people to look at.

So that means that the technology has moved into the real business environment. It's become an important factor in our IT ecosystem in which we all live. And so that's a major factor for us to start looking into it. It's not, as I said, just an academic exercise anymore. It is a real technology. So that's really the reason.

And that created, essentially, a sense of urgency at NIST to start looking into AI in a much more comprehensive way. You know that we released a risk management framework following a mandate from, actually, [U.S.] Congress to develop such a document; such a resource. And related to the RMF, NIST is setting up a resource center to complement the framework with additional knowledge sources; one of which is the bias document that I mentioned earlier, and another now is the taxonomy of attacks and mitigations in adversarial machine learning.

D Dehghanpisheh 6:41

Right.

Apostol Vassilev 6:42

It is important when you're thinking about deploying AI technologies to know exactly what people can do to you and what mechanisms you can deploy to mitigate some of your risks so that you can then be able to get a better estimate on your risks. The RMF gives you a general approach for managing the risk, but doesn't give you mechanisms for quantifying the risk in your specific context. And knowing the attacks and the potential mitigations, including their limitations, gives you a better handle of actually quantifying the risk in the context of your enterprise.

Of course, you have to take into account the assets you have; the value of those assets. All of that is part of the same equation. But it's important to know how people can hit you and what you can do to defend yourself.

D Dehghanpisheh 7:30

Absolutely.

Charlie McCarthy 7:31

Apostol, where does one start then, when tackling this type of research, embarking on a white paper like this? As you were working with your co-author, Alina [Oprea], talk to us about your role, her role, in developing this white paper and maybe some of the research methodology employed by the two of you and your team to create something like this.

Apostol Vassilev 7:52

It's a great question.

It's such a vast field that when you look at it in the beginning, you feel kind of intimidated and lost. And so how do you start to do something like this? Well, the first thing you do is recruit competent people to the project. And Alina is a great example of that. Alina is a faculty appointee into my teams.

Often NIST reaches out to external faculty in order to augment our capacity to develop standards and guidance of specific kind. And so, Alina is a well known leading expert in this field. She's done a lot of work in privacy and adversarial machine learning, that sort of thing.

Myself, I’ve been studying the field for quite some time. And so we then discussed it with her. I facilitated her appointment to NIST, and once this was done, then we discussed how to approach the problem.

NIST had attempted, together with MITRE, to develop an earlier draft of that document back in 2019. Unfortunately, that document fell short of expectation in a way. It wasn't received well by the adversarial machine learning community because of some concerns with incompleteness and that sort of thinking of the coverage.

D Dehghanpisheh 9:12

So, talking about 2019 compared to now, I'm just curious, as you go around looking for input, if it missed the mark in 2019, this one seems to be getting a lot of positive reception. A lot of people have referenced it. We tended to really like it.

As you go about creating something like this, what are some of the biggest challenges you encountered that, say, you had in 2019 that you haven't had in 2022 as you kind of worked through that?

Apostol Vassilev 9:38

The biggest challenge, I would say, was the sheer size of the theoretical landscape we had to navigate. To give you a sense, if you look at the number of adversarial papers published on Archive for the entire decade from 2010 to 2020, you had about 3500 papers published with exponentially-looking increase.

But if you just took the last two years, ‘21 and ‘22; for these two years, there were over 5000 papers alone. So for two years, you had more [papers published] than the entire preceding decade. So from exponentially-looking, the graph starts to look vertical.

And that's a problem that we encountered with Alina, but thanks to her experience and, of course, my involvement in it, we were able to navigate this huge field in order to distill those ideas that held against the test of time - those ideas that have really made a difference in the field, both in terms of adversarial attacks as well as mitigations that are more robust.

And that was the trick. It's not enough to just mention everything. You have to summarize it in a way that sorts out for the reader what's important, what's effective, versus all these small enhancements that, in the end of the day, wash over time.

D Dehghanpisheh 11:07

Right.

And so earlier you had mentioned MITRE and some of the work you've done. We've been very fortunate to have Dr. Christina Liaghati on [The MLSecOps Podcast].

And I'm wondering, can you explain how the NIST taxonomies and the NIST frameworks overlay or participate alongside the MITRE ATLAS framework, which she's talked about on this podcast?

How do you two, as an organization, collaborate and try to guide the entire field towards a more robust and more secure AI world?

Apostol Vassilev 11:42

That’s a good question. I know Christina. We met with her, I would say, a couple of months ago or so. And strangely enough, I have a meeting with her next week to talk about this very issue.

D Dehghanpisheh 11:56

Make sure you tell her we said hello. She's a very popular guest.

Apostol Vassilev 12:00

Absolutely. Be pleased to do so.

We'll chat with her and find out ways for us to collaborate. But MITRE has been a great partner, to NIST in particular, and to the federal government in general. They are helping us develop standards. They're helping us implement some of our vision in terms of developing systems around that and things like that.

So, I don't want to preempt my conversation with Christina next week. Let's just leave it at that for now.

D Dehghanpisheh 12:30

Fair enough. Fair enough. And the mystery will continue.

Charlie McCarthy 12:34

All right, well, this feels like a good time to pivot a little bit then, and talk more about adversarial machine learning attacks, which are outlined in this framework.

Apostol, can you talk to us a bit about how Adversarial ML attacks have evolved, specifically in the last year? We're hearing a lot about prompt injection attacks. Those seem to be very prevalent, maybe because they're the most reproducible.

But beyond those prompt injections attacks, what other attacks do you think are becoming more prevalent and relevant, and we might see more of, especially with the rise of large language models [LLMs] and the hype surrounding that?

Apostol Vassilev 13:11

Yeah, that's a challenge that makes some of us lose their sleep over time.

Yeah, prompt injection has emerged as a major type of attacks that have been deployed recently. And they became even more popular since last November when ChatGPT entered the scene. And as you know, chatbots are a little more than just large language models.

They have components such as the one that's intended to detect the context of the conversation the user wants to engage in. They have also a policy engine that determines whether the conversation the user is trying to engage in is within the limits or out of bounds. And so the way these things work though, they don't have complete cognitive intelligence much like what we, the people, have.

They don't have their own morals and values and things like this. And so they have mechanisms that are more or less capable of automating a cognitive task. And in that sense they are limited. They are fragile. And people have found ways to actually deceive the context detection system such that the policy guard can get fooled and allow conversations that are inappropriate to output some toxic content or speak about issues that the initial policy wouldn't allow it.

And the easiest way to do that would be to engage in role playing. Ask the system to play a specific role instead of asking it directly, “tell me something,” which the policy engine would reject. You tell them hey, imagine you are such-and-such person and you have these qualities, respond to me in a way that this person would respond and that's the typical way these things break.

Another thing would be to tokenize your input instead of inputting the text that you would, then you break it into pieces they call tokens. And then you submit it. It turns out that the context detection mechanism is not robust to handle input like that and then just let it go. And then the language model behind, past the policy check is capable of reinterpreting these tokens. And that's how you get them to say what you want them to say. So you kind of bypass the controls in that sense.

So these are just the initial steps in which the technology has been engaged with and shown quite fragility in that sense. But this is not the end of the road. More interesting stuff is coming, especially since we all know that chatbots for now have been deployed only to almost like a public demonstration mode. If you can engage with a chat bot, well, the only person affected is you. Because the toxic text is gonna be

D Dehghanpisheh 16:04

It's one on one.

Apostol Vassilev 16:05

Yeah, it's a one-on-one experience.

Certainly it can be dangerous, as some examples from Belgium have shown, that people who engage one-on-one with them can be led to even suicide by inappropriate content exposed. But that's not the point I'm making.

The point I'm making is that chatbots are now being connected to action modules that will take the instruction from the user and translate it into actions that would operate on your, say, inbox, in your email or on your corporate network to create specific requests for service requests or what have you.

Even in the simplest case, you can say, order pizza or something like that. You can get an idea. I'm just giving you a very rudimentary set of potential actions here. But that's where the next generation of attacks will come.

And in addition to that, we've all known from cybersecurity how dangerous phishing and spear phishing attacks are, right? Because they require creating specific crafted email to specific users. Now, LLMs allow you to craft very, very nicely worded, authentically sounding email. That will put us on a totally new wave of well known cybersecurity attacks. So all of these trends with–my prediction is that we're going to see evolve rapidly over the next few years.

D Dehghanpisheh 17:34

So, speaking of trends and rapid evolution, there's been a lot of just tremendous acceleration in the infrastructure that powers all elements of artificial intelligence systems. Cloud computing, TPUs even more advanced TPUs, specialized ASICs, just general purpose CPUs being, I would say custom built, if you will, for lack of a better term, for AI workloads.

What about infrastructure?

What has changed in the infrastructure powering these systems that have made adversarial all more relevant now than, say, a year ago?

And what does that look like with the continuation and rapid acceleration of that infrastructure's compute capabilities and data capabilities? How do you think about that in terms of creating new threats against these ML systems?

Apostol Vassilev 18:24

Yeah, infrastructure plays a big role.

And the infrastructure has even one other dimension that people often overlook, and that is the scale of it. If you think about what is needed to train a language model of the size of what GPT-3 or -4 are, you need to scrape data off of the entire Internet to get there. Okay?

And guess what people are doing today? Well, they're buying Internet domains. And they are basically hosting poisoned content such that if you scrape the Internet and ingest this data, they're going to poison your model. Meaning that later they can exp–

D Dehghanpisheh 19:05

You’ve got backdoor entries that you aren't even able to see.

Apostol Vassilev 19:08

Exactly.

So that's one aspect of the infrastructure that we're looking at that didn't exist before. Before you had your enterprise, you had your corporate boundary and that was your world. That's what you defended.

Now you have to defend the Internet, which is an impossible task.

And of course, the evolution of computing infrastructure is also important. And we've transitioned from regular CPUs to GPUs, and so on. That's great. But as usual, security has been an afterthought in that arena.

As an example, I'll give you this case where researchers from Hong Kong essentially hijacked a Tesla, the computing stack of a Tesla, in order to develop a pretty sophisticated attack on the perception system of the car that actually caused it to veer off in the opposite lane of traffic, with potentially devastating consequences for the vehicle and the occupant inside. And that was all because the whole computing stack, the GPU, the Linux instance, they're wide open.

D Dehghanpisheh 20:18

So, let me ask about that. There's this big debate. The air quotes, “leaked,” Google memo, “We Have No Moat, and Neither Does OpenAI,” and the commentary about the rate of acceleration and adoption of OSS versus, say, closed and proprietary systems.

One of the things that you're highlighting here is that most OSS/open source software doesn't seem to have a security first mindset for very good reasons. Right? Like, I get why it doesn't have that. It frees developers and powers innovation.

But if that ends up being more true than not, that the majority of ML systems continue to have a lot of open source assets even in the underpinnings of the commercial offerings, how does NIST think about the security gaps of those things being inherited?

Because OSS is notorious for not having security built at the heart. And if ML is built on the back of OSS, what does that mean for security and adversarial ML attacks in general?

Apostol Vassilev 21:22

It's a mind boggling question. Frankly, people say that OSS is really secure because you have the thousand eye guarantee there. 1000 people have seen it and if something wrong is there, they would have noticed it. But that's a false assumption.

We all remember the heartbleed problem with OpenSSL that rattled the Internet about a decade ago, and that was subject to the same thousand eye assurance guarantee and yet nobody was able to spot it. And I think open source systems present an interesting dilemma, especially in the context of machine learning.

On one hand, we don't want to allow a few companies to govern the space and dictate to everybody what machine learning AI should be. Those are the companies with huge resources and capabilities. Open source is an alternative to that and presents really refreshing opportunities.

At the same time, because of the fact that this is a dual-use technology, making it available to everybody is a risk of itself because a powerful AI system will service the instruction of a benevolent person and a malevolent person just the same.

D Dehghanpisheh 22:41

Very true.

Apostol Vassilev 22:42

So, that's really the main risk. Much more so in my opinion, than the underlying security risk. How do you prevent unintended use of open source systems?

Charlie McCarthy 22:55

Right.

So, as we're thinking through some of these risks, existing risks, and also trying to predict what other threats might be coming down the line as AI and machine learning evolves, some of the mitigations that are discussed in the white paper–

Apostol, what are the next steps for organizations after reviewing a resource like this, other resources discussing mitigations, how can organizations, practitioners start to implement some of the recommendations to enhance their security in their ML systems?

Apostol Vassilev 23:26

Right.

So there's two important aspects we wanted to convey with our paper in discussing mitigations. One is that you have to stick to mitigations that have lasting value. That can withstand the test of time for a certain period of time. And obviously, adversarial training is the number one technique people can use.

It's an empirical technique, and it involves taking adversarial examples into your training set and retraining the model with those to ensure that they recognize this adversarial example. The problem with it is that it is costly and it's reactive.

You cannot preempt an attack. You have to experience an attack and then mitigate it by taking the adversarial input. So it's a bit of a reactive strategy. It's not an analytic or powerful defense that you could deploy.

And this is exactly the problem with all these mitigations. They are just mitigations. They are not full defenses. They are limited in a number of ways. And that's what we wanted to convey to people that unlike many notions accepted in classic cybersecurity, where you have a problem, you fix it, and you're done with it, you know that this problem goes away, a lot of the mitigations here that you can apply don't give you full guarantees. And people need to understand that.

And just because they mitigated their model in a specific kind of way, that doesn't mean that they're not going to experience a similar type of attack, slightly modified, but the same type of attack coming at them a little later. That's the difference between absolute guarantees and conditional guarantees that all of these mitigations that exist in machine learning today offer you. They're conditional on the type of input that you have.

As long as the input doesn't deviate too much, then it will be okay. But if it deviates above a certain threshold, you're out of luck. And that's really the mess that we want to convey to people. Don't think that if you have applied the mitigation, you're home free at that point, and you can forget about it and move on to something else.

So these are the two things: Focus on tried and true techniques that seem to be effective even though they're imperfect. And never forget that the techniques that you implemented are limited. They're not perfect. You cannot rest assured that you've covered every base.

D Dehghanpisheh 26:00

So, Apostol, when we were in San Francisco, we talked a lot about community. That it's going to take a community to get together and figure out how to collectively solve these problems.

My question here is, how do we rally that community? What's needed? And what can we all do? And what should we all be doing together?

Apostol Vassilev 26:19

Yeah, it's a big problem and requires a community to work at it in order for us to get in a better state than what we have now.

It's a matter of outreach, I would say. If you look at the report in the draft that I published, I try to reach out to not just NIST and a few academic researchers, but reach out to other federal agencies who have vested interest in the technology to solicit their review and feedback.

I reached out to institutions from Europe and Australia as well to breach the geographical divide and get, really, input from everywhere because we all – humanity as a whole – is going to be subject to it.

Everybody needs to help and everyone's perspective on it is limited. So the more of us combine, the better the result will be. And in fact, some of the input we received from Europe and from Australia was really good and helped to improve the quality of our first draft. And so we intend to continue in this manner.

We would like to create, as I said, more awareness to invite the community, the industry, the academia from all kinds of spheres to help us develop guidance that addresses their needs. Because the different industries are going to use slightly different instances of the technology and their attack surface is going to be different depending on the type of industry you have.

And right now I'm going through a period of engaging with various companies who have shown interest in providing feedback to our taxonomy to talk with them, understand their needs, ask them to provide their thoughts on it, and incorporate this feedback into the next version.

We intend to close the comment period by the end of September of this year and then publish the 2023 version of the taxonomy at the end of the year, and then open up a new draft at the beginning of next year. Because I believe that we're going to be here, employed in this field for quite some time.

D Dehghanpisheh 28:33

Yeah, I imagine that the request is, hey, get involved and give us commentary on the next turn of the crank.

Apostol Vassilev 28:40

Yeah, it is.

And NIST is a convener. That's our role most of the time. We convene people with different kinds of experience and representation in order to solve problems of common interest. And we intend to host conferences, workshops in the future to highlight some of the problems and again, create awareness, create initiative and solicit feedback.

D Dehghanpisheh 29:05

I'm confident that the MLSecOps Community would love to host that conference. So we will be in touch and be back on that.

Apostol Vassilev 29:11

Absolutely. We'll engage with you.

Charlie McCarthy 29:13

That would be fantastic.

And we are 100% aligned with the sentiment expressed around community. And as someone succinctly put it in a previous episode, none of us can do this alone. So we need people from all paths, different industries, to provide perspectives so that we know what we're actually dealing with in terms of these adversarial threats.

Apostol, it's always a pleasure speaking with you. Thank you so much for joining us today. We will link to the White Paper, Adversarial Machine Learning a Taxonomy and Terminology of Attacks and Mitigations in our transcript and hope to talk to you again very soon. Thanks for listening, everybody.

Apostol Vassilev 29:50

Thank you. Thank you, Charlie, and it was my pleasure.

D Dehghanpisheh 29:53

Thank you, everybody. Thank you, Apostol.

[Closing] 30:01

Thanks for listening to The MLSecOps Podcast brought to you by Protect AI.

Be sure to subscribe to get the latest episodes and visit MLSecOps.com to join the conversation, ask questions, or suggest future topics. We’re excited to bring you more in-depth MLSecOps discussions. Until next time, thanks for joining!

Additional tools and resources to check out:

Protect AI Radar

Protect AI’s ML Security-Focused Open Source Tools

LLM Guard - The Security Toolkit for LLM Interactions

Huntr - The World's First AI/Machine Learning Bug Bounty Platform

Thanks for listening! Find more episodes and transcripts at https://mlsecops.com/podcast.

The MLSecOps Podcast

The Evolved Adversarial Machine Learning Landscape

Additional tools and resources to check out:

Get Updates

Featured Articles

Categories