Evaluating Real-World Adversarial ML Attack Risks and Effective Management: Robustness vs. Non-ML Mitigations
Nov 30, 2023 • 31 min read
In this episode, co-hosts Badar Ahmed and Daryan Dehghanpisheh are joined by Drew Farris (Principal, Booz Allen) and Edward Raff (Chief Scientist, Booz Allen) to discuss themes from their paper, "You Don't Need Robust Machine Learning to Manage Adversarial Attack Risks," co-authored with Michael Benaroch.
Drew Farris 00:21
My name is Drew Farris. I'm a Director of Analytics and Artificial Intelligence at Booz Allen Hamilton. I'm a wearer of many hats. I lead some of our internal research initiatives. I also am directly engaged in doing work for clients. My background is machine learning.
I got my start in the industry in the late nineties after I finished my master's degree working for a small research startup doing natural language processing and information retrieval work. And that's been the theme for my entire career. I'm an engineer at heart. I love implementing algorithms, but I also love implementing systems that will solve real problems. So I tend to take a system oriented view towards machine learning beyond the algorithm itself.
Edward Raff 01:00
I’m Edward Raff. I'm a Chief Scientist here at the firm (BAH). With Drew, I help lead our machine learning research at the firm, and also a bit more on the “mathier” side developing algorithms and new techniques to enable new capabilities. And then I reach back to Drew and say, “Hey, we made something. Can you help me actually use it somewhere?”
I discovered machine learning by accident. I'm a computer scientist through and through. I was doing a semester abroad in England to satisfy my foreign language requirement and accidentally took two classes on machine learning and information retrieval and loved them and sort of fell into it.
D Dehghanpisheh 01:36
So you took some classes on a language and you fell into natural language processing. Is that kind of how that loop went or something similar to that?
Edward Raff 01:43
It was a weirdness. The course number started from zero instead of one. I thought it was a computer science joke. Turns out it wasn't. I had registered for the wrong classes entirely and they were like, no, you gotta sign up. These are the only three classes you can take that work for your schedule and transfer credit. And they were perfect. Changed my life.
D Dehghanpisheh 02:02
In today's episode, we're going to be covering a lot around AI security, ML security. The difference between the two. We're going to be referencing the paper, “You Don’t Need Robust Machine Learning to Manage Adversarial Attack Risks.” It's published on arXiv. We've got a link to it. It’s a really great paper and that's kind of how we found you two, and all the work that you're doing.
Let me start by asking: How real is the concern from customers and clients that Booz Allen is serving in regards to the security of AI as a class of applications and then the security of machine learning, which is really the systems that create the models which turn a regular application into an AI application because it contains a model?
Can you give some context there about the kind of spectrums that you're seeing with your clients?
Edward Raff 02:46
Yeah, I think it's really interesting. We're a federal contractor. Most of our business is working for the United States federal government. We do things for other countries and for private industry as well, but primarily the federal government. And a lot of adversarial machine learning is fundamentally very interesting from just a pure science machine learning perspective, understanding what on earth it is we're even doing.
But not everything that people talk about makes sense from a practical perspective on the adversarial machine learning side. But from the government side, a lot of it really does because you're imagining like, okay, who has the time, willpower and budget to attack your next song recommendation? But like, who cares? Like, do I really want to make you listen to my daughter's Cocomelon soundtrack?
D Dehghanpisheh 03:36
But the Drake bots, man, the Drake bots are all about gearing that up.
Drew Farris 03:42
I mean, I always think about search engine optimization as the first industry built around gaming machine learning algorithms. But anyway, go ahead, Edward.
Edward Raff 03:50
Yeah, and there's definitely places where it really comes into play, both in industry, but it's in play a lot more in the government space because there are literal other nation states with budgets measured in GDPs who have the time and interest to do these things.
So there's a lot of concern there. There's also concern just internally that if we're going to be using U.S. citizen data, that we're avoiding that becoming a new way for people to steal information or it being released to a party that it's not supposed to. So there's a lot in the government space where this becomes really potent and relevant.
But sometimes we see people get a little carried away, hence the paper.
D Dehghanpisheh 04:34
Yeah, in those spaces where obviously, you know, matters of national security, national defense, offensive capabilities, where you see these large state actors having a reason to go after this, are there particular areas that you think – and your paper talks a little bit about this – Are there particular areas of an application or machine learning system that are more vulnerable than others to certain particular types of attacks or classes of attacks?
Edward Raff 05:02
Yeah, we talked about this a bit in the paper, and I think evasion attacks, where you have some model that exists and now someone's trying to get your model to make errant predictions, are one of the more realistic threats and concerns. In any situation, it's the real-world environment that makes the most sense. There's a model there. You can kind of do some reconnaissance, you can query it, you can see how it's working.
And just knowing that the model exists or that you think the model exists. Like, my credit card, I'm sure they've got a fraud model. I don't need to know particular details about it in order to guess some of the things they're probably doing. And then I could create some kind of attack to run against it. And that's what people do every day who don't even know what machine learning is and are just trying to commit fraud.
That to me makes a lot of sense. Poisoning attacks in general seem to make the least sense. It often requires a lot of knowledge, and this is where something we talked about in the paper was like, okay, what's the real world threat model? How is this attack going to play out in real life?
So, I have seen people who are concerned about, well, what if you used adversarial machine learning to alter the radiograph so that it looked like cancer and now you're giving someone chemotherapy who didn't need it as a way of trying to assassinate them? Like if they have that kind of power, like they could screw with this, like just change the medical record. Like, why are they going through all this effort?
D Dehghanpisheh 06:27
Yeah, we had talked a lot about that in the founding of the company who sponsors MLSecOps Podcast, which was, it takes a lot of money to launch a truly sophisticated adversarial attack in a large class of models, whereas it's a lot easier, like, let's say, you know, when you think about a penetration test at a financial company, that pen tester or that red team, they're going to classically come back with, ‘Here's the attack types that I did. Here's what I came back with, and here's the credit card numbers and the user information.’
So like, okay, that's cool. But what if I could get around your entire KYC rules, or anti-money laundering rules, or fraud detection rules? And now I don't just have one credit card. I have the entire corpus of financial classes of transactions that can kind of see that go.
And we've talked about it from a security perspective of systems, but your paper talks particularly about robustness and security distinctions, which we really loved when we read this.
You argue that robustness and security should be distinguished with adversarial attacks on ML models. Clarify that distinction for our audience.
Edward Raff 07:26
Yeah. So a fun example is, maybe I've got the world's most perfect hot dog classifier. Give it a picture. It will tell you with 100% accuracy if it is a hot dog or not, and there's absolutely no way to attack it. It is perfectly robust.
D Dehghanpisheh 07:42
Nice Silicon Valley series reference. That's great.
Edward Raff 07:47
I got to finish the series. I keep starting and stopping, but so you've got this great classifier. It's perfectly robust. Doesn't mean it's secure! If you store it on an open AWS bucket and people can just download it like, all right, they've got the model now, it's gone, your competitive advantage is gone. They didn't need to run some inversion attack because you just left the door open and they took it.
Similarly, you might have a model that is very useful, right? What was it, I think it's Box, famous statistician. “All models are wrong, some are useful.” Maybe you've got a very useful model that's vulnerable to attacks as all of them currently are, but it’s only used internally. There's no outside exposure to it. You've got good controls on who can access it and for what reasons. And when someone does access it, it's recorded. It's probably pretty secure. You probably don't need to worry about an adversarial attack against it.
Now, obviously, it's not always one or the other like that example, but I think it illustrates our concern that these are two different issues that overlap a lot. You can play the balance between them to get to your goals more easily than just throwing all your money and resources and time into trying to make the one true robust model.
Drew Farris 09:02
You should probably actually spend more time securing your model than investing in making a robust model in the first place.
Badar Ahmed 09:10
What are some of the challenges with building robust models because you have this really interesting cost benefit analysis in your paper. You call it a stylized model, but we'd love to hear your thoughts on why some of the techniques making the models robust are problematic or really expensive?
Edward Raff 09:30
Yeah, maybe I'll repeat it a little bit for people in the audience who haven't read the paper yet. Shame on them.
D Dehghanpisheh 09:38
If it was a book, we'd give it away. But you know, it's an arXiv link so everybody has it for free.
Drew Farris 09:44
Go ahead, give it away.
Edward Raff 09:46
So we played economist a little bit and said, okay, let's try to convert everything to dollars. And so we've got some abstract cost in money for every missed prediction we make. It doesn't matter what the situation is. Maybe you have some model that's predicting some customer behavior and they're going to be displeased and that'll cause some increase in churn. Whatever. Whatever the error rate is, you can somehow convert that to dollars and say this is the cost of errors.
If you're going to build a robust model, there is going to be the cost for errors for a robust model, which might be different. And there's also going to be a cost for making the model robust itself. Like, you put in– if you're doing adversarial training, that's like 100 times more expensive, you're literally multiplying your GPU bill by a factor of 100 and your model's just going to have a higher error rate in general now.
People haven't really been able to use adversarial training to improve accuracy under normal operations. So you're paying the penalty for a robust model twice: a higher baseline error rate, and the cost to make it robust in the first place.
Badar Ahmed 10:56
And the cost element, just to maybe blow that up a little bit, like there's a direct cost of like, hey, this much in GPU cost or compute cost. And then there's an overhead that comes with like, skills and time and folks like waiting around. That's a pretty high overhead too, right?
Edward Raff 11:15
Oh yeah. It's a very big overhead, especially as adversarial training is still kind of the state of the art for like the best you can possibly do and is still totally a black art of just like, we have no idea, but we just turn these knobs until it worked. We don't know why these knobs made it work.
Badar Ahmed 11:34
And there's like, for example, a lot of different types of adversarial training techniques. So it's kind of hit and trial, right? Is that what you're hinting at?
Edward Raff 11:42
Yeah, it's just like, I've had people on my team, or myself, see people say, ‘Yeah, we used adversarial training for this and that made it robust,’ and we tried to do the same thing and it doesn't work at all.
And it's just lots of little tweaks on, like, what's the training ratio of normal versus perturbed, and how many iterations of attack did they use during the training? Did it change over the training? And there's just all these little nuanced details that make big impacts.
Badar Ahmed 12:07
And at the end it's still a game of whack-a-mole, right? Because you could apply a different adversarial attack. So you can, let's say, do some training right now. And, you know, let's say a few weeks later somebody comes up with a new adversarial attack, and now you're exposed to that attack because, let's say, your defense doesn't work for that.
Drew Farris 12:26
Yeah. And many of the places we deploy models, the data is always changing as well.
So it's really, really hard to distinguish a potential adversarial attack from out of bounds data from your original dataset. Right?
D Dehghanpisheh 12:38
And that's just in the model exploration and kind of the training component. Never mind that once you're at inference, how are you determining what's actually an acceptable outlier outside of that probabilistic range versus something that's like purposely bad. Told you it was bad.
Like, what's the reality that you're going to be able to even figure that shit out, right? I don't know.
Drew Farris 13:00
Right. Absolutely right. And it gets back to that human cost as well, because we find that the evaluations for these sorts of things tend not to be automated and tend to be very costly in terms of efforts. Right? So…
Edward Raff 13:14
Yep, and that was sort of what we did. We've got a very simple model of just like all these costs and then you say like, okay, like just apply algebra. What's the break even point on these costs? And the break even point is pretty high, like you have to have a very accurate and robust model for it to just sort of make sense in the stylized world.
And I think there's definitely situations where that's true, where adversaries who are highly motivated and the cost of them succeeding is way higher than just a normal missed prediction. Like if I mispredict what song you want to listen to next, okay, you click the one button you hit next.
Or like, for a bank, if they're going to approve a fraudulent loan, that's different than them accidentally approving a non fraudulent loan. Like, okay, maybe we shouldn't approve this loan, but you're good faith, maybe you default, maybe you pay it on time. If you default we go through collections and whatnot.
D Dehghanpisheh 14:13
Because there are a lot of loans approved in 2007 that had nothing to do with adversarial machine learning that should not have been approved.
Edward Raff 14:21
Yeah. But yeah, the adversarial case like the money's gone. It's been wired to some boat in the middle of the ocean. Like, good luck ever getting that back. So the cost can be very different. And that's when adversarial robustness makes a lot of sense to pursue.
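A minimal sketch of the break-even reasoning described above, in Python. This is not the paper's actual formulation: the cost terms, the `ModelCosts` fields, and every number below are illustrative assumptions. It only shows the shape of the algebra, in which the robust model pays off once a successful attack costs more than the extra build cost plus the extra benign errors, spread over the attacks it prevents.

```python
from dataclasses import dataclass


@dataclass
class ModelCosts:
    """Hypothetical inputs for one model; rates are fractions, costs are dollars."""
    benign_error_rate: float    # fraction of ordinary predictions that are wrong
    attack_success_rate: float  # fraction of adversarial attempts that succeed
    build_cost: float           # one-time cost to train (GPU bill, staff time)


def expected_cost(m, n_benign, n_attacks, cost_benign_error, cost_attack_success):
    """Build cost plus expected cost of benign errors and successful attacks."""
    benign = n_benign * m.benign_error_rate * cost_benign_error
    adversarial = n_attacks * m.attack_success_rate * cost_attack_success
    return m.build_cost + benign + adversarial


def break_even_attack_cost(standard, robust, n_benign, n_attacks, cost_benign_error):
    """Cost per successful attack above which the robust model is cheaper.

    Returns None when the robust model prevents no attacks, since no attack
    cost could then justify it.
    """
    extra_build = robust.build_cost - standard.build_cost
    extra_benign = (
        n_benign
        * (robust.benign_error_rate - standard.benign_error_rate)
        * cost_benign_error
    )
    attacks_prevented = n_attacks * (
        standard.attack_success_rate - robust.attack_success_rate
    )
    if attacks_prevented <= 0:
        return None
    return (extra_build + extra_benign) / attacks_prevented


# Illustrative numbers only: a 100x build cost and a higher benign error rate.
standard = ModelCosts(benign_error_rate=0.05, attack_success_rate=0.90, build_cost=1_000)
robust = ModelCosts(benign_error_rate=0.15, attack_success_rate=0.30, build_cost=100_000)

threshold = break_even_attack_cost(
    standard, robust, n_benign=1_000_000, n_attacks=100, cost_benign_error=1.0
)
print(f"Robust model pays off when a successful attack costs more than ${threshold:,.2f}")
```

With these made-up inputs, a song recommender (where a successful attack costs about the same as a benign miss) stays far below the threshold, while a wired-away fraudulent loan could sit far above it.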
D Dehghanpisheh 14:36
Is that to say that the value of having always correct inferences is where you'd be focusing? Like, what is that framework that you're guiding customers and clients through in terms of managing the tradeoff of anybody who wants to think about robustness versus actually just getting your house together in terms of security of that model, of that system?
Drew Farris 14:56
It's highly use case dependent, right? Because it all comes down to what is the cost of a failed prediction. And that's going to vary widely from use case to use case, as Edward’s mentioned.
Edward Raff 15:08
Yeah, I think that's an example we included in the paper. One of the remediations is just sort of change the environment. It's like, if you control where predictions get made, maybe it's really convenient for you to predict things at the store front as someone's walking in.
If the cost of an adversarial error there is really high, maybe you just move that prediction later on in the pipeline. Maybe that's a little more friction on the user side. Maybe it's a little less convenient, but you're potentially just completely demolishing the actual security risk. That's this tradeoff that you can really play that we don't think machine learning people are really considering as much as they should be today.
And part of the call to this paper is like, hey, there's this bigger picture here. If we can figure out what are some of the more realistic threat models we should be concerned about, then let's do more research on those scenarios and those situations so that we can have an even better Pareto optimal–
Badar Ahmed 16:05
And then the cost benefit analysis, it seems like it's very challenging except maybe for very, very specific use cases and we can get into some of those use cases.
But the drop in accuracy with pretty much most adversarial training techniques is very high. Many times machine learning teams spend months trying to improve accuracy by just a few percentage points. Two, three, four, five percent. And that's considered a big deal. And as soon as you layer in adversarial training and you get, let's say, a hit of 10 or 20%, that just seems like a no-go for the vast majority of teams.
Just purely, let's say you can even afford the additional cost it takes to adversarially train, but the result is a drop in accuracy for benign inference cases.
Edward Raff 16:52
Yeah I know it is a huge problem. There's some stuff that's getting better. Randomized smoothing is getting better. I think that has a lot of potential. Randomized smoothing, though, comes with its own penalty. You're now increasing latency because you're running this through the model multiple times to average out predictions.
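A minimal sketch of the randomized smoothing idea mentioned above, showing where the latency penalty comes from. The `base_classifier` here is a hypothetical one-dimensional stand-in for a trained model, and `sigma` and `n_samples` are arbitrary illustrative values. Real randomized smoothing methods also add a statistical certification step that is omitted here.

```python
import random
from collections import Counter


def base_classifier(x):
    """Hypothetical stand-in for a trained model: a simple threshold on x[0]."""
    return 1 if x[0] > 0.0 else 0


def smoothed_predict(classifier, x, sigma=0.5, n_samples=200, seed=0):
    """Classify many Gaussian-noised copies of x and return the majority label.

    The base model runs n_samples times per input, which is the extra
    latency cost of this defense.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[classifier(noisy)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


label, vote_share = smoothed_predict(base_classifier, [2.0])
print(label, vote_share)
```

Every prediction now costs `n_samples` forward passes instead of one, so the sample count becomes a direct trade-off between stability and response time.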
Back to Drew’s point, like you’ve got to look at every sort of situation as its own unique little snowflake and figure out like, okay, what's going on here? What actually makes sense? Like who's going to attack this and why, and play out that whole sort of scenario. And then you can look at different parts and pieces and figure out like, this is where the weak links are.
Can I change my model here? Can I change where the model’s used? Can I just change how I run my process to alter the environment? And once you've altered the environment– it sounds dramatic, altering the environment like we’re terraforming earth in order to avoid mispredictions. But it's often just simple things like do a fingerprint scanner before also just doing something like facial recognition.
And yeah, fingerprint scanners can be tricked. But now there's two things you got to trick and they both require different techniques and one's a lot harder than the other. And it's just a very simple thing that can make it a lot easier.
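A tiny sketch of why stacking two checks helps, under the strong and hypothetical assumption that fooling one check is independent of fooling the other. The bypass probabilities are made up for illustration and are not measurements of any real scanner.

```python
def combined_bypass_probability(p_fingerprint, p_face):
    """Chance an attacker defeats both checks, assuming independent attempts."""
    return p_fingerprint * p_face


def grant_access(fingerprint_ok, face_ok):
    """Require both factors; tricking only one of them is not enough."""
    return fingerprint_ok and face_ok


# Illustrative numbers: 10% chance of fooling the scanner, 5% for the face model.
p_both = combined_bypass_probability(0.10, 0.05)
print(f"Chance of fooling both: {p_both:.3%}")
```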
D Dehghanpisheh 18:12
So some of what you're talking about are mitigations, right? I'm curious. You know, your paper proposes that there are mitigations in kind of what you're talking about here that do not need robust ML. And that's kind of unique when you compare it to the corpus of other ML literature, which is like, no, you totally have to. It's like, you know, I really appreciate and love the really more pragmatic view that you all have taken on this. And I guess, you know, you talk about mitigations in poisoning, inversion, evasion.
Can you discuss some of the mitigations, or kind of like brief drive-by summary components, and then Badar can get into them with you on, you know, mitigations for poisoning, mitigations for inversion, mitigations for evasion?
Edward Raff 18:52
Poisoning, I think, is sort of one of the easiest ones to handle. One way is just, like, real cryptography, not Bitcoin stuff, but like, cryptographically sign your data. Like, I got it labeled at this point in time. Let me record who the labeler was, date, time, hash, sign it. And now like there's no way for anyone to ever alter this data point without either breaking cryptography, in which case you've got bigger problems like, you're screwed. This is no longer your top concern. Or like, they alter it and you know, and maybe there was a data transfer error or maybe a bit got flipped, like, that can happen.
But you know, something happened and you can just throw that data away or you can go try and figure out what went wrong. Or just like have an air gap server. And companies do this all the time. They'll take snapshots of their data. They'll send it to a company that stores it inside a mountain in the desert where it's cool and dry and that will just be an emergency backup. Like that's an existing business. And you can do that with your data.
Like, there's no way that that data was adversarially attacked after you've shipped it off to live in a mountain in the desert. Unless someone snuck into the mountain, found the stockpile that had your data alone like it'd be insane.
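A minimal sketch of the "record the labeler, date, time, hash, sign it" idea, using only the Python standard library. As a stand-in for a true digital signature it uses HMAC-SHA256, which is a keyed message authentication code: anyone holding the key can both sign and verify, so a production system would more likely use an asymmetric scheme. The field names, the key, and the sample record are all hypothetical.

```python
import hashlib
import hmac
import json


def sign_record(data, label, labeler, timestamp, key):
    """Build a provenance record and attach an HMAC-SHA256 tag over it."""
    record = {
        "data_sha256": hashlib.sha256(data).hexdigest(),
        "label": label,
        "labeler": labeler,
        "timestamp": timestamp,
    }
    # Serialize with sorted keys so signing and verifying see identical bytes.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["tag"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record


def verify_record(data, record, key):
    """Return True only if the data and every metadata field are unchanged."""
    claimed = {k: v for k, v in record.items() if k != "tag"}
    if hashlib.sha256(data).hexdigest() != claimed.get("data_sha256"):
        return False
    payload = json.dumps(claimed, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("tag", ""))


key = b"hypothetical-secret-key"  # assumption: real keys live in a key manager
rec = sign_record(b"image bytes", "hot dog", "labeler-17", "2023-11-30T12:00:00Z", key)

print(verify_record(b"image bytes", rec, key))      # data and label untouched
print(verify_record(b"tampered bytes", rec, key))   # data changed after signing
```

A failed check does not say whether the cause was an attacker or a flipped bit in transfer; it only says the point can no longer be trusted, which is enough to discard or investigate it.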
Badar Ahmed 20:22
Yeah, I want to get to the cryptographic signature of data. That's something that we've been kind of discussing as well with some of the folks at Protect AI. And I think it's a fantastic technique. Right? It's very prevalent in the software ecosystem and AppSec, signing software artifacts.
The only, I guess, challenge that folks run into, especially with large datasets, is sometimes the cost that comes with, you know, signing like really large data sets through cryptographic techniques. I wonder if you have any thoughts around again, like it's a layered approach, right? We're not like putting all of our eggs in that cryptographic signing basket.
Are there also techniques of, let's say, doing checksum through non-cryptographic techniques that are actually faster? Have you come across some of these challenges?
Edward Raff 21:10
Yeah, there's lots of things you can sort of pick and choose to trade off between speed and sort of confidence and knowing what goes wrong. And some of that's going to depend on what you're doing.
There's a really famous example of I think it's satellite TV. I don't know if it's still the case, but for a long time they used DES encryption well into the 2000’s, once this was considered completely broken. It’s like, why are you using DES encryption, and it’s like, because it's good enough to keep people from hijacking the stream for like an hour. And that's all we need because people want their live football. And if it's an hour late, they don't care. Crappy old DES encryption was good enough, but it was super fast and it was super cheap for them to implement. So there's a lot that can be balanced there.
Badar Ahmed 21:57
Yeah, that makes a lot of sense. Another thing that I read in the paper, which I really liked, something that again, we're working on as well, is you mentioned supply chain validation as a protection for data poisoning. Do you want to get a little bit more into what your thoughts are there?
Drew Farris 22:17
There was an example where one of the core datasets folks trained a bunch of reference models on was distributed as a list of URLs. Right? It might have been one of the early ImageNet datasets or something like that. Horribly weak, right? Because, you know, things change on the Internet all of the time. You could even speculate like a bad actor might find an expired domain that hosted a bunch of training data and just replace it wholesale. Right?
So, I think this comes down to the way we build the datasets in the first place. And it really does come back to that checksumming. If you can't distribute source data and you're delivering references to data, deliver something else about that data as well that’ll allow you to do some post-hoc verification in terms of the collection of that data as it goes into your training datasets, ensuring that you have control of that data from the acquisition space to the application space when it comes to training, right.
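A minimal sketch of "deliver something else about that data as well": ship a digest next to each URL so that whoever downloads the content later can check it is what the dataset author originally saw. The URLs and byte strings are hypothetical placeholders, and fetching is left out so the example stays self-contained.

```python
import hashlib


def build_manifest(items):
    """Map each URL to the SHA-256 digest of the content seen at collection time."""
    return {url: hashlib.sha256(content).hexdigest() for url, content in items.items()}


def verify_download(url, content, manifest):
    """Check freshly downloaded bytes against the digest recorded in the manifest."""
    expected = manifest.get(url)
    return expected is not None and hashlib.sha256(content).hexdigest() == expected


# Hypothetical URLs and contents, for illustration only.
original = {
    "https://example.com/cat.jpg": b"original cat image bytes",
    "https://example.com/dog.jpg": b"original dog image bytes",
}
manifest = build_manifest(original)

# Later, the domain has expired and someone else serves different bytes.
print(verify_download("https://example.com/cat.jpg", b"original cat image bytes", manifest))
print(verify_download("https://example.com/cat.jpg", b"swapped-in poisoned bytes", manifest))
```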
And that's a pretty, you know, filthy example of URLs like normal businesses wouldn't necessarily do that. But if you think about how you might extend this idea to data collection devices, right? Like things on the edge that are acquiring data that goes into your models, push that verification all the way up to the edge as far as you can. Right?
You talked about also the cost of cryptographically verifying every single data point, you’re right, like generating hashes for something that's arriving at 60 frames per second is probably a nonstarter, right? You extrapolate out to then hashing batches or something larger than the most atomic piece of your data set as well. So there's sort of a couple of considerations for mitigating those sorts of supply chain attacks.
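A short sketch of the two cost-saving options raised in this exchange: hashing a whole batch instead of each frame, and using a faster non-cryptographic checksum. The 60-frame batch and the frame contents are made-up stand-ins. CRC32 detects accidental corruption but can be forged easily, so it is not a substitute for a cryptographic hash when tampering is the concern.

```python
import hashlib
import zlib


def batch_digest(frames):
    """One SHA-256 digest for a whole batch of frames instead of one per frame."""
    h = hashlib.sha256()
    for frame in frames:
        # Length-prefix each frame so shifting bytes between frames changes the digest.
        h.update(len(frame).to_bytes(8, "big"))
        h.update(frame)
    return h.hexdigest()


def quick_checksum(frame):
    """CRC32: fast and fine for transfer errors, but NOT tamper-resistant."""
    return zlib.crc32(frame)


frames = [b"frame-%d" % i for i in range(60)]  # stand-in for one second at 60 fps
digest = batch_digest(frames)

print(digest)
print(quick_checksum(frames[0]))
```

The trade-off is granularity: a batch digest says that something in the batch changed, not which frame, so a mismatch means discarding or re-checking the whole batch.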
Badar Ahmed 24:07
Awesome. Moving on to some of the other techniques, adversarial ML techniques or threats. So, if we touch on inversion. In your paper, you talk about differential privacy, which is something that's gaining a lot more momentum and it's much more practical than, you know, any of the adversarial defenses, to be very honest. Do you want to shed a little bit more light on that?
Edward Raff 24:27
Yeah. So for any listeners not aware, model inversion, since I don't think we've talked about it explicitly yet, is basically like stealing information from the model. You can reverse out training data or figure out whether this data point was or wasn't part of the training set. Maybe make your own copy of the model. That kind of information theft.
And differential privacy is a relatively young area of science. It was invented in 2006 by Cynthia Dwork. I'll let her assassinate me for probably saying her name wrong. It gives you this sort of very powerful mathematical guarantee that if you do things in a certain way and you can show this property called differential privacy, that there is nothing in the universe that can happen after that point to cause any more information leakage to occur.
Like, you have a budget that's called epsilon. And this is sort of an absolute guarantee of the maximum amount of information that can be extracted by any one sort of operation or step. So it's really useful for all the things we just discussed that you're limiting the information that can leave the model. This is something we've been doing some research and work in this area internally and for our customers over the past year or two now.
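A minimal sketch of how an epsilon budget shows up in practice, using the Laplace mechanism on a simple count. This is the textbook introductory example for releasing one statistic, not differentially private model training and not the work described in this episode. The records and the query are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) noise, drawn as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng):
    """Epsilon-differentially-private count via the Laplace mechanism.

    Adding or removing one record changes a count by at most 1 (sensitivity 1),
    so the noise scale is 1 / epsilon. Smaller epsilon means more noise.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
ages = [23, 35, 41, 52, 67, 29, 38]  # hypothetical records
noisy = private_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng)
print(noisy)
```

Each released answer spends part of the budget, so answering the same query many times and averaging the results would consume correspondingly more epsilon.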
It's a super powerful technique. The hardest part of it, it's a very “mathy” area and there's a lot of people in the area doing research who are there to do math, not to solve problems. And so at times we have to look at things and be like, all right, this is a good idea. This doesn't actually work. And what do we need to do to turn this into a solution that actually works, so that we can actually deploy this?
But there was one we've got a NeurIPS paper coming up this year, actually, that I'm really proud of. L1 regularized linear models are like one of the unsung workhorses of getting the job done. Super fast, easy, automatic feature selection as part of training, works great for high dimensional problems. Still, I think personally the best tool for any high dimensional problems.
So there's a lot of papers on differentially private, L1 regularized, linear regression or logistic regression. None of them have ever trained on a dataset of more than 5000 features. That's not a lot, and they don't make sparse solutions. It's like, okay, if I'm going to train on a million features, I'm going to get a dense solution. That's not what I use an L1 penalty for. And so we figured out like, okay, we can actually figure out how to train on large datasets efficiently and get these properties. So that's one where there's a little more work involved sometimes to use it. But if you can use differential privacy, it's a super powerful proof that you're doing a good job protecting things.
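A small sketch of why an L1 penalty produces sparse solutions, using the soft-thresholding step that appears in many L1 solvers. This is ordinary, non-private soft-thresholding, shown only to illustrate the sparsity property; it is not the algorithm from the NeurIPS paper mentioned above, and the weights and threshold are arbitrary.

```python
def soft_threshold(w, lam):
    """Proximal step for an L1 penalty: shrink w toward zero by lam, clipping at zero."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


weights = [2.0, -0.3, 0.1, -1.5, 0.05]
sparse = [soft_threshold(w, 0.5) for w in weights]
nonzero = sum(1 for w in sparse if w != 0.0)

print(sparse)
print(nonzero)
```

Coefficients smaller in magnitude than the threshold become exactly zero, which is the automatic feature selection described above. Adding noise to every coefficient for privacy tends to undo this, which is one way to see why dense solutions are a problem for private variants.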
Badar Ahmed 27:20
I want to move to the most popular adversarial attack: Evasion. For better or for worse, it's basically become like the poster child of adversarial ML. If you survey the papers in the landscape, most of them are addressing evasion.
What are your thoughts on evasion threats and defenses?
Edward Raff 27:40
It's the hardest one to actually be proactive about, in part because it's a fundamentally dynamic environment. Real world data will drift and will change and you will have some natural error rate. And there's this sort of distinction you're trying to make between good and bad errors. Maybe I should say benign versus nefarious errors, whereas like model inversion, like, yeah, the data outside changes, but your data you trained on is static, like, the model's trained and that’s static now. When you have your training data, like that's static, I'm training on this set of data.
So evasion is harder because of this sort of altered reality. And there it's sort of just do simple good practices that ideally you're doing already, like, be labeling data as it comes in. Like, have a team that’s always labeling data, getting data labeled and compare between your accuracy on your newly labeled data and your reference accuracy in training.
And if there's a big difference, okay, you've got a big red flag. If there's a tiny difference, you don't necessarily know that anyone's doing anything, but you'll at least discover like wholesale ‘onslaught,’ I guess, is the right analogy.
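A minimal sketch of the monitoring practice described above: compare accuracy on freshly labeled data with the reference accuracy from training and raise a flag on a large gap. The `max_drop` threshold and the sample predictions are arbitrary illustrative values.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def drift_alert(reference_accuracy, predictions, labels, max_drop=0.10):
    """Flag when accuracy on freshly labeled data falls well below the reference."""
    current = accuracy(predictions, labels)
    drop = reference_accuracy - current
    return {"current_accuracy": current, "drop": drop, "alert": drop > max_drop}


# Hypothetical batch of newly labeled production data.
report = drift_alert(
    reference_accuracy=0.95,
    predictions=[1, 0, 1, 1, 0, 1, 0, 0, 1, 1],
    labels=[1, 0, 0, 1, 1, 1, 0, 1, 1, 0],
)
print(report)
```

An alert here cannot distinguish natural drift from a deliberate attack. It only says the gap is large enough that someone should look.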
Badar Ahmed 28:58
So there's like popular techniques in, you know, conventional software, like put in an API key. Or even if, let's say, it's a public endpoint where it doesn't make sense to put an API key for protecting that endpoint or having fewer folks hit it, there are still, like, throttling, distributed throttling techniques. Are those, you know, a good, effective way to thwart most evasion attacks?
Drew Farris 29:27
This is a perfect example, right? Because so many evasion attacks depend on reconnaissance to identify where the decision boundaries of your models are, right? So using classic cybersecurity techniques to prevent those boundaries from being effectively probed is a good place to start, right? Like how many wrong passwords can you enter into a system before you get locked, right? It's usually about three or something like that, right?
So, using these same principles for machine learning systems, maybe not at that scale, really can help protect against evasion attacks.
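A minimal sketch of applying the lockout idea to model queries: a per-client budget over a sliding time window, so that probing a decision boundary with many queries gets cut off. The class name, the limits, and the client IDs are hypothetical. Timestamps are passed in explicitly to keep the example deterministic, and a single in-memory dictionary like this would not cover the distributed case.

```python
from collections import defaultdict, deque


class QueryThrottle:
    """Allow at most max_queries per client within a sliding time window."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)

    def allow(self, client_id, now):
        """Return True and record the query if the client is under its budget."""
        recent = self._history[client_id]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_queries:
            return False
        recent.append(now)
        return True


throttle = QueryThrottle(max_queries=3, window_seconds=60.0)
results = [throttle.allow("client-a", now=t) for t in (0.0, 1.0, 2.0, 3.0)]
print(results)
```

The right budget depends on the use case: three queries mirrors the password example above, while a real prediction endpoint would likely need a much larger limit to avoid blocking legitimate users.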
Edward Raff 30:00
You get to three predictions against Drew’s models then you're out.
D Dehghanpisheh 30:04
Yeah well honestly, that's kind of like the thing when we talk about evasion or inversion, where you're querying, polling the model or the endpoint a ton to try to find, it's like, man, just have the right settings on your WAF.
Like just think about basic security components, which leads me to this question of we've talked a lot about practicality of the attacks. I'm curious what your take is on all the practicality of these defenses, like AI firewalls or, you know, detection mechanisms at endpoints for adversarial components, where it feels like to me some basic architecture principles coupled with some zero trust approaches and some just basic three tiered web API best practices might actually shrink that surface to almost zero.
How do you guys think about that?
Drew Farris 30:53
I think that very much lines up with the way we're thinking about it. What you're naming there is good DevOps principles around observability and around access control and restriction. These are all things that need to be built into machine learning systems as well, right? Just because we're doing this new fancy thing doesn't mean you throw years and years of understanding of how to build secure applications out the window. I think that's what's really important.
It also really gets to the importance of having a diverse set of perspectives of the staff on the machine learning engineering teams as well, right? Like, it's great to have lots of wizards like Edward on your machine learning team, but you also need a set of other diverse perspectives when it comes to thinking about how you build these systems and secure them.
Edward Raff 31:44
I think it ties into the theme of like, you can change the environment to make your life easier. Yeah, maybe you're making a little more friction for the user or something like that, but it's a trade off that might be worthwhile.
Badar Ahmed 31:56
So I want to tie this back to the stylized model, the cost benefit analysis that you have in the paper. And I think that's the part that I really love. I would highly recommend listeners to check it out.
So, you have this categorization, for example, of the different adversarial attacks that we're looking at, like poisoning, inversion or stealing, and evasion. And then you have these categories like realistic, unrealistic, solvable, and impractical. I think it's a fantastic way of categorizing. It really helped me crystallize my thoughts on adversarial ML.
I would love to get more, you know, like how did you guys come up with it? And since writing and publishing the paper like, has your thinking evolved?
Edward Raff 32:43
Yeah, I think some of that was framed with my propensity for using overly strong language to get a point across. So like, when we call them impractical, that's not to say that it will never happen, but for sort of the broad spectrum of most use cases, this probably doesn't make sense.
Like white box attacks in general are just impractical. Like if you have the level of access to do a white box attack, why? Why are you doing it? You have so much power.
D Dehghanpisheh 33:14
Yeah. Why spend the money to do that when you could take over the model registry and take it all? Like, what?
Edward Raff 33:19
Yeah, like, it just doesn't usually make sense. I think that's where in the future we might see something like there was a SolarWinds sort of supply chain attack. It wasn't a machine learning supply chain attack, but I could see that being something in the future where especially with these large language models that are very expensive to train, like, okay, yeah, there's going to be some smaller subset of models that people are all pulling from and using.
And yeah, I could see that being a valid threat model, that if someone could pull off a white box attack there, they might do it because it's actually a supply chain attack to get to everyone downstream.
D Dehghanpisheh 33:58
Which, that has a massive ROI to it, right? In that case, all of the engineering that you'd have to do, i.e. SolarWinds, you know, to get into that supply chain and then have that massive downstream blast radius. That's far more likely than trying to detect one endpoint if it's actually a valid input or not.
Badar Ahmed 34:17
Right, like somebody let's say poisons Llama 2 today, that's just going to have a catastrophic effect downstream.
Drew Farris 34:24
Yeah. Or you think about these code generation LLMs, right? Think about poisoning these code generation LLMs to generate code that has exploitable vulnerabilities.
Edward Raff 34:37
In most cases, white box attacks are just impractical, because why would you, if you have that power? By similar extension, poisoning attacks in most cases are not realistic.
There's a few instances like Drew was mentioning, where people bought expired domains and changed them. But yeah, you can control your data labeling process. You can control who you get it labeled from. You can sign things, you can hash things. I just don't see in the real world how that gets pulled off if you're doing the good practices that you should be doing on the security front anyway. And all of these like, if you're doing a bad job on the security side, like, yeah, doors open, anything's possible.
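To make the "you can hash things" point concrete, here is a minimal sketch of pinning a SHA-256 digest for a dataset or model file and checking it before use. The function names and the idea of storing the pinned digest alongside the pipeline configuration are assumptions for illustration. Signing would go a step further by also proving who produced the file.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_sha256):
    """Return True only if the file still matches the digest pinned earlier."""
    actual = sha256_of_file(path)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_sha256.lower())
```

A training job would refuse to start when `verify_artifact` returns False, for example when a downloaded file has changed since the digest was recorded.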
D Dehghanpisheh 35:21
Your AI model’s the least of your concerns.
Edward Raff 35:25
Yeah. Assuming you've closed the door, you changed the lock after you bought it, poisoning attacks don't really seem to be super plausible. And even in a lot of academic papers, when people evaluate it, they just don't have a high success rate. And so that sort of puts us really in black and gray box threat models for inversion and evasion.
And evasion is generally realistic, like we've talked about already. You can guess that the credit card company has a fraud model and you can guess that things are going to go into it. Congrats. You've pulled off a black box attack.
D Dehghanpisheh 35:55
Edward Raff 35:56
Exactly, right? Like, it's just the level of effort there is much lower to do something that will work with some success rate.
And inversion I think is sort of the – Goldilocks isn't quite the right word, because that has more positive connotations than what I'm going for – but inversion's sort of this middle ground: there are reasons to do the attacks, they're not super easy to pull off, and there are tools to mitigate them. It's not like evasion, where it's kind of hopeless, like you're going to have some default amount of evasion that happens naturally because you don't build perfect models. And it's not like poisoning, where this doesn't really make sense and doesn't even work super well academically.
But we've got tools, we've got differential privacy, we've got strong theory that gives us a solid foundation to build solutions, and so we can solve the problems that come up. There might still be a price, but there's a path forward. Whereas with evasion, I feel like ‘Bayes optimal error rate’ is an expression that has left the vernacular, but the Bayes optimal error rate would be the DP counterpart for evasion attacks. If you knew what your Bayes optimal error rate was, then you could build really great solutions and say, ‘I’m within the Bayes optimal error rate.’
But we don't even know how to quantify what the Bayes optimal error rate is for any set of data. So, life is hard.
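For readers unfamiliar with the "strong theory" behind differential privacy, here is a minimal sketch of the textbook Laplace mechanism on a counting query. It is not how a model is trained privately (that usually involves techniques such as DP-SGD), and the function names are assumptions for this example. It only shows the kind of guarantee the theory provides: noise scaled to sensitivity divided by epsilon.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    # Two i.i.d. exponentials with mean `scale` differ by a Laplace(0, scale) variable.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Release a noisy count; a counting query has sensitivity 1."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    # Smaller epsilon means more noise and stronger privacy.
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

This is the "price" mentioned above: each released answer is less accurate, in exchange for a bound on how much any single record can change the output distribution. Naive floating-point implementations like this one are known to have subtle weaknesses, so a vetted library is the safer choice in practice.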
D Dehghanpisheh 37:24
So along those lines of ‘this life is hard,’ and I actually think the way you teed this up at the end here is really useful, which is that an adversarial attack, to really get it done, it's a pretty, on one hand, complicated equation of input to output, and on the other, mission.
You have to have a specific reason to choose that weapon versus all the other approaches that you could take for industrial espionage or, you know, other nefarious acts, whatever they may be.
Against that backdrop, what would be the call to action, Drew, that you would give practitioners today to think about in regards to the security of their artificial intelligence applications and their machine learning systems?
Drew Farris 38:05
Yeah, I think it comes back to having all of the right people in the room, and having that conversation among diverse voices, so that you don't just have the ML engineers off to the side in one part of the organization, but there's an ongoing conversation about security that goes into building the systems that these models inhabit, right?
So that, and the deeper understanding of use case specific mitigations, right? It’s understanding what's really, really important for the use case that you're trying to solve. One of my favorite real world examples is like, credit card companies have decided they'll eat a certain amount of fraud because they want to reduce the amount of friction for their customers in getting new accounts.
So it's a series of trade-offs that have to be made and considered. Ultimately, it does come down to that cost that we've sort of highlighted in the paper, right? It's about coming up with a cost model to think about how you approach these things.
D Dehghanpisheh 38:57
Maybe that's why sellers call it leakage and not theft. Cost of doing business.
Edward, to you.
Edward Raff 39:04
I think Drew stole the biggest point that I would hammer home.
D Dehghanpisheh 39:13
It's okay. He's going to make it even when we all get together for brisket. It’ll be great.
Drew Farris 39:16
Yeah, there we go.
Edward Raff 39:18
You're playing a dangerous game with me, getting me off topic. As a Jew from Texas, brisket is very culturally important to me.
D Dehghanpisheh 39:26
Edward Raff 39:29
But trying to bring it back home and land the plane: everything Drew said about having the right team, and more than just having fun playing with all the coolest machine learning things, having this perspective of, what am I actually trying to accomplish?
It's very easy to get lost in the weeds of, we set this as like the model, like we're going to build an A versus B classifier and okay, let's go build the model, and let's make it as accurate as possible and let's make it robust. But why are you building the model? What is the fundamental purpose that you're building this model for? Because it's often not to be an A or B classifier.
Like, fraud. That's a normal direct one that you like– Yeah, you want to stop fraud. But to Drew's example, why? Well, we want to maximize profit. And they came up with a solution of like, well, actually we'll have a little more fraud because it'll maximize profit, because we'll get more customers.
So it wasn't actually fraud. That wasn't the actual goal. It was this profit motivation. So for whatever you're tackling, like what's the actual goal? And maybe there's a different model you should be building that will help you get that done more effectively, more efficiently.
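The fraud example can be worked through with a small sketch that picks a decision threshold by profit instead of by fraud caught. Everything here is hypothetical: the scores, the revenue per approved customer, and the loss per approved fraud are made-up numbers, and this is not the cost model from the paper.

```python
def profit_at(threshold, scored, revenue_per_customer, loss_per_fraud):
    """Approve every application scoring below `threshold` and total the profit.

    `scored` is a list of (fraud_score, is_fraud) pairs.
    """
    total = 0.0
    for fraud_score, is_fraud in scored:
        if fraud_score < threshold:  # approved
            total += -loss_per_fraud if is_fraud else revenue_per_customer
    return total


def best_threshold(candidates, scored, revenue_per_customer, loss_per_fraud):
    """Return the candidate threshold with the highest total profit."""
    return max(
        candidates,
        key=lambda t: profit_at(t, scored, revenue_per_customer, loss_per_fraud),
    )


# Hypothetical applications: (model fraud score, whether it was really fraud).
applications = [(0.1, False), (0.3, False), (0.5, True),
                (0.6, False), (0.7, False), (0.9, True)]
chosen = best_threshold([0.4, 0.8, 1.0], applications,
                        revenue_per_customer=100.0, loss_per_fraud=150.0)
```

With these made-up numbers the strict threshold of 0.4 blocks all fraud but earns 200, while 0.8 lets one fraud through and earns 250, so `chosen` is 0.8. That matches the point about accepting a little more fraud because it brings in more customers.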
D Dehghanpisheh 40:46
Well, I want to thank Drew, Edward, and my co-host, Badar. We really enjoyed this conversation.
Go download the paper, find out more about AI/ML security over at ProtectAI.com and Booz Allen's website. And we really thank both of these distinguished gentlemen and guests for coming on today. And Badar, thank you as well.
Till next time.
Thanks for listening! Find more episodes and transcripts at https://mlsecops.com/podcast.