ETH Zürich's Assistant Professor of Computer Science, Dr. Florian Tramèr, joins us to talk about the practicality of data poisoning attacks, the intersection of Adversarial ML and MLSecOps, and themes from his co-authored preprint, "Poisoning Web-Scale Training Datasets is Practical."
YouTube:
Episode Summary:
In this episode, we interview Florian Tramèr, PhD about the practicality of data poisoning attacks in machine learning. Dr. Tramèr explains that the traditional security and privacy issues in software and hardware have largely been addressed, but machine learning has presented new ways in which security vulnerabilities can arise. He discusses the different types of attacks, including poisoning attacks and evasion attacks, and provides examples of how they work. Dr. Tramèr also highlights the vulnerabilities that come with open sourcing data sets and the need for better methods to curate and inspect training data. He concludes by discussing the lack of tooling available and the need for principled ways to establish trust in machine learning models, data, and resources.
Introduction 0:08
Welcome to the MLSecOps Podcast presented by Protect AI. Your hosts, D Dehghanpisheh, President and Co-Founder of Protect AI, and Charlie McCarthy, MLSecOps Community Leader, explore the world of machine learning security operations, aka MLSecOps. From preventing attacks to navigating new AI regulations, we'll dive into the latest developments, strategies, and best practices with industry leaders and AI experts. This is MLSecOps.
D 0:38
Hey, everybody. Thanks for joining with us today on the ML Secops podcast is Florian Tramèr, who if you are in the most advanced fields of adversarial machine learning, I'm sure you know this name. So, Florian, welcome. Charlie?
Charlie 0:54
Hey, everyone. Thanks for listening. Florian, thank you so much for being here. To get us started, will you give us a little bit about your background and your role in the security of machine learning systems? You have a really deep career in this space. We'd love to learn a little bit more about that.
Florian 1:12
Yeah, sure. So I started working in this space about, I'd say, six years ago. This was just at the start of my PhD, where deep learning was sort of becoming a bit of a hot topic. And so some people were starting to look a bit more in detail at the security and privacy of these systems. And so it was a bit of an early stage time where we started looking at ways in which we can make machine learning models misbehave in security sensitive applications, and then over time also started looking at all kinds of different ways in which machine learning models could be attacked to sort of reveal training data, to steal machine learning models themselves, to make the models sort of give out wrong decisions and so on. And after defending my thesis on this topic a bit more than a year ago, I spent one year at Google Brain continuing doing research in the space with their privacy and security team. And then six months ago, I joined ETH Zürich as an assistant professor, where I'm building up a research group that's going to continue sort of trying to study the security and privacy of the forefront of machine learning models, especially at the moment, things like language models that everyone's very excited about. And we're sort of trying to figure out how worried we should be about these systems becoming as commonplace as they are.
D 3:01
Hey, Florian, you mentioned a couple of things in there that I found really interesting, and this podcast is, surprise, about MLSecOps. And one of the things that we think of in MLSecOps are those pillars of privacy and security and how that unfolds in MLSecOps. What's your take on that? And how do you think about that in terms of privacy and security in ML code versus, say, privacy and security in traditional code? And what's that intersection of DevSecOps for traditional code and privacy and security and, say, privacy and security in MLSecOps? How do you think about that?
Florian 3:43
Yeah, I think it's definitely a very new area for security and for privacy, and that I think in many traditional applications, be it in software or in hardware, of course we know of many security and privacy issues that can arise. But usually, over the years at least, we've kind of built up the expertise for what these failure modes could be and how we can address them when we actually find them. This isn't to say that we sort of solved security and privacy in a traditional sense at all. I mean, bugs and vulnerabilities creep up all the time. But I would say that one thing that to me really distinguishes these two spaces is that in most situations, when you find a vulnerability, when someone discloses a vulnerability to you today in something that is not machine learning, you will probably know how to fix it. Unless it's like someone, I don't know, broke your cryptosystem and you need to find some academics to actually go and build a new one. But usually if there's a buffer overflow somewhere in your code that's leaking data, well, you know how to go and fix that. With machine learning, we're also starting to find all these different ways in which machine learning can sort of fail, in which it can lead to security vulnerabilities, to privacy leaks. And then when someone comes to you and says, oh, you have this issue, usually it's kind of, well, bummer, what do I do now? Especially on the security side, actually, for many of the sort of core vulnerabilities that we know of or that people have discovered in machine learning systems, we don't yet know of a good way of mitigating these types of vulnerabilities in a sort of general and principled way. For privacy, we know of some ways…Yeah, sorry?
D 5:45
I was going to say the security is almost more infantile at this moment than even the attack methods or breaches that would occur, right? If there are early days in the security of data and privacy, and how you prevent it, then how you fix it is even earlier in terms of
Florian 6:06
Right, right, although I would say that even the attack side, I think that we're not yet entirely - I wouldn't say we know exactly what we're doing either. And that, I think at least on the academic side over the past six, seven years, there's been a lot of work that's kind of just shown that if you try hard enough and if you have enough access to a machine learning system, you can sort of get it to do whatever you want. But then when you're actually faced with a real system that you have to interact with over a network, that you maybe don't know exactly how it works, we also don't know really of a good principled way yet of attacking these things. So as an example of this that probably many people listening to this will have seen at some point or followed is if you look at these chat applications like ChatGPT or the Bing chatbot that Microsoft released recently, there are attacks against these systems where people find weird ways of interacting with these chatbots to make them sort of go haywire and start insulting users or things like this. But if you look at the way that these attacks are currently done, it's a complete sort of ad hoc process of trial and error, of just playing around with these machine learning models, interacting with them until you find something that breaks. And I would say even there on the attack side, we're very early days and we don't necessarily have a good toolkit yet for how to even find these kinds of vulnerabilities to begin with, unless we have sort of complete access to the system that we're trying to attack. Right.
Charlie 8:00
Right. We're using, Florian, some terminology when we're talking about attacks and vulnerabilities or talking about breaking things, all of these things that fall under the umbrella of adversarial ML - for folks who might not be as technically inclined, how would you describe adversarial ML briefly to those people?
Florian 8:19
I would say generally it's the process of trying to probe machine learning models with an adversarial mindset. So sort of trying to elicit some behavior of a machine learning model that is just against the specification of that model, where you get a system that just behaves in a way that was not intended by its developers. And usually this is done by sort of interacting with the system in some adversarial way, so in a way that deviates from normal behavior of a user that the designers would have expected.
D 8:59
So for a technical audience, I guess then, how do you think about the categories within adversarial ML? And particularly from an ML practitioner point of view, maybe you can talk a little bit about that.
Florian 9:14
Yeah, so I think in this space there's sort of a number of different vulnerabilities that people have focused on in the past few years. I would say generally these get sort of subdivided into four categories. The first being or I mean, in no particular order, let's say one being at training time when a model is being trained, how could you influence the training of this model to make it sort of learn the wrong thing? [The] general class of attacks here are called poisoning attacks, where you try to tamper with the model's training data or maybe with the training algorithm, depending on what kind of access you have to create a model that sort of on the surface looks like it's behaving correctly, but then in some specific situations would behave only in a way that an adversary might want to. So these are poisoning attacks. The counterpart to this is an evasion attack. So this is once a model has been trained and been deployed, where an adversary would try to interact with this model, sort of feed it input data that would somehow make the model just give out incorrect answers. So some of these attacks on chatbots where people get these models to just completely behave in a way that's different than the designers of the system would have wanted; this is what we'd call an evasion attack. So these are kind of attacks on the integrity of the system, sort of making the system behave in a way that's different than we would have wanted. And then the two other categories that people focus on deal with privacy, on the one hand, the privacy of the model itself. So here is a category of vulnerabilities that we call data, sorry, model stealing or model extraction attacks where the goal here of an adversary would be to interact with a machine learning system that belongs to another company and find a way to locally reconstruct a similar model and then just use it for their own purposes and maybe steal that company's copyright or just expertise in this way. And this is one of the areas that my work has sort of helped set up about six years ago, this sort of area of stealing machine learning models. And then finally there's sort of similar privacy considerations for training data. So here in this category you would have everything that deals with sort of data inference attacks where the vulnerability would be that someone who interacts with a trained machine learning model would somehow learn information about the individuals whose data was used to train this model in the first place. And this is of course a big, big risk as soon as machine learning models are being used in sensitive areas like in medicine, where the data that is being used to train is, yeah
Charlie 12:46
That's all actually a fantastic transition to another piece of what we are really excited to talk to you about today, which was your recent preprint about data poisoning at web scale. And part of that claimed that attacks on fundamental data sets used in various models as practical from an attack standpoint. So if we pivot a little bit to talk about that, how are you defining practical here?
Florian 13:14
Yeah, sort of as opposed to everything else that was impractical. But of course, that's a bit of a jest. Essentially so, as I mentioned before, poisoning attacks is kind of this idea that if you can tamper with the data that is used to train a machine learning model you can somehow get this model to have behaviors that were not intended by the model designers and that could be sort of controlled by an adversary. A canonical example of such an attack is something we call a backdoor attack where, as an attacker, you would place a specific trigger somewhere in the training data. So you could think of this as, say, a model that's trained to classify images. And as an attacker, I'm just going to place a little red square at the top of some images in the data set and always sort of tell the model that when you see this red square, you should think that this is a picture of a cat, regardless of what the picture actually is. And the model is then just going to learn this kind of incorrect behavior. And then once the model is deployed and sort of used in the world, anywhere in the world, you can just take any picture you want and you just have to add a little red square on it. And then the model will get confused and think, oh, this should be a cat, even though it might be a picture of something completely different. Now, of course, making models believe that they're looking at cats is not particularly interesting, but if you're dealing with a model that has to detect not safe for work images or this could be a data set of malware that you're poisoning, and so then suddenly these attacks could have a lot more serious consequences. So the fact that you can launch these attacks was recognized for a while. There's been a lot of work sort of showing that these attacks work extremely well and that very often the amount of data you need to control to get such an attack to succeed is very, very small. Sometimes even if you can, if as an attacker you can, maybe control less than a thousandth or a 10th of thousandth of all the data that's used for training these models, you could probably pull off an attack like this. So this part was kind of known. This kind of got us to wonder then it’s like, why don't we seem to see attacks like this in the wild? Why is not like every model that is being deployed
D 15:55
If it's practical, you would assume that it would be more exploited. Right. It would be taken
Florian 16:02
Right, well, that's where what we call practical would come in and that sort of, yeah, what we realized is that all these, whenever people were talking about poisoning attacks, they kind of just assumed the adversary has control over a very small fraction of the data. But then if you look at the model that's deployed like, say when OpenAI released ChatGPT, we know this model was trained on a whole bunch of data collected from the internet that sort of comes from a whole bunch of different websites that are all controlled by different people. Because of the size of this data set, the people who designed or trained the machine learning model couldn't just go and sort of make sure that all this data was kind of good and safe from attack.
D 16:55
That would have been a nightmare.
Florian 16:58
Right. And so it seems like there would have been potential there for someone to just go and tamper with all with this data, except that to kind of do this, you would have to have been sort of at least as visionary as OpenAI itself in sort of thinking that they were actually going to go and train a chatbot on this data. And sort of you would have had, maybe a year ago, you should have had the idea of, oh, I'm going to go and tamper with a whole bunch of text on the web and then hope that this is the text that they were going to sort of scrape to train their chatbot, and then hope that your attack succeeds. And so there's sort of a lot of uncertainty in this process where you don't really know ahead of time which data these models are actually going to be trained on. You don't necessarily know what kind of models are going to be trained. And so this kind of makes launching such an attack relatively difficult, or in other terms, impractical.
D 17:55
Because wouldn't you have to go way upstream to the data set and the locus of the data sets that you're going to train? As an attacker wouldn't you have to say, hey, we're going to plant this gap in the training data set way upstream? You'd have to know what that model is going to train on from a publicly available data set, wouldn't you?
Florian 18:18
Yeah, exactly. That's the point. You would need to know this ahead of time. And it turns out that in some cases you do know this ahead of time, that collecting data, sort of scraping a whole bunch of data from the Internet and then curating it to a minimal extent, or finding ways to sort of label this data; it's still a relatively tedious process. And so this isn't sort of redone every time someone trains a machine learning model. So instead, there are some researchers who kind of specialize in creating data sets for machine learning. And what these people do is they will go through the sort of tedious process of scraping the web and finding texts and the images that are sort of suitable for machine learning, and then they collect all of this into one big data set that then hundreds or thousands of people after them will sort of use for machine learning.
D 19:25
Wouldn’t this be a bigger threat, though, with the rise of more open source assets, more open source models which are divulging how they were trained and talking about what types of data sets they're trained on? I would think that if you were somebody that wanted to swim upstream, contaminate that data source, especially for all of these large language models and others that are used in the ML supply chain now, isn't that - isn't the open source set of assets - I guess my question is, couldn't they become more vulnerable to type these types of attacks as more open source assets come into play?
Florian 20:05
I think to some extent, yes, that open sourcing these data sets and sort of showing exactly where the data comes from brings vulnerabilities that would have been maybe slightly harder to explore if the data set was just not released at all, which some companies do when they train machine learning models. I think this is sort of always a bit of a trade off that comes with open sourcing things. On the other hand, of course, if the data set is public, it makes it much easier for people to actually go and inspect this data and sort of figure out whether there are maybe things in it that shouldn't be in there, sort of finding better ways to filter this data and so on. And so I think this kind of endless debate of whether open sourcing things is more beneficial or not for security is also something that we're going to see in machine learning, for sure.
D 21:04
Yeah. The cost benefit analysis in terms of security, I guess begs the question, right. Pen testing, red teamers, appsec, infosec teams, ethical hackers; they don't seem, at least from my lens - the people that conduct these types of breaches and attacks to inform security postures - they don't seem to be doing this as much. Do you agree with that? And if so, why do you think that's the case?
Florian 21:36
Yeah, I don't know for sure, in that I know that some companies have internal sort of AI red teams but that kind of keep of course what they do extremely, I would say sort of confidential and so I don't know to what extent this is being done. I think it's also the expertise that's maybe needed on the machine learning side is relatively scarce and so just building up such an AI red team is probably not something that most companies can get away with, that they'd probably rather just use this extra knowledge or workforce just to build better machine learning models.
D 22:28
Are they going to have to think that in the future? Right, if everybody's like hey, we need an AI chatbot, just like ChatGPT or something, you know, your large enterprise.
Florian 22:38
Oh year, definitely. That's why at least I think some of the biggest companies in the space are the companies with some of the biggest AI labs like Google, DeepMind, Microsoft, OpenAI and so on. They have people who are red teaming AI. I guess they are still doing this fairly internally and so it's not always very clear from the exterior sort of what they might be finding. I think also what hasn't gotten a huge amount of attention in this field so far are just very traditional sort of OpSec vulnerabilities in these systems, which is actually what we looked at in this work on poisoning attacks right. Where essentially, in the end, what we found was kind of an issue there is that these training sets that people were collecting and kind of distributing just weren't properly integrity protected. So, just to give some context here, the way you would train a machine learning model today, if you wanted to train something like a chatbot or an image generation model that can sort of compete with the state of the art, you need so much data that comes from so many different places on the internet that the data sets that people collect for this, this isn't just something where you go and download one data set somewhere on a central server or like a torrent or something. Instead, the sort of researchers who create these data sets, they just give you an index that sort of says, here's a URL where you can find an image of a cat, here's a URL where you can find an image of a dog. And they repeat this like 2 billion times, and then you have this large index where you can sort of go and download all these things and sort of create your own local training set. The problem is that now that you've done this, I as an attacker, I can sort of go and download this entire list as well. And now I know exactly what data people are going to go and crawl every time they train a machine learning model. And because this data just comes from sort of everywhere on the web, a lot of this comes from just random domains that belong to whoever, we don't know. And what we essentially found, the sort of core vulnerability in these data sets that we found, is that a very large fraction of these URLs, over time, they expire. Like the person who owns this domain just decided not to pay for it anymore. And so in many of these, this is what we call web scale data sets, or data sets that are really sort of just scattered across the entire web, we find that there's maybe like something like 1% of all the data in these data sets are just up for grabs for anyone who's willing to go and pay for a whole bunch of domains. And this is a very sort of traditional OpSec vulnerability. You don't need to understand anything about machine learning to sort of see that the data set is
D 26:08
It seems like you’d want to enforce Zero Trust like policies, right, in terms of continuous verification of those assets and making sure that you've cycled through those operational frameworks. I guess because of that, though, what you're talking about are there certain use cases or specific models that are easier targets you think of or are more at risk because of the operational gaps and controls?
Florian 26:37
I think essentially all these models that are trained on, again, sort of web scale data are just inherently at risk to attacks like this. And so this in particular would mean any model that's trained on any sort of text model or image model or code generation model that's been released in the past, sort of one or two years has, just like the scale of the data that you need to train these models is so large that we just we do not know of any reasonable way of curating data sets to sort of make sure that the data hasn't somehow been tampered with. I think code models, like code generation models, are among the models that I think are most worrisome there from a security perspective because developers are going to start using these things because they're just amazing, they save you a lot of time. At least that's been my own experience and from what I've heard, also from many, many other developers, and also the way you train such a model is just by crawling every single repository you can find on GitHub or somewhere else on the web.
Florian 28:00
Yeah. And so anyone can just go and create a code repository today that just contains a whole bunch of insecure code and then hope that this gets included in the sort of next training phase of the model. And I think there's little reason to believe that this isn't happening, that these models aren't sort of being continuously retrained to make them better. And there it seems to me that this is sort of a question of time until someone actually pulls off an attack. Yeah, and that's kind of the scary thing about this type of poisoning attacks is that we can’t even be certain that these attacks haven't been pulled off yet. And that the whole point of these attacks is that the final model should still be behaving essentially the same as if it hadn't been poisoned. And so to the average user, you wouldn't notice that it's doing anything bad. And then maybe in some very specific targeted setting, like if you're currently developing the Linux kernel or something, then suddenly the model is going to start very subtly introducing bugs into your code or something. And this would be very hard to detect that an attack like this has happened.
D 29:25
So how would you guide ML practitioners, right, the ML engineers, data scientists, people who are building AI applications and building ML systems, how would you guide them to kind of safeguard against these types of attacks? Because a lot of what you're talking about are supply chain vulnerabilities, really, particularly in the data set in the model. If you're an ML practitioner and you're guiding someone today, what are some of the macro things that you would say, hey, here are some of the things you need to start doing because these attacks are going to happen whether you see them or not. So you kind of want to shrink the opportunity for those attacks to happen, right?
Florian 30:06
Yeah. And so I think one thing that as a community, generally, we have to start investing in and sort of find better methods for is just ways to better curate or sort of inspect training data so that if some of these attacks are happening, we can sort of hope to catch them ahead of the curve. And there, I think even just for any machine learning developer to just spend more time looking at the data and thinking about where is this data being sourced from and what kind of trust am I putting into different entities by using this type of data. These just seem like the kind of questions that people hadn't really been asking themselves and that have led to some of these vulnerabilities. This is kind of ahead of model training where you could hope to at least have some mechanisms in place to try and filter out bad data. And then I guess, once the model is deployed, you would also hope to have mechanisms in place to sort of audit how the model is being used and hopefully detect. Cases where the model is kind of misbehaving. I think this would mean in many cases, just trying to design applications in such a way that they're not solely dependent on outputs of a machine learning model, because that is just sort of begging for your system to fail at some point because these models are just not super reliable yet.
D 31:54
Yeah, your preprint also hinted at kind of two things I thought were interesting. One was kind of this notion of you have a laugh-like capability to kind of see these things and spot for it and alert for it. But the other thing that your preprint mentioned was there's a section in there called Existing Trust Assumptions. And that jumped out to me because it made me think, you're kind of talking about zero-trust architecture best practices in ML. And I'm curious if you think that most ML systems are building in those types of policies. Like, Google talked about this ten years ago, and large enterprises still don't seem to have zero-trust architectures nailed. Talk about the need for that different trust approach. You talked a little bit about it. How do we think about things like continuous verification and this backdrop of data poisoning attacks? Is there a good open source attack and defense tool for data poisoning that practitioners could build into their systems right now?
Florian 32:53
So that's kind of the unfortunate reality of where we are right now is that there is very little in terms of tooling available, because we don't even know how to build it. For many of these attacks we sort of - people come up with heuristics, sort of ad hoc defense mechanisms to kind of try to either protect models or to detect these types of attacks. And then a few months later, someone comes up with a slightly more involved attack that would actually defeat these types of defenses as well. So it's still really a setting where we don't fully know how to completely prevent these types of attacks. And so it's really a question of minimizing the attack surface as much as possible. And I think for now, unfortunately at least, it seems that in many situations, the best we can do is to at least be aware of where these trust assumptions go. And I think in many cases, they go far beyond the data. I mean, it's sort of on the one hand we're talking here about going and sabotaging data on various places in the internet, but in probably 95% of applications of machine learning today, somewhere along the line, someone will call pip install some package from some random developer that they don't necessarily trust. And this is sort of, it's been known for years that this enables sort of arbitrary code execution on your machine, but it's still something that we routinely do, or people routinely do every day. And I'm sure there are some companies that have internal kind of best practices, security best practices that will prevent you from doing something like this. But I think in many, many cases, they're still just - people put trust in resources that come from sometimes fairly random places online, sort of different code bases. Anyone can go and upload a model or a data set to Hugging Face and then sort of share it with others for download. And there's very little so far in terms of principled ways of establishing trust there, except kind of by reputation, right? It's like if OpenAI or Google or Facebook releases some new package or model, you kind of trust them because we already do trust them with many, many other things. But yeah, this is a very good question of how we might be able to go beyond this [...] very base level of reputation trust, and actually verify the type of resources we're using. The code we're running, the machine learning code, the machine learning models that we use and build on, the data that we use, that these are actually trustworthy. And many of these things are just open questions at the moment.
D 36:15
Got it. So one of the things when you talked about trust and when you surfaced in your paper, you surfaced the sources that you used and you notified the maintainers, I'm really curious, how did they respond? I would assume in a lot of these cases they're not used to the type of security vulnerability disclosures and responsible disclosures that come their way. How did they respond to you?
Florian 36:42
Yeah, so this was also, you’re right, that in many cases - so the people we reached out to were essentially the researchers that had collected the data sets originally that were kind of maintaining this long list of sort of web assets. Go find a security disclosure email somewhere. Of course, most of these groups don't even have a dedicated web page, so it's only just kind of word of mouth. You sort of reach out to some person by email or maybe you know them personally and just talk to them and tell them, oh, we found this vulnerability. And luckily, in this case, there is a way of fixing this issue, which then all the maintainers that we reached out to did, which is just that when you release this list of web resources, you would just add an integrity check. Like you would hash whatever data content you expect to find at a certain URL, like an image or a piece of text. You would hash this data and then just include this in your data set index so that when six months later someone else goes and downloads this content, they can sort of verify that they're seeing sort of the same thing that the data set creators saw. So of course, this doesn't solve - all this really does is sort of make sure that you're seeing the same thing today as what the data set creators saw when they created the data set. But this doesn't guarantee that that data at that point was sort of reasonable. And that's kind of a much harder problem that we don't really know how to tackle.
D 38:40
Yeah, the ecosystem generally doesn't seem to know how to tackle that, from my view, right? Which is that like one of the fixes that you talk about in the paper, for example, data checksums and hashes, which you just mentioned, some of the data providers would say, well, this potentially downgrades the utility of the data set. Right? When do you think this balance, particularly for data providers, for all these models, open source or otherwise, when do you think this balance between accuracy concerns and security concerns tilts over more towards security?
Florian 39:19
I think this is going to be once we start seeing attacks, and I think for sort of real practical attacks, and I think for now we're not there yet, or at least there hasn't been sort of something big like this that has happened. And so my guess is pretty much anyone training these machine learning models is just going to go for maximum accuracy by just collecting as much data as possible, which is in our case, it's just going to mean ignoring checksums because that makes sure you collect more data than if you do. And then I think at some point we're going to have the equivalent of Microsoft's Tay chatbot from like five, six years ago, for something more modern where it's kind of like the reputational hit they took with this model, I think was big enough that they decided to shut the entire thing down.
D 40:23
Like you're seeing it with Bing and the rollout of ChatGPT Bing engine. So you're saying similar types of behaviors on the reputational risks. That makes sense.
Florian 40:37
Yeah. And I think for now, for many of these models that were released in the past year, be it language models for, like, chat assistants or image generation models that many, many people are playing around with; many of these models are just [...] you're only interacting with them locally in a kind of isolated environment. Right. If I talk to ChatGPT, this doesn't sort of influence any other system in the world. And so I can get ChatGPT to say bad things to me, but in some sense, who cares? I'm just attacking myself. With Bing, I think we're seeing a bit of the first wave of these large machine learning models are really being integrated into larger distributed systems and then things can all of a sudden start going haywire in much, much worse ways. So there was a very recent work by some researchers in Germany that was very cool and fun, where they showed that you could set up a web page. That just contains some invisible text that Bing is going to read and that essentially tricks Bing into asking the user for their personal information and then sort of exfiltrating this information to a URL that's controlled by the attacker. And this was all sort of a bit proof of concept type of thing, but these are the attacks that I think are going to start happening, especially once we start putting these type of models in any kind of web environment.
D 42:27
It’s basically a social engineering attack, right? It's like a newfound way of social engineering.
Florian 42:33
Right, yeah. And then, you're sort of putting these models again into areas where users then are sort of putting trust into the model right, to do things on their behalf. So with Bing, it's sort of just searching the web on your behalf and already there we're seeing weird things that could happen, but there's already many startups sort of proposing demos of using these type of assistants for helping you manage your agenda, book restaurants for you, book flights for you, these kind of things. And then all of a sudden you now have this block, this machine learning model that we know can quite easily be forced to sort of behave in very, very weird and arbitrary ways that [...] interacting with your sensitive information, like your private data, maybe, hopefully never your credit card number. And then, on the other hand, is interacting with the web, which is not a safe place. I think we're going to start seeing the equivalent of SQL injection attacks or sort of cross site scripting attacks on these chat assistants that operate in your browser very quickly, like this example I gave of this team of researchers that attacked Bing is kind of one first example of this. And these, I think, are going to be fairly easy targets, where usually there's also going to be incentive for attackers to do this because there might be money on the line, depending on what kind of capabilities these models have. My prediction is that this is going to happen and this is going to temper a little bit the expectations of some of these applications and
D 44:36
Let's get a chat bought up. Yeah. It may cause people to think twice.
Florian 44:40
Yeah it will force people to think about this and find ways to address these problems, which, for now, I think we only have very partial solutions.
D 44:51
Well, this is fascinating.
Charlie 44:54
Very. In thinking about that, Florian, and as folks try to build strategy, moving forward around defending against data poisoning attacks, somebody who is reading your preprint and starting to think about some of these things, what are a couple of the key takeaways there that you would want them to walk away with after reading your paper?
Florian 45:15
I think the main takeaway, yes, to actually ask this question of, like, who am I implicitly trusting with the data I'm using for my machine learning model? And sort of who could have access to this data and manipulate it? And usually the answer to this question is going to be scary. And so at that point, you have to sort of ask yourself, yeah, what kind of steps can I take to protect myself as well as possible? And at least one that we suggest for these web scale data sets is to just at the very least, verify that whatever you're downloading hasn't been modified since the time that it was first collected. And this is unfortunately going to mean that you're going to download way less data than if you didn't care about security. So coming up with sort of less intrusive ways of protecting yourself is something we have to figure out. But I think, yeah, if you want to be extra cautious, this would be the first step to go or to even yourself kind of more proactively filter through these data sets and figure out, are there certain domains in there that you are willing to trust to be serving good data and others that you might not? And once you do this, sort of how much data are you still going to be left with? That's kind of the question.
D 46:54
Thanks a lot, Florian. We really appreciate the time, and I'm sure our audience appreciates this all things data poisoning practical approach. Really fascinating discussion. Thank you for your time and for those tuning in. Thanks for listening, and we'll talk to you again soon. Be well, everyone.
Florian 47:14
Yeah, thanks again.
Closing 47:16
Thanks for listening to the MLSecOps podcast brought to you by Protect AI. Be sure to subscribe to get the latest episodes and visit MLSecOps.com to join the conversation, ask questions, or suggest future topics. We're excited to bring you more in depth MLSecOps discussions. Until next time, thanks for joining.
Protect AI’s ML Security-Focused Open Source Tools
LLM Guard - The Security Toolkit for LLM Interactions
Huntr - The World's First AI/Machine Learning Bug Bounty Platform
Thanks for listening! Find more episodes and transcripts at https://mlsecops.com/podcast.