
Practical Foundations for Securing AI

Audio-only version also available on Apple Podcasts, Google Podcasts, Spotify, iHeart Podcasts, and many more.

Episode Summary:

In this episode of the MLSecOps Podcast, we delve into the critical world of security for AI and machine learning with our guest Ron F. Del Rosario, Chief Security Architect and AI/ML Security Lead at SAP ISBN. The discussion highlights the contextual knowledge gap between ML practitioners and cybersecurity professionals, emphasizing the importance of cross-collaboration and foundational security practices. We explore how securing AI differs from securing traditional software, along with the risk profiles of first-party versus third-party ML models. Ron sheds light on the significance of understanding your AI system's provenance and maintaining the controls and audit trails needed for robust security. He also discusses the "Secure AI/ML Development Framework" initiative that he launched within his organization, featuring a lean security checklist to streamline processes. We hope you enjoy this thoughtful conversation!

Transcription:

[Intro] 00:00

Daryan Dehghanpisheh 00:07

Welcome to the MLSecOps Podcast, and for everybody joining it is a pleasure to welcome Ron Del Rosario. He is the Chief Security Architect and AI/ML Security Lead at SAP ISBN, which is a business unit of SAP. Ron, welcome to the show. Thanks for joining.

Ron F. Del Rosario 00:28

Thank you. Thank you, D, appreciate you inviting me to this forward thinking podcast. Appreciate it.

Daryan Dehghanpisheh 00:33

Yeah. Well, we're excited to have you. Hey, for our listeners and people who are reading the show notes, you know, what is SAP ISBN; what is that particular business focused on?

Ron F. Del Rosario 00:44

Sure. So ISBN stands for Intelligent Spend and Business Network. The famous products under the ISBN portfolio include SAP Ariba and SAP Concur. So, SAP Ariba provides enterprise procurement, supply chain, and spend management solutions, and Concur offers enterprise expense, invoice, and travel solutions to large organizations around the world. So these are all software as a service.

Daryan Dehghanpisheh 01:14

Just some small multi-billion dollar lines of business that, you know, are used by a huge share of Fortune 1000 enterprises and beyond. Yeah, you know, those little things, <laugh>. That's really great. That's cool.

Ron F. Del Rosario 01:29

Yes. These are SAP systems, correct.

Daryan Dehghanpisheh 01:30

SAP systems. Awesome. So, you know, I came across your LinkedIn profile - that's actually how we met - and the first thing that stood out to me was the Michelangelo picture. So if anybody's there, they should go click and become friends with Ron right now, if for no other reason than the Michelangelo picture at the top. I'm curious, man, why that image? There's a lot of people that have a lot of things. What's the story behind that image?

Ron F. Del Rosario 01:56

So this is an awesome question. It's the first time anyone's asked me that. How should I say this? I think the best way to describe it is I'm a big fan of inventing stuff, right. Through writing, expressing your thoughts in writing, and even in my own art when I draw some stuff, right. So, I'm a big fan of Da Vinci, of course, and the way he,

Daryan Dehghanpisheh 02:29

Oh, it's Da Vinci, not Michelangelo. This is my bad. 

Ron F. Del Rosario 02:32

Yes, this is -

Daryan Dehghanpisheh 02:33

<Laugh> Da Vinci.

Ron F. Del Rosario 02:34

All the art that you see there are his well-known drawings from the past: the Vitruvian Man, the one that looks like a helicopter, right, and the bridge [design] and all this stuff. Yeah. Back in the day, of course, there's no internet, there's no computer yet. This all existed in his head, right? He's super creative, and at the same time he knows how to execute, right? And the way he executes is he starts thinking about stuff, then draws it out in detail, and then he executes. That's always been my mantra. When I got started in information security, I think about problems, and I tend not to solve them the way they were classically solved; I try to think of some creative ways to solve a specific problem. So it helps the company, it helps the enterprise, right?

Daryan Dehghanpisheh 03:32

Yeah. So, with Da Vinci as your muse, I'm curious, how have you been thinking about this transition into the AI space and in particular, security of AI? You know, if you had to both draw it out, write about it, talk about it, and then execute, you know, how do you, how do you think about this domain?

Ron F. Del Rosario 03:54

I think for us security practitioners, so to speak - information security, computer security, software security - it's an opportunity. It's a giant opportunity for us to be creative in how we attack things, right? Specifically in the AI/ML space, right? I think it's safe to say that you won't be able to find a security expert who's been doing 10-plus years of security in AI/ML, right?

Daryan Dehghanpisheh 04:26

Yeah. I would take those odds in Vegas. I think you're right.

Ron F. Del Rosario 04:29

I would say, I think…I'm a big believer in the 10,000 hours of deliberate practice. You know, for you to be considered an expert in your field, you need to put in about 10,000 hours of deliberate practice. Now, put that into perspective: product security folks and our development teams are developing AI/ML use cases, and we need to review these AI/ML systems from a security perspective. And guess what? We don't have that data scientist and machine learning background, right? It's up to us to be creative in how we attack this problem. And every day there's a new use case for developers to start adopting AI/ML in the products that we offer, right? So this is my take: it's a perfect opportunity for security folks to be creative, leveraging our security mindset, while at the same time learning AI/ML fast and learning how to secure it at a foundational level.

Daryan Dehghanpisheh 05:36

So, on that, on that basis, right, that transition from just traditional software security, architecture, security practitioners, red team, blue team type of mentalities, you know, as they move into AI, what's the biggest thing that is the same? And then I want you to tell me what your single biggest view is on what's different.

Ron F. Del Rosario 06:02

Okay. So I think foundationally there's a lot of commonalities, because with traditional information security, let's say we're protecting data, we're protecting assets, right? Data as an example, right? Mm-Hmm <affirmative>. And with the advent of AI/ML development, again, we need to focus on data - the dataset, right? ML systems traditionally need tons of data as part of their training process, right?

So part of the learning journey for a security leader learning how to secure an AI/ML system is focusing on key aspects like datasets, right? Our developers will need access to massive datasets, internally or externally sourced, and we need to think about how we protect those datasets so that tampering can't influence how the machine learning model performs in production, right? So those are some of the things that I tend to focus on at a foundational level.
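
To make this concrete, here is a minimal Python sketch of the kind of foundational dataset control Ron describes: fingerprint the training data at ingestion, then verify it before each training run so silent tampering becomes detectable. The paths and function names are illustrative, not from SAP's tooling.

```python
import hashlib
import json
from pathlib import Path

def fingerprint_dataset(root: str) -> dict:
    """Record a SHA-256 digest and size for every file in a dataset directory."""
    manifest = {}
    for file in sorted(Path(root).rglob("*")):
        if file.is_file():
            digest = hashlib.sha256(file.read_bytes()).hexdigest()
            manifest[str(file)] = {"sha256": digest, "bytes": file.stat().st_size}
    return manifest

def verify_dataset(root: str, manifest: dict) -> list:
    """Return the files whose current digest no longer matches the manifest."""
    current = fingerprint_dataset(root)
    return [f for f, meta in manifest.items()
            if current.get(f, {}).get("sha256") != meta["sha256"]]

# Usage: fingerprint once at ingestion, verify before every training run.
# manifest = fingerprint_dataset("data/train")
# Path("train_manifest.json").write_text(json.dumps(manifest, indent=2))
# tampered = verify_dataset("data/train", manifest)  # [] means intact
```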

Daryan Dehghanpisheh 07:12

So when you talk about securing the dataset, there's a lot of ways to think about that, right? There's everything from maintaining the version histories to maintaining absolute lineage around that. There's making sure that the dataset itself has been scrubbed, and there's access controls. There's all these, like, traditional things.

I guess one of the more interesting things in AI is that AI models generally do memorize some of their training data, right? And if you're trying to secure the model, or keep that model from giving up a lot of that data, are you guys focusing on that as well in terms of how that is being <inaudible>

Ron F. Del Rosario 08:00

So, I think what's happening in the enterprise nowadays is when development teams have an existing use case for, let's say, integrating an ML model in an existing product, or leveraging large language models to be specific, you can usually classify it into two buckets.

One is an internally developed machine learning model, meaning your development team is responsible from zero to pushing it into production.

And the second, most common use case, which is prevalent right now, is: we're just going to consume a foundational model offered by a cloud service provider, right? So, like, ChatGPT.

Daryan Dehghanpisheh 08:48

Point to GPT or Azure GPT, or AWS Bedrock, or Gemini, whatever it is.

Ron F. Del Rosario 08:55

Exactly. So it's the same use case - leveraging an ML model in their product - but two different ways to consume machine learning models. And they have separate risk profiles, in my opinion, right?

In the first scenario, it will take more time from you as a product security lead, as an AI/ML security lead, to guide your development teams through the entire lifecycle of creating the machine learning model from scratch, right? Because you tend to focus on: okay, you need to understand the problem and how they plan to solve it with machine learning, and you try to understand how they gather their datasets for training, right? Who has access to the training dataset? Where is this training dataset coming from?

Daryan Dehghanpisheh 09:44

Open sourced <Laugh>? Did I just pull it down from the open source world? Did I just bring it in?

Ron F. Del Rosario 09:49

Exactly. Internal trusted sources - meaning from our systems, our infrastructure - and do we have the proper permissions to use this if it's coming from customer data?

Daryan Dehghanpisheh 10:04

Right. Controls. 

Ron F. Del Rosario 10:06

Versus training data that's being pulled externally - this is high risk. I've seen some scenarios in the past with other companies where developers started using publicly available information, either from GitHub or any well-known repo out there, and used that to train their ML models, right? With no vetting of any sort, right?

So, what happens is any user with malicious intent can modify that training data originating from a third-party source. It comes into your system, and your developers just rely on the raw output of that processing, right? Without vetting. So that's high risk.
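
One hedged sketch of a mitigation for exactly this scenario: pin a checksum for each externally sourced dataset at the time it is reviewed, so any later upstream modification fails loudly instead of silently poisoning training. The URL and digest below are placeholders, not real artifacts.

```python
import hashlib
import urllib.request

# Digests pinned when the source was security-reviewed; placeholders only.
APPROVED_SOURCES = {
    "https://example.com/public-corpus-v1.csv":
        "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def fetch_vetted(url: str, dest: str) -> str:
    """Download external training data only if it matches its pinned digest."""
    if url not in APPROVED_SOURCES:
        raise PermissionError(f"{url} has not been security-reviewed")
    data = urllib.request.urlopen(url).read()
    digest = hashlib.sha256(data).hexdigest()
    if digest != APPROVED_SOURCES[url]:
        raise ValueError(f"digest mismatch for {url}: upstream data changed")
    with open(dest, "wb") as f:
        f.write(data)
    return dest
```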

And the second scenario, where developers are simply using foundational models from cloud service providers, has a lower risk, in my opinion, from a security perspective, because we have the benefit of the cloud service provider providing the initial layers of security, right?

<Inaudible> to access that large language model, the training they had to put in to make sure security and privacy considerations are addressed, right? Think about Gemini, the new model from Google, right? So you, as a developer, have access to the built-in security characteristics of this LLM, right?

However - so what do I do, if you're the lead for AI/ML security? You still need to understand how developers are maintaining their access and protecting their models in use through the cloud service provider we talked about.

Daryan Dehghanpisheh 11:47

Yeah, hey, I really wanna dig in on this, because I think you're making a really excellent point. We keep talking about the model. And in our worldview at Protect AI, the sponsors of the MLSecOps Podcast, our view is pretty simplistic, which is that the only real difference between an AI application - or a class of software called AI - and anything else is that it relies on, uses, touches, or engages an ML model - a machine learning model - at some point. How do you guys think about the security of that asset at SAP ISBN - that asset being the machine learning model?

Ron F. Del Rosario 12:29

Absolutely. So for internally developed machine learning models - by the way, some good inputs about that - I also like your take that we need to take provenance into consideration, right? Like an audit trail, right? An end-to-end audit trail of how this ML model ended up in this state, including all the sources of datasets. But anyway, one of the approaches that we're looking into is having an internal system that acts as a broker before a development team can access external large language models. So we have an internal system that developers can make a request to, instead of making a direct request through the internet to Hugging Face or any other well-known model repository.

Daryan Dehghanpisheh 13:27

So basically you are proxying that model as a gateway?

Ron F. Del Rosario 13:32

Exactly. So we have this secure proxy approach to ensure that all machine learning models are vetted from a security and privacy perspective before developers can use them.
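
As an illustration of the broker pattern Ron outlines (a sketch under stated assumptions, not SAP's actual system), the gateway refuses to hand over a model unless a vetting record exists for that exact model and revision:

```python
from dataclasses import dataclass

@dataclass
class VettingRecord:
    model_id: str         # e.g. a Hugging Face repo id
    revision: str         # pinned commit, so later upstream changes aren't trusted
    scan_passed: bool     # result of artifact/serialization scanning
    privacy_reviewed: bool

class ModelBroker:
    """Internal gateway: developers request models here, never from the internet."""

    def __init__(self):
        self._approved = {}  # (model_id, revision) -> VettingRecord

    def approve(self, record: VettingRecord) -> None:
        if record.scan_passed and record.privacy_reviewed:
            self._approved[(record.model_id, record.revision)] = record

    def fetch(self, model_id: str, revision: str) -> str:
        if (model_id, revision) not in self._approved:
            raise PermissionError(
                f"{model_id}@{revision} is not vetted; request a security review")
        # A real broker would serve the artifact from an internal mirror.
        return f"internal-mirror://models/{model_id}/{revision}"
```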

Daryan Dehghanpisheh 13:49

Yeah. And we have a product at Protect AI which is exactly for that - Guardian, the model proxy and model scanning tool, which can be used for internal repository management as well as external proxying. So it's cool to see you guys implementing that.

So, that gives you kind of the provenance and the lineage of that, which makes me wonder how you think about, you know, any other asset in your ecosystem. Ron, or Ron's team, or anybody with the proper, you know, authorities, rights, and permissions can scan an environment and say: oh, in SAP or SAP ISBN's ecosystem, here is Ron's PC. It contains this operating system. It contains these files, these applications, and has these permissions. And if there's ever a mismatch between your stated policies or your stated postures, right, and what's happening on that device - in this case, an endpoint - you have the ability to proactively address it.

You can take it off the network, you can push an update, you can force an update, you can shut it down. You can even send a note to Ron's boss saying he didn't upgrade, he didn't do this, right? Like all the digital nanny tools that we have in the enterprise.

A lot of that comes from the lineage and the provenance of a device or an endpoint or a piece of code - any other asset. But that doesn't exist for ML models. I mean, our company provides that, but do you guys have tools that are creating those types of things - that allow you to create security postures on top of those assets, the models themselves or the datasets? How do you guys go about that?

Ron F. Del Rosario 15:31

An automated and centrally managed solution - we don't have that capability yet, but that's something we're actively addressing, and at the same time it's the reason why last year I started an internal initiative called the "Secure AI/ML Development Framework" within ISBN.

Daryan Dehghanpisheh 15:53

Oh, that's awesome. 

Ron F. Del Rosario 15:55

It's an initiative that I'm leading inside SAP ISBN, and we're focusing on these aspects, although we're trying to attack it manually for now. And this is where my lean security checklist came into play.

Daryan Dehghanpisheh 16:08

Yeah, we're gonna get there. We're gonna get there because I think that's fascinating for, but I actually want, before we get there, I'd love to hear about the framework and educate the listeners and the readers and the viewers on what your framework consists of and how you think of that.

Ron F. Del Rosario 16:22

Right. So the framework is basically providing education and awareness across all product security teams within ISBN. What we're trying to do is surface the known risks in AI/ML development, right, from a security and privacy perspective.

I think you and I know this, D: the biggest problem right now is we have a lot of experienced cybersecurity professionals, but we don't have a lot of experienced cybersecurity professionals with foundational knowledge in data science and AI/ML techniques, right? So that's the number one problem. So the initiative -

Daryan Dehghanpisheh 17:14

So it's to basically educate them on, say, how the software development lifecycle of machine learning or AI application building is different from traditional web API development. Okay.

Ron F. Del Rosario 17:24

Absolutely. So part of the initiative was for me to conduct an organization-wide virtual training across all AppSec teams within ISBN. I gave them an overview of the AI/ML development lifecycle and how ML is different from traditional software engineering. And then I gave them an overview of current industry security best practices, which include the OWASP [ML Security] Top 10, the MITRE ATLAS attack framework, and even new documents from NIST about AI/ML taxonomy and known risks, right? So it's education and awareness - that's number one.

Number two is the creation of a checklist, right? To guide product security teams in how to conduct efficient and effective security reviews of upcoming AI/ML features or products within ISBN.

And the third one is a little bit of metrics, right? Benchmarking <inaudible>, right? So those are basically the three pillars of the secure AI/ML initiative within SAP ISBN.

Daryan Dehghanpisheh 18:40

Awesome. And so, against that backdrop, now that our listeners, readers, and viewers have that as context, let's talk about that lean security checklist approach against that framework. You know, walk us through that checklist from your perspective, because it seems to me that once you recognize there are way more commonalities in the development pipeline than there are differences, what you need to do is take the same things you're doing for mitigating authentication bypasses or, you know, mismatches on access controls, and do them in the AI/ML space - but here's how it's different.

Ron F. Del Rosario 19:23

Absolutely. Absolutely.

Daryan Dehghanpisheh 19:24

So walk the audience through your lean security checklist and help us understand how it connects to those pillars of the secure AI framework.

Ron F. Del Rosario 19:32

So, absolutely. The inspiration came from high-assurance and mission-critical tasks and operations, right? Using checklists as an example: before you can launch a rocket or a space shuttle into space, right, the aviation industry, and even some doctors before they operate on a high-profile patient - they use a checklist of some sort, like operational readiness, right?

So a checklist is a simple tool that can help us determine operational readiness. And in the context of AI/ML development, it simplifies the complexity behind it, so we can guide application security engineers or product security engineers to ask the right questions during a security review of an upcoming product or feature within their organization, right? Because the cost to acquire talent with both a cybersecurity and an AI/ML background, or vice versa, and the amount of time [...] to train an existing member of the team in how to secure AI/ML software during design and development - it's almost impossible, right?

Daryan Dehghanpisheh 20:51

Because you're trying to force them to take that leap to say, hey, get familiar with what we mean by feature engineering. And they're like, what?

Ron F. Del Rosario 20:59

Absolutely. So AppSec teams cannot be experts in AI/ML in a matter of months, right? Same with AI/ML developers and engineers - they cannot be experts in cybersecurity in a matter of months, right?

Daryan Dehghanpisheh 21:13

Well, look at the industry, just look at the industry basics, right? How long have we had SCA tools and scanners and things that say, hey, you're leaking an API key in this JSON [JavaScript Object Notation] file? But most data scientists would be like, what's a JSON file? Like, tell me where it is in my Jupyter notebook, right? So context goes both ways, I would imagine.

Ron F. Del Rosario 21:34

Absolutely. So this is where a lean security checklist helps, right? A lean security checklist focuses on the foundational aspects of secure AI/ML development, right? Like what you mentioned: access control. Access control is super applicable to a lean security checklist, because you, as an AppSec engineer interviewing the developer, ask: who has access to your datasets, where are they stored, and how are they protected? And this can also include artifacts used to develop the machine learning model - when I say artifacts, this can be any file that has to do with your -

Daryan Dehghanpisheh 22:14

It goes back to your whole concept of full provenance of an AI system. Like the artifacts are different in that environment.

Ron F. Del Rosario 22:17

Exactly. It can be a text file, a code snippet, a configuration file. When a threat actor gains unauthorized access to these artifacts…and threat actors are smart, right? They can easily say: oh, this is a configuration for PyTorch or for TensorFlow. Oh, you guys are deploying on AWS SageMaker, as an example, right? So those are the things that we need to protect.

So, ideally, the security leader or the AI/ML security lead drafts this checklist by collaborating with data scientists and AI/ML developers, either inside or outside of the organization, right? So: hey, I have this checklist from a security perspective, and I want your take on it as a data scientist or ML developer. Do my questions make sense to you? It has to apply to what you do on a daily basis.

So, some guidance on how you create the security checklist: you need to draft it so the wording is expressed the way the IETF [Internet Engineering Task Force] drafts Request for Comments (RFC) documents, right? You need to use the words "shall" and "must," and then you support them with guiding questions for product security teams, so they can solicit the correct information from the development teams and understand the security posture of the AI/ML project, right.
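
As a hypothetical illustration of that guidance, here is how RFC-style "must"/"shall" requirements paired with guiding questions might look when expressed as data. The control IDs and wording are invented for this example, not taken from the ISBN checklist.

```python
# Each control uses RFC-style "MUST"/"SHALL" wording plus the guiding
# questions the reviewer asks the development team.
LEAN_AIML_CHECKLIST = [
    {
        "id": "DATA-01",
        "requirement": "Training datasets MUST be stored with access control "
                       "and an owner of record.",
        "guiding_questions": [
            "Who has access to your datasets?",
            "Where are they stored, and how are they protected?",
        ],
    },
    {
        "id": "ART-01",
        "requirement": "ML artifacts (configs, notebooks, code snippets) SHALL "
                       "NOT be stored unencrypted outside managed systems.",
        "guiding_questions": [
            "Where do model artifacts live during development?",
            "Are any artifacts kept as plain text on developer laptops?",
        ],
    },
    {
        "id": "MDL-01",
        "requirement": "Third-party models MUST be obtained through the "
                       "internal broker, never directly from public repositories.",
        "guiding_questions": [
            "Which external models does this feature consume, and via what path?",
        ],
    },
]

def unanswered(answers: dict) -> list:
    """Return the checklist items the development team has not yet addressed."""
    return [c["id"] for c in LEAN_AIML_CHECKLIST if c["id"] not in answers]
```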

Daryan Dehghanpisheh 23:43

Yeah. And I imagine this goes back to the top, right? Which was: hey, if you have full provenance, if you have these checklists, if you have these Bills of Materials, you can meet that "must" and "shall" type of language or framing, because at least you're starting with a baseline understanding that allows the two different practitioners, from two different perspectives, to meet in the middle and standardize against the security postures that you're looking to implement, right?

Ron F. Del Rosario 24:19

Absolutely. And then, so, go ahead.

Daryan Dehghanpisheh 24:20

I was just gonna say, along those lines though, it's really fascinating, because what you're talking about is really that ML system - the model system and the pipeline and all the components around it - which then feed into the entity of the AI application. And OWASP has separated their Top 10 security components between AI applications, which is heavily generative AI focused - which I think is a mistake…my worldview, OWASP - and ML systems, which I actually think is very intuitive. It's very smart to say: hey, the machine learning model, no matter how it's given to you or how it's constructed, this is the security realm around it. And they delineate these two things.

OWASP, their Top 10 - and you're a contributor to that - you know, that must have exposed you to some real eye-opening insights into these AI/ML security risks. What were some of the threats or vulnerabilities that really kept you up as you were thinking through this on this Top 10, like, what was the thing that made you go, whoa, I need to, I need to be worried about that?

Ron F. Del Rosario 25:29

So I think I kind of alluded to this earlier already. One of the common mistakes or problems that we run into, since we're in the early stages of machine learning security - AI/ML development security - is this. Let's say I'm a developer. I'm working from home. I have my laptop with me, and I use it to develop and write pieces of code to support machine learning model development for my company. And developers are not really super familiar with the fact that if you're saving your files - any kind of file, or what we call artifacts for ML model development - with no access control, no protection, no encryption mechanism of any kind, and you're storing them as plain text on your developer laptop, it's -

Daryan Dehghanpisheh 26:34

It's, which happens all the time, <laugh>

Ron F. Del Rosario 26:36

It's easy. It's easy to target. Let's say I'm a beginner developer and I haven't been properly introduced to some of the known risks in AI/ML development security. Of course I'm just going to save it in an unencrypted folder on my laptop and not do anything to protect it, right? Because I lack the foundational education that I should be protecting these artifacts, right?

So, what's going to happen? A developer falls prey to a phishing attack, the developer laptop gets compromised by a threat actor, and guess what the threat actor stumbles into? Look, it's a folder of unencrypted files, and it has all the artifacts specific to an ML model's development.

So what I'm trying to say is: as a security leader for your organization, don't get overwhelmed by the new data science and AI/ML jargon, you know? Focus on the fundamentals. As a matter of fact, in my contributions to the OWASP Top 10 for LLMs, my focus was on model registries, right?

Daryan Dehghanpisheh 27:44

Oh, hey man, do you know how many companies don't even have model registries? They just have models stored in S3 buckets or Azure blob storage?

Ron F. Del Rosario 27:51

Exactly. What are you doing? So don't get overwhelmed. Don't say: oh, I don't have any ML or data science background on my team. Well, you don't have to, right? Focus on the fundamentals. It's very important, right?

So, model registries. My recommendation for the OWASP Top 10 for LLMs is that development teams should use access-controlled model registries to keep track of their ML models in production, right?
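
To illustrate that recommendation, a toy access-controlled registry sketch: only authorized identities can register a model, and every entry carries the audit fields (who, when, which artifact, which dataset) that give you the provenance trail discussed earlier. The field names are assumptions for this example.

```python
from datetime import datetime, timezone

class ModelRegistry:
    """Minimal access-controlled registry: every production model is tracked."""

    def __init__(self, writers: set):
        self._writers = writers        # identities allowed to register models
        self._entries = []

    def register(self, user: str, name: str, version: str,
                 artifact_uri: str, dataset_manifest: str) -> None:
        if user not in self._writers:
            raise PermissionError(f"{user} may not register models")
        self._entries.append({
            "name": name,
            "version": version,
            "artifact_uri": artifact_uri,          # e.g. an S3/blob location
            "dataset_manifest": dataset_manifest,  # provenance link to the data
            "registered_by": user,                 # audit trail
            "registered_at": datetime.now(timezone.utc).isoformat(),
        })

    def history(self, name: str) -> list:
        """Full audit history for one model name."""
        return [e for e in self._entries if e["name"] == name]
```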

Daryan Dehghanpisheh 28:18

That seems like a typical stumbling block in a lot of cases. Fair? Unfair? What’s your take?

Ron F. Del Rosario 28:25

It's a fair assessment, in my opinion. So one of our responsibilities as security leaders, like I alluded to earlier, is that we're here to surface the risk, right? We're here to educate the teams that there are known risks in the AI/ML development lifecycle, and we don't need to purchase a commercial tool at the early stages. We need to focus on education and awareness, and then make sure we operationalize all the stuff that we learn, right?

Daryan Dehghanpisheh 29:00

And I think that's what's so cool about how you guys are taking the lean security checklist approach for AI. It very much complements what we believe in from an MLSecOps perspective. MLSecOps is a set of philosophies and principles, and you kind of just go through that pre-flight checklist. It's so refreshing to see a company like SAP say: we've got to go do that.

Ron F. Del Rosario 29:26

Absolutely, yep. And you know this as well, D: we start manual. We write things manually, we write reports manually, but we can easily semi-automate and automate this in the future once we start expressing our artifacts as JSON or metadata. Now we're -

Daryan Dehghanpisheh 29:45

Love it. Music to our ears. And for those who don't know, that's the Protect AI platform, so we agree.

Ron F. Del Rosario 29:54

Yeah, absolutely. And any type of system can now easily parse and access your security checklist and questionnaire data, even your ML model card data. If you express them as JSON, they can be easily parsed, and you can easily integrate them into any of your IT security or application security risk management solutions, right? That's the vision behind all this. But of course, before we can run, we need to crawl and walk first, right? Manual.
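
A small sketch of what this looks like in practice: once a model card (or a checklist response) is plain JSON, any downstream risk-management tool can parse it without custom glue. The field names below are invented for illustration.

```python
import json

model_card = {
    "model_name": "expense-categorizer",  # illustrative, not a real SAP model
    "version": "1.4.0",
    "owner_team": "spend-ml",
    "training_data_sources": ["internal:expense-reports-2023"],
    "security_review": {"checklist_version": "lean-v1", "status": "passed"},
    "known_limitations": ["English-language receipts only"],
}

# Serialize once; every IT-security or AppSec tool downstream can ingest it.
with open("model_card.json", "w") as f:
    json.dump(model_card, f, indent=2)

with open("model_card.json") as f:
    loaded = json.load(f)
assert loaded["security_review"]["status"] == "passed"
```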

Daryan Dehghanpisheh 31:55

Man, there's a lot of people that I just wish… if we're in our infancy of AI/ML development, like, we're doing tummy time. Yes. We're trying to get over and strengthen our head, do some tummy time, before we even think about crawling. That's great.

Ron F. Del Rosario 30:36

Absolutely. So that was the vision behind it. And we're in the execution stage of how we plan to semi-automate our entire secure AI/ML development initiative.

Daryan Dehghanpisheh 30:48

Well, that's really cool, because I think the audience can take away, from a practicality perspective: start with the basics. Get an MLSecOps framework or your own lean security checklist approach - and, if possible, we'll link in the show notes to where this lives. Stay on top of it; don't fall prey to the charlatanism of the industry. Focus on what matters most, which is what you've been doing all along; it's just applied to a new domain.

Ron F. Del Rosario 31:20

Absolutely. 

Daryan Dehghanpisheh 31:22

So along those lines, as we begin to kind of exit the show here, you know, you and I were talking prior to recording about Maslow's corporate hierarchy of needs: fear, greed and regulation. And I'm curious, you know, let's start with the fear one, as you think about security of AI, what is one of maybe the biggest fears you have that could be near term?

Ron F. Del Rosario 31:45

Like AI as a system holistically?

Daryan Dehghanpisheh 31:48

AI as a system that you need to guard against, right? So the biggest fear of an AI breach - for you at SAP, or at SAP ISBN in particular. What would be different about an AI security breach versus a typical incident?

Ron F. Del Rosario 32:06

Can you give me an example?

Daryan Dehghanpisheh 32:08

That's what I'm trying to ask you, because we have lots, right? When you deploy AI in a system, whether it's generative or not, we have a lot of thoughts around what we think those differences are between an AI-oriented breach and traditional software. And I'll give you a simple example.

Today, if there's data exfiltration on a non-AI system, we have a lot of good forensic tools to trace that back and figure out where, or how, that happened, right? But in an AI development environment, in an AI system, the fragmentation of those data models and data assets is dispersed all over the place. You don't know who touched it, you don't know what happened, you don't know what was brought in, you don't know what data was guaranteed to be used in the train versus tune splits, if you will. There isn't all that automation and control. So if there's data leakage due to the AI environment, your incident response on that is grossly impacted. It's a much bigger blast radius, in our worldview. Does that make sense?

Ron F. Del Rosario 33:20

I see. Yeah. So I think with your example, my biggest threat is trying to map out the attack surface. That's the biggest threat for me, because I need to have visibility into the entire lifecycle of how this machine learning (AI) system was either developed or consumed from a third party, and how it was integrated into our existing systems.

Going back to your example: I don't know where to start. The attack surface is too big - especially if it's originating from a third-party source, where I need to interrogate the provider of the AI system if it's offered as software as a service (SaaS), right?

Daryan Dehghanpisheh 34:02

You know, we wrote an article about this - we'll put it in the show notes - about this notion of a traditional, you know, Lockheed Martin-esque cyber kill chain, and how an AI/ML Bill of Materials, and the automation and control of that, really bridges the gap, if you will: being able to see, understand, and map the attack surface of an AI ecosystem, of an AI environment, and have full provenance of that model. How an AI kill chain is very different from a traditional kill chain.

So that actually seems like both the biggest concern - meaning, I don't have visibility into that - and the biggest opportunity: when I'm given that, I can then bring in my lean security checklist, apply it against that, and make sure at a minimum that I have the postures matched, I would assume, right?

Ron F. Del Rosario 34:57

Yep. A hundred percent. So that's something that we can easily consider and incorporate into our security checklist. When we're deploying third-party AI systems, we need to understand the following, right? How it integrates with the network, how it communicates inside and outside of the network, and all that stuff, so we can identify potential attack surfaces.

And I'm a big proponent of threat modeling, of course. For any integration with a third-party solution, regardless of whether it's an AI system or not, we do a full threat modeling approach, right? So we can identify potential threats and attack vectors on the system.

Daryan Dehghanpisheh 35:33

That's awesome. So, you know, let's leave our audience with one future thing. What's a future prediction for you? I always ask people. Every time people talk about LLMs, I'm like, that's so 2023. I think 2024 and beyond is multimodal models - large multimodal models - as an example.

What's the thing that you're focusing on most in the realm of AI and how it impacts security, particularly as you look out over your responsibilities - massive responsibilities - and the next 12 to 18 months? How do you think about that, Ron?

Ron F. Del Rosario 36:10

That's an excellent question. What I'm looking into - I'm really super interested in how generative adversarial networks (GANs) will evolve and play out, right? I think in maybe five to 10 years, it's going to be an AI system protecting an infrastructure against another AI system, right? That's how a generative adversarial network works.

To simplify the concept behind GANs: it revolves around a generator and a discriminator, right? The generator can be offensive - it generates potential attacks against another system - and the discriminator tries to figure out: is this an attack, or is it just common traffic of some sort, right?

I'm super curious, and there's tons of research already conducted on the potential of using generative adversarial networks for cybersecurity. I think it's a worthy investment of time for us cybersecurity professionals to monitor the progress and advancements in GANs.
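
For readers who want the mechanics, here is a toy PyTorch sketch of the generator/discriminator dynamic Ron describes, with random vectors standing in for traffic features. It illustrates the adversarial training loop only; it is not a working intrusion detector.

```python
import torch
import torch.nn as nn

FEATURES, NOISE = 16, 8  # sizes chosen arbitrarily for the toy example

# Generator produces synthetic "attack" feature vectors from noise;
# the discriminator learns to separate them from benign traffic features.
generator = nn.Sequential(nn.Linear(NOISE, 32), nn.ReLU(), nn.Linear(32, FEATURES))
discriminator = nn.Sequential(nn.Linear(FEATURES, 32), nn.ReLU(),
                              nn.Linear(32, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

benign = torch.randn(64, FEATURES)  # stand-in for real benign traffic features

for step in range(200):
    # Train the discriminator: benign -> 1, generated "attacks" -> 0.
    fake = generator(torch.randn(64, NOISE)).detach()
    d_loss = (loss_fn(discriminator(benign), torch.ones(64, 1)) +
              loss_fn(discriminator(fake), torch.zeros(64, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Train the generator: make the discriminator label its output benign.
    fake = generator(torch.randn(64, NOISE))
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```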

Daryan Dehghanpisheh 37:17

So speaking of worthy investments of time, this was a wonderfully worthy investment of time. Thank you so much, Ron - I appreciate you coming on the podcast and educating our market and team. And for anybody who wants to see some really cool Da Vinci material, go check out Ron's LinkedIn profile. And thanks for tuning in. Ron, thanks for coming on. We really appreciated having you on.

Ron F. Del Rosario 37:40

Thank you. This was awesome, as expected. I love talking about what can happen in the future based on existing trends. AI/ML is here to stay, and we need to keep up to speed as cybersecurity professionals.

Daryan Dehghanpisheh 40:00

Thanks so much, Ron. And thank you to the audience as well. We'll put all the links in the show notes. See you next time.

[Closing] 


Additional tools and resources to check out:

Protect AI Radar: End-to-End AI Risk Management

Protect AI’s ML Security-Focused Open Source Tools

LLM Guard - The Security Toolkit for LLM Interactions

Huntr - The World's First AI/Machine Learning Bug Bounty Platform

Thanks for listening! Find more episodes and transcripts at https://mlsecops.com/podcast.
