
MLSecOps Culture: Considerations for AI Development and Security Teams



Audio-only version also available on Apple Podcasts, Google Podcasts, Spotify, iHeart Podcasts, and many more.

Episode Summary:

In this episode, we had the pleasure of welcoming Co-Founder and CISO of Weights & Biases, Chris Van Pelt, to the MLSecOps Podcast. Chris discusses a range of topics with hosts Badar Ahmed and Diana Kelley, including the history of how W&B was formed, building a culture of security & knowledge sharing across teams in an organization, real-world ML and GenAI security concerns, data lineage and tracking, and upcoming features in the Weights & Biases platform for enhancing security.

More about our guest speaker: 

Chris Van Pelt is a co-founder of Weights & Biases, a developer MLOps platform. In 2009, Chris founded Figure Eight/CrowdFlower. Over the past 12 years, Chris has dedicated his career to optimizing ML workflows and teaching ML practitioners, making machine learning more accessible to all. Chris has worked as a studio artist, computer scientist, and web engineer. He studied both art and computer science at Hope College.


[Intro] 00:00

Badar Ahmed 00:08

All right. I am Badar Ahmed, CTO and Co-Founder of Protect AI. And with me today, I have Diana Kelley, CISO of Protect AI. Chris, thanks so much for joining us on the show. We're delighted to have you. So I'm gonna kick off with basically some storytelling here. I gotta say, love the Weights & Biases product, um, so congratulations on building an awesome product and a company. I know it's a product loved by thousands of data scientists and ML engineers. 

As a founder myself, I love to hear stories of how companies’ adventures get started. So I'd love to kind of kick that off and hear from you, how did, uh, you get started in tech? Maybe we can, you know, bring it back a little bit and then, for Weights & Biases, we'd love to hear the, the story.

Chris Van Pelt 01:10

Sure. Uh, so in college I studied art and computer science, which is kind of an interesting combination, but I was enjoying more and more actually spending time with a computer, you know, built my own Linux box in college. And I remember I set up, like, Gentoo, which was a real pain because you had to compile everything and things would, like, error out, you know, 24 hours into the build process and you'd be scratching your head.

Badar Ahmed 01:45

I remember that.

Chris Van Pelt 01:46

Yeah, good times, <laugh>. And, after, after college, I thought, well you know, I can get a job in the computer industry somewhere, but was having a difficult time actually finding work, so I would do like contracting work. I moved to Southern California after college and got a few odd jobs, some that, you know, ended up not working out and was almost gonna move back to the Midwest where I, where I grew up. And then got a job at a small company in Carlsbad, California, and this was right around the time that Ruby on Rails was becoming a thing. And I was a big fan, and I built some side projects with it and somehow persuaded my employer to let me rewrite their product in Ruby on Rails. And that's really what got me into, um, ultimately the Bay Area and the, and the startup scene. 

So fast forward a bit, you know, a few years after I graduated, say like 2007, moved up to San Francisco to work at a startup called Powerset that was doing natural language processing (NLP) to make a better search engine. And it's, it's fascinating now, like seeing what NLP is today. And the approaches used to have language understanding versus the approaches we were using back in, in 2007. So we had licensed Xerox PARC technology to do our natural language processing, and it was all a rules-based engine with, you know, a lot of like human, um, tuned features, and very different than the modeling we do today to create these systems that have language understanding. 

But, uh, got pulled into that company because of the Ruby on Rails connection, and it was there I met Lukas Biewald, who is my co-founder, our CEO at Weights & Biases. And at Powerset we were like 26 years old and thought, you know, we could start our own thing. "We know how to do this better than however this is being run" was a bit of the attitude and, you know, we learned quickly that it's actually, like, pretty difficult. But we left Powerset to start our first company, CrowdFlower, which rebranded to Figure Eight. We were doing data labeling for machine learning teams.

Badar Ahmed 04:18

Mm-Hmm. <affirmative>

Chris Van Pelt 04:19

So, uh, had been really selling to the same customer now for a little over 15 years. And we were definitely, like, early with the first company, and with Weights & Biases the timing of the market, I think, was a lot better. So Weights & Biases started six years ago. We were winding down our day-to-day responsibilities at CrowdFlower, which was ultimately acquired by this company, Appen, based in Australia. And we had seen deep learning really taking off and saw a big gap when it came to good tools and having visibility and lineage into the actual modeling that was happening. And Lukas got an internship at OpenAI back in, uh, 2017,

Badar Ahmed 05:09

Oh, wow.

Chris Van Pelt 05:09

And was, and was working on a problem there.

Badar Ahmed 05:11

That was like a totally different company back then.

Diana Kelley 05:12


Chris Van Pelt 05:13

Yeah, totally different. They were all, there was a bunch of different teams working on different problems that they had not really centered around the language modeling and GenAI (generative AI). Uh, this was pre-GenAI.

Badar Ahmed 05:25


Chris Van Pelt 05:27

So yeah, he was doing some work there and honestly was having difficulty keeping track of and monitoring, like, his work. So it was really born out of a need to scratch our own itch and, yeah, here we are today. The original vision for the product has certainly, like, expanded in scope since the beginning, but it's still really that core need of having better tools for the ML engineer, or now really the AI developer, as we're calling folks that are building on top of these systems. So, yeah.

Badar Ahmed 05:59

Yeah. Awesome. That's super interesting. You know, I have like personal background myself, having done a, multiple products in the machine learning space, building machine learning platform. So yeah, it's always fascinating to see kind of like this whole ecosystem grow really significantly. I started a decade ago and since then, I mean, you know, we're, there's just so many tools and so much improvement since kind of like those days. It's pretty awesome to see how far we've come. 

And you mentioned like, you know, kind of like the pain points for Weights & Biases being born out of the difficulties with, you know, not really having great tools for monitoring, machine learning training in particular. There wasn't, there hasn't been historically much of open source, um, in the past years in this area. I can think of TensorBoard as being kind of like one popular project. I wonder, like, how, how did that play into kind of like some of the work that you did at Weights & Biases and the product development, as being kind of like interesting open source software, obviously very coupled to TensorFlow - but curious if that had something, you know, an importance in the product development of Weights & Biases?

Chris Van Pelt 07:24

Yeah, I mean, in the early days, um, we did not set out to make the product really visual and overlap with the TensorBoard functionality. We really wanted it to just be the central system of record that would allow you to collaborate easily with your teammates and have your model weights stored there, your datasets, the hyperparameters. 

We saw pretty early on that users really liked the visual kind of analytical aspects of looking at their training runs or their evaluation jobs. So we started building more and more visual functionality into the product, and it started to overlap more with TensorBoard. I remember early on we were like, okay, yeah, how do we relate to TensorBoard? We were kind of discussing this as a product and engineering team, and we just said, well, how about we just play nicely with it? So even today there's a feature where if you're using TensorBoard and you like TensorBoard, we just make it really easy for you to continue doing that. So you can basically sync your TensorBoard event files to Weights & Biases, and then we'll spin up an instance of TensorBoard and let you dive in there. 

Um, so yeah, certainly, certainly took some inspiration from that interface, but it wasn't something we set out to say like, hey, we need to build a better TensorBoard, or, we didn't spend a lot of time kind of thinking about our product through the lens of TensorBoard. It, it just kind of naturally became something that overlapped a fair amount with it.

Badar Ahmed 09:01

And I'm curious, has user interface, user experience always been kind of like a core tenet at Weights & Biases? You've done a great job of, like, kind of winning a lot of the mindshare with ML folks who, you know, a lot of them use Weights & Biases; even for, like, open source and public models, people like to use it.

Chris Van Pelt 09:25

Yeah, I mean, as a, like a full stack web developer, I'm always thinking about like what the interaction is and what the end user experience is. So, I think we tended to hire engineers that think with that mindset. We certainly have not always gotten it right. There have definitely been times when we've, you know, just been adding features and not really thinking about the whole experience, and then we'd get very blunt and direct feedback about confusion or just a negative experience. 

And I think a big part of our culture has been to, like, have empathy for the users and then have that drive urgency around actually, you know, resolving and taking something that was, you know, causing pain and turning it into something that's delightful. You know, like for me personally, if I go to a website that is just, like, poorly designed or I'm in some, like, web form, usually for a bank, that just doesn't work, I get infuriated. I can really, like, feel the frustration there. And I do not want any of, uh, any user of my product to feel that. So, it's certainly a part of the culture and, as we matured, we've hired, like, really good designers and have been able to focus more on it.

Diana Kelley 10:51

Awesome. We feel that. Um, so you know, Chris, as the founder, you're at a really unique intersection point that I'm fascinated with, which is, as a CISO you obviously think about security, building security and the security program, but you're also the co-founder of an MLOps platform for AI and ML developers. So what are your points, why do you feel it's really critical that we create MLSecOps, that we build security into the ML lifecycle?

Chris Van Pelt 11:20

Well, it's really aligned with our original vision for the product, right? Like, you can't have security if you don't know what the final asset is. Like, in this new modeling world, ultimately you've got a file full of weights and biases that you're gonna deploy somewhere. And the default mode is, like, some engineers go, they do a bunch of things, and then they produce this big file, this model essentially, of whatever problem you're working on, and then you run it. But the only way to have any, you know, form of security is to understand how that asset was built. So to really, you know, have in essence, like, a manifest of, okay, this is the data it was trained on, this is the code that was used to train it, these are the hyperparameters that were used for this specific model. So it was natural for us to, you know, think about security or put that as a core value prop of the product. 

I think, you know, I'm sure we'll get into this, but in this new world of GenAI, there's all kinds of new threats and vectors that - because this space is moving so quickly - you know, many of them are not, you know, truly addressed. So it's a big opportunity to make better tools, but it's, yeah, a little scary time as well because things are just moving so fast. It's hard to really be able to label anything as truly secure in that sense.
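The "manifest" Chris describes above (the data, the code, and the hyperparameters behind a specific model file) can be pictured with a minimal sketch. The function names and manifest fields below are hypothetical illustrations, not the Weights & Biases format; the point is only that a model file becomes auditable once its inputs are recorded by digest.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(weights: bytes, dataset: bytes, code: bytes, hyperparameters: dict) -> dict:
    """Record digests of everything that went into producing a model file.

    The field names here are assumptions made for illustration.
    """
    return {
        "model_sha256": sha256_hex(weights),
        "dataset_sha256": sha256_hex(dataset),
        "training_code_sha256": sha256_hex(code),
        "hyperparameters": hyperparameters,
    }


def verify_model(weights: bytes, manifest: dict) -> bool:
    """Check that a model file is the one the manifest describes."""
    return sha256_hex(weights) == manifest["model_sha256"]


# Stand-in byte strings; in practice these would be real files on disk.
manifest = build_manifest(
    weights=b"fake-model-weights",
    dataset=b"image-001,cat\nimage-002,dog\n",
    code=b"def train(): ...",
    hyperparameters={"learning_rate": 0.001, "epochs": 10},
)
print(json.dumps(manifest, indent=2))
```

Before deploying, a team could call `verify_model` on the file they are about to ship and refuse to proceed if the digest does not match the recorded one.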

Diana Kelley 13:02

Yeah, and in security sometimes we say, you know, we're always playing catch up with the business, with the technology, and I feel like at this point it's a bullet train. We're trying to catch up with a bullet train. Along those lines, is there anything that you've seen in an ML or an AI deployment, like a breach or vulnerability, something real world, and then how was it discovered and how did you respond to it?

Chris Van Pelt 13:27

Yeah, I mean, I think one that comes to mind is it's fairly common practice amongst data scientists - and to some extent machine learning engineers - to just pickle their models. So, you know, pickle is the Python serialization/deserialization library, and folks in the security world kind of hear pickle and they cringe a little because pickling can actually result in arbitrary code execution. So, we ourselves have a feature in our SDK (software development kit) where you can kinda save a model to [Amazon] S3 essentially and then bring that back. And we support, like, different variants. So in the Hugging Face ecosystem, they switched to using this library called safetensors, which avoids the pickling. Even there, you still have, you know, certain models that need custom code to be evaluated, so -

Badar Ahmed 14:28


Chris Van Pelt 14:28

Um, they have an argument that's like, trust code equals true.

Badar Ahmed 14:33

Yeah, I think it's like remote code equals true [trust_remote_code=True], something like that.

Chris Van Pelt 14:35

Or remote. Yeah. Uh, so yeah, this is, you know, we took the same approach here. We, like, before we just had this and a researcher found it and said, whoa, whoa, whoa, if someone pulls this in from someone else, you're potentially opening yourself up to an RCE (remote code execution). So we've added the kind of flag. In most cases, you know, if it's an asset that you yourself made, the risk is low. You're not gonna RCE yourself, but if you're starting to do this collaborative sharing, then it's a big vector where someone can get you to load their, you know, bad model. I think that's the biggest example that comes to mind as fairly recent.
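The pickle risk discussed above can be shown with a small, harmless sketch using only the Python standard library. The `Payload` class and `safe_loads` helper are hypothetical names for this illustration; the payload only evaluates `6 * 7`, where an attacker would call something destructive. The restricted unpickler follows the "restricting globals" pattern from the pickle documentation and is one possible mitigation, not the flag Weights & Biases added.

```python
import io
import pickle


class Payload:
    """An object whose pickle instructs the loader to call a function."""

    def __reduce__(self):
        # On load, pickle will call eval("6 * 7").
        # A real attacker would put a shell command here instead.
        return (eval, ("6 * 7",))


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import any class or callable."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Load plain containers and primitives only; reject anything else."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


malicious = pickle.dumps(Payload())

# The embedded call runs as a side effect of loading the bytes.
result = pickle.loads(malicious)
print(result)  # 42

# Plain data still loads through the restricted loader...
print(safe_loads(pickle.dumps({"weights": [0.1, 0.2]})))

# ...but the malicious bytes are rejected before anything is called.
try:
    safe_loads(malicious)
except pickle.UnpicklingError as err:
    print("rejected:", err)
```

Formats such as safetensors avoid the problem by storing only tensor data, so there is no callable for the loader to invoke in the first place.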

Badar Ahmed 15:21

On that note, kind of like diving deeper here, I'd like to get your take on, um, the lineage and provenance of data as machine learning models are being trained. I mean, this is a challenging area because, you know, every model has its own pipeline. Some models have more complex pipelines, some have maybe relatively simpler pipelines, and there's, you know, no one kind of like data framework or tool that gets used. 

I see a lot of teams struggling with managing, you know, how to kind of like wrangle and make sense of their data pipelines. I mean, you can obviously build something and just, you know, execute it in a notebook, but to make it reproducible, to make it auditable, you know, something that has good monitoring or auditability built into it - how do you think about it?

Chris Van Pelt 16:22

Yeah. You know, we started with kind of core experiment tracking, which you can kind of think of as TensorBoard on steroids. And then one of the big features that we set out to create shortly after kind of feeling like we had good product market fit with that first offering was a product we call Artifacts. So Artifacts is our data lineage and tracking product. And we did a couple things there that are I think a bit novel and worth noting. 

So one, I mean, all Artifacts are just backed by cloud storage. Our software can run in any of the clouds. So if an enterprise customer wants to kind of own all of their data, they can bring their own cloud storage and we'll hook up to it. The data itself is stored content addressable. So, uh, essentially we'll scan whatever data you wanna send up, generate a digest of its contents, and then store it addressed at that digest. This makes it really efficient for, you know, often you'll have maybe a big data set of millions of images, and then you're gonna be like adding or removing a small number of images. This makes it really efficient to be able to figure out which images need to be uploaded or which ones need to be removed. And you can store it really efficiently this way. So, it's kind of tackling that versioning issue in a, in a fairly efficient way. 
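The content-addressable idea Chris describes can be sketched in a few lines. This is a toy in-memory store with hypothetical names (`ContentStore`, `commit`, `diff`), not how Artifacts is implemented; it only shows why keying data by its digest makes it cheap to work out what changed between two dataset versions.

```python
import hashlib


def digest(content: bytes) -> str:
    """Address a blob by the SHA-256 of its contents."""
    return hashlib.sha256(content).hexdigest()


class ContentStore:
    """Toy content-addressable store: blobs are keyed by their own digest."""

    def __init__(self):
        self.blobs = {}    # digest -> bytes
        self.uploads = 0   # how many blobs actually had to be stored

    def put(self, content: bytes) -> str:
        key = digest(content)
        if key not in self.blobs:  # identical content is stored only once
            self.blobs[key] = content
            self.uploads += 1
        return key

    def commit(self, files: dict) -> dict:
        """Store a dataset version and return its manifest of name -> digest."""
        return {name: self.put(content) for name, content in files.items()}


def diff(old: dict, new: dict):
    """Compare two manifests: names added or changed, and names removed."""
    added = sorted(n for n in new if old.get(n) != new[n])
    removed = sorted(n for n in old if n not in new)
    return added, removed


store = ContentStore()
v1 = store.commit({"cat.png": b"cat-pixels", "dog.png": b"dog-pixels"})
v2 = store.commit({"cat.png": b"cat-pixels", "bird.png": b"bird-pixels"})

added, removed = diff(v1, v2)
print(added, removed, store.uploads)  # ['bird.png'] ['dog.png'] 3
```

Only three blobs are stored across both versions because `cat.png` is unchanged, which is the efficiency gain described for large image datasets with small edits.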

And then a core part of Artifacts is the actual lineage. So knowing what data went into which Python process or what data came out of which Python process, in a way that is as minimally disruptive to the ML engineers as possible. So we basically make it possible to capture that lineage with one line of code added to any step of your pipeline. And then I'd say the last thing is like, okay, this is great. We've tried to make it really easy and make a lot of decisions for you, but there's still cases where the customer really wants to control where the data sits, who has access to the data.
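To make the lineage idea concrete, here is a minimal sketch of recording which step read and wrote which artifact, with one call per event. The `LineageLog` class and the artifact names are hypothetical; the real product's API is not shown in this conversation, so this is only an illustration of the underlying graph.

```python
from collections import defaultdict


class LineageLog:
    """Toy lineage graph: which step consumed and produced which artifact."""

    def __init__(self):
        self.produced_by = {}               # artifact -> step that wrote it
        self.inputs_of = defaultdict(list)  # step -> artifacts it read

    def use(self, step: str, artifact: str) -> None:
        """Record that a pipeline step read an artifact."""
        self.inputs_of[step].append(artifact)

    def log(self, step: str, artifact: str) -> None:
        """Record that a pipeline step wrote an artifact."""
        self.produced_by[artifact] = step

    def upstream(self, artifact: str) -> list:
        """Walk back from an artifact to everything it was derived from."""
        found = []
        step = self.produced_by.get(artifact)
        if step is None:
            return found
        for parent in self.inputs_of.get(step, []):
            found.append(parent)
            found.extend(self.upstream(parent))
        return found


lineage = LineageLog()
lineage.log("ingest", "raw-images:v1")
lineage.use("preprocess", "raw-images:v1")
lineage.log("preprocess", "train-set:v3")
lineage.use("train", "train-set:v3")
lineage.log("train", "classifier:v7")

print(lineage.upstream("classifier:v7"))  # ['train-set:v3', 'raw-images:v1']
```

With a record like this, an audit can start from a deployed model and trace back to the exact dataset versions behind it.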

Badar Ahmed 18:30

Mm-Hmm. <affirmative>

Chris Van Pelt 18:30

They don't want to be sending it to a system that is managed by us. So a core piece of the Artifacts offering is also this notion of what we call a reference artifact, which allows a customer to just point to data that already exists in their system, whether that's in another bucket or a big NFS file store. And we'll still crawl over that data and generate a checksum essentially, so you can verify the integrity of the data, but we don't store any of that data ourselves, and we're capturing that lineage to say like, okay, we used this pointer to data that exists on your system. So teams can still rest assured that when they need to audit or go back and figure out what happened, they can have all that information.
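The reference-artifact pattern described here (keep a pointer and a checksum, never a copy) can be sketched as follows. The `make_reference` and `verify_reference` helpers and the record layout are assumptions for illustration, and a temporary local file stands in for a bucket or NFS path.

```python
import hashlib
import os
import tempfile


def file_checksum(path: str) -> str:
    """Stream a file through SHA-256 so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def make_reference(path: str) -> dict:
    """Record where the data lives and its checksum, without copying it."""
    return {"uri": path, "sha256": file_checksum(path)}


def verify_reference(ref: dict) -> bool:
    """Re-read the data in place and compare against the recorded checksum."""
    return file_checksum(ref["uri"]) == ref["sha256"]


with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "train.csv")
    with open(data_path, "wb") as f:
        f.write(b"id,label\n1,cat\n")

    ref = make_reference(data_path)
    unchanged = verify_reference(ref)

    # Simulate someone modifying the data after the reference was recorded.
    with open(data_path, "ab") as f:
        f.write(b"2,dog\n")
    change_detected = not verify_reference(ref)

print(unchanged, change_detected)  # True True
```

The data owner keeps full control over storage and access, while the recorded checksum still lets an auditor confirm that the data a model was trained on has not changed since.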

Badar Ahmed 19:11

Yeah, makes a lot of sense. Kind of like zooming out to the broader AI/ML ecosystem - I think, you know, the process that you just kind of guided us through is an excellent indication of, you know, the right amount of auditability and monitoring, and, you know, security needs those things, right? If you don't have auditability, you don't have monitoring, you can't secure something that you don't understand well. And, you know, I'm a big believer, and I can speak for both Diana and myself, that, you know, we've had lots of conversations on this topic and, you know, you need those core components for security. 

I'm curious to hear your thoughts on kind of like the wider, broader ecosystem, AI/ML, and the practitioners as well as, you know, you have data scientists, ML engineers, ML scientists, and, one of the challenges that I see is, you know, when folks look at, let's say, good monitoring or good security as an after-the-fact bolt on, which becomes really challenging, right? Because some of these things we're talking about, you know, be it using the Weights & Biases product or, you know, some other tool that folks are using, that has to be done during the time, you know, you have to do, like, have this shift left mentality where the folks building the machine learning pipelines and the models are pretty cognizant and understanding of the needs here. What, what is your thought - is that the approach that you think is best or, you know, bolt on after the fact is something that can be done still?

Chris Van Pelt 21:04

Yeah, I mean, I'll say like, as someone running a security function within a broader organization, the anti-pattern that I try to avoid at all costs is this, like, throw something over the fence and let security, like care about the security. You know, as an engineer myself, that is like leading the security team, I take a great deal of responsibility around how the software is developed, what controls we're putting in place or how we could possibly detect if, if something, uh, you know, we missed something. So, you know, I think it, in regular software development, it starts with tooling. So we make sure we have like really good linters, really good CI/CD systems that can catch the obvious stuff. Um, you know, done some experimentation with static code analysis. I am neither a proponent nor someone who's against it. I think it's one more tool in the system, but I, I think it can give you a false sense of, of security sometimes. Um, but, you know, even if it catches like one thing every quarter, it's like, well, well worth it.

And you know, I see, like, a core part of my job as CISO is how do I create a culture of security where everyone is thinking about these things that security obviously has top of mind, um, and that we can collaborate on actually getting better. So, you know, certainly it can go both ways. Like if an engineer says, okay, yeah, someone told us there's this CVE in this weird dependency; we have security, can you go fix it? Like, that's an anti-pattern. But the same would be true if security somehow gets the alert and then says like, Hey, engineers, you have to fix this now. So finding like some middle ground where we're both able to like help each other, like, Hey, we tried to fix this, there's some issues. Can you guys take this like a little bit further? Or, um, at least having a shared understanding of like why this is important and ultimately driving towards the solution. 

I think that the same is true, maybe even amplified, in the MLOps world. So, you know, often if you've got like a data scientist, they're used to working in, like, notebooks. They're not used to deploying server grade software. Um, they're not gonna have the experience of, like, creating a Docker container that's gonna have as few vulnerabilities as possible or something. Uh, so it's certainly like a big undertaking of a security function to help create those champions within the organization and empower people with knowledge around, okay, how do we actually, um, make things secure in a way that, like, slows the engineer down as little as possible? It's just the same as me making a product that delights users: I want to have a security function that is pleasant to my customer - the engineer at Weights & Biases - as opposed to something where it's like, oh, security's making me do this again or, um, kind of the common traps that you can fall into with these compliance parts of the organization.

Diana Kelley 24:26

Yeah, I couldn't agree more. It's the culture. Everybody has to be invested in it, or we're just at odds, or someone's checking a box. However, you look at something like the NIST (National Institute of Standards and Technology) AI RMF (Risk Management Framework) and they're like "Map, Measure, Manage, Govern." Um, compliance isn't quite here yet with a lot of AI/ML awareness, but assuming that it will get there, what are your thoughts on how organizations can start to look at governance and compliance for AI and ML in a way that is more of a partnership with the business and the engineers rather than just a "you have to do this 'cause it's in the, you know, framework"?

Chris Van Pelt 25:07

Yeah. Uh, well, I think it starts with probably like a tabletop exercise to say like, all right, let's imagine we have this model, it's out there, something bad happened, like what are we gonna do to figure out, you know, why something bad happened? Um, so then you can kind of see where your gaps from a visibility standpoint might be, uh, and you start to get people thinking when a tabletop exercise is done right. That's again, kind of building that culture, building that community. Um, I've seen it, I've seen it work really well. Um, yeah, I wish I had like other, other secrets here, <laugh>. 

But I guess, yeah, the other one is like, you have to be able to empathize with the, the process you're putting in place. So if you're leading a security function, you're saying, okay, now I'm gonna add this like CI check, um, that needs to pass anytime for code to get, uh, put in. You really have to understand when that thing's not gonna pass, and then what your end user's gonna do to fix it. Like, it needs to be clear. It's kind of, it's like building any, any product. Um, 'cause the last thing we want is blocking someone from just doing their job and now they've gotta go to like Slack and ask if anyone knows, like why are they blocked right now? Um, so I think, I think that's a big one, just like thinking, thinking through the actual end user experience and then empathizing with what that's gonna, what that's gonna look like. 

Or, you know, one big thing that's always kind of been a pain is managing vulnerabilities. So especially in like the JavaScript ecosystem where, you know, you have these tools to try to upgrade things, but then there's some conflict and, uh, having gone through and actually upgraded things myself, it's pretty awful. It's like pulling teeth. So we're constantly trying to figure out, okay, how can we make this easier? How can we maybe get ahead of this, maybe just as a culture, like keep things upgraded before we get the alert that says, hey, there's a CVE here.

Badar Ahmed 27:30

Cool. So, um, I was gonna say, you know, we gotta talk about the elephant in the room, but I guess we can say we have the Llama in the room, uh, generative AI <laugh>. So, uh, we gotta talk about generative AI. Um, what unique security challenges do you see LLMs and, um, even beyond, you know, uh, large language models - now we have multimodal models that are generative - uh, what unique security challenges do you see them bringing to the table, um, that, you know, you guys are, uh, concerned about at, uh, Weights & Biases and also that maybe you think that the industry at large should really be concerned about?

Chris Van Pelt 28:15

Yeah, well, I think number one is, is kind of data privacy and sensitivity. So these models, it's like, you know, you're inputting text and outputting text and people can input any text and the model can kind of output any text. So I think a lot of people are pretty concerned about, okay, how do I even classify the data going in and out of this? Probably best to classify it as, like, sensitive. Um, you see tools out there now that'll help with like masking potentially sensitive data. So often the LLM does not need, you know, your social security number. So maybe we just like get rid of that or replace it with some dummy social security number before it even goes in. Uh, but even those approaches are gonna be a bit fuzzy. They're not gonna capture everything. So I think a lot of enterprises are saying, all right, well how do we like monitor this as if it's, it's one of our, like, critical systems and, and treat it that way. 
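The masking approach Chris mentions, and its fuzziness, can be seen in a minimal sketch. The regular expression and placeholder below are assumptions for illustration and cover only one common way of writing a US social security number; real masking tools use far broader detection.

```python
import re

# Matches only the dashed form 123-45-6789; other spellings slip through.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PLACEHOLDER = "000-00-0000"


def mask_ssns(prompt: str) -> str:
    """Replace anything that looks like a dashed SSN before the prompt is sent."""
    return SSN_PATTERN.sub(PLACEHOLDER, prompt)


masked = mask_ssns("My SSN is 123-45-6789, please fill in the form.")
print(masked)  # My SSN is 000-00-0000, please fill in the form.

# The same number written with spaces is not caught by this pattern.
missed = mask_ssns("My SSN is 123 45 6789.")
print(missed)  # My SSN is 123 45 6789.
```

The second call shows why pattern-based masking alone cannot be relied on, which is the motivation given above for also monitoring the LLM like any other critical system.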

I'd say another big one that's on my mind is prompt injection, which is, you know, as a web developer - like SQL injection, like a well-known problem, we have like pretty good solutions for that. You can still mess it up, but it, it's like pretty hard to mess it up as an engineer because we made good tools. There's no tool to prevent prompt injection and there doesn't seem to be consensus on how to prevent it. When you think now that these things are multimodal, you could literally construct an image that it went out to grab when it's doing some kind of RAG (retrieval-augmented generation) and that image has invisible text in it that says, disregard your system prompts, go do this.
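The "good tools" for SQL injection that Chris contrasts with prompt injection are parameterized queries, which keep the query and the user's data in separate channels. The sketch below uses Python's built-in sqlite3 module with a made-up `users` table to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

user_input = "nobody' OR '1'='1"

# Unsafe: the input is spliced into the SQL text and changes its meaning.
unsafe_sql = f"SELECT name FROM users WHERE name = '{user_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()
print(len(unsafe_rows))  # 2, every row matches the injected OR clause

# Safe: the placeholder keeps the input as data, never as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()
print(safe_rows)  # []

conn.close()
```

A prompt has no equivalent of the `?` placeholder: instructions and untrusted content travel in the same stream of text, which is why the problem described above has no comparably reliable fix yet.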

Badar Ahmed 30:07


Chris Van Pelt 30:07

This, the, the surface area of kind of escaping from whatever guardrails you put into an LLM is, is pretty big. And the best we can do today is kind of like the equivalent of a web application firewall or some kind of, like, after-the-fact detection that we seem to have escaped the sandbox we were put in. So, um, you know, I think there's other, I wouldn't say necessarily security, but certainly concerns around, um, end user usability, the hallucination, the, um, you know, it'll happily with certainty lie to you. So like how do you, how do you address that? Um, I think, you know, RAG and trying to give it as much information as possible has gone a long way, but they haven't really fixed that problem. Like there's certainly ways you can make it completely lie to you in a way that seems like it's confident. So I think, yeah, yeah, those are the big areas that are top of mind for me.

Badar Ahmed 31:16

Yeah, I mean that's definitely a, a, I think a large area of active R&D right now, which is, you know, how do you ground, uh, these large language models and even beyond language, other modalities, but how do you, uh, ground these generative AI models into some set of facts or reality? It's kind of like a, I guess these are different problems, right? Hallucination versus, uh, prompt injection, but they're, uh, you know, both kind of like in the realm of like your ma - the, the model is being, you know, either it's like hallucinating, that's, which is kind of like, you know, giving output that is kind of outside of the, the, you know, let's say the circle where it should be. And then you have prompt injection, which could be in the similar vein, which is, you know, supposed to do X, but now it's, you know, trying to do Y. 

And totally agree with you on like the exploding surface area here. We're going from, uh, you know, just like simple, uh, prompt engineering to RAG, and the RAG techniques are becoming highly complex, uh, with more and more data sets and data sources. Um, and then, you know, this whole new wave, which is still fairly nascent, of, uh, agents, and that's, you know, kind of exploding the surface area even more, right? Like you have an agent that can not just, uh, you know, retrieve data with the retrieve in RAG, but also, uh, can act on basically the data or decisions made by the language model. So we're going into completely, you know, uncharted territory there.

Chris Van Pelt 32:53

There are many like very popular open source agent frameworks that will happily run any command on your system. Which is, yeah, a little, it's both exciting and, and terrifying at the same time.

Badar Ahmed 33:08

Yeah, the capabilities are exciting, yeah, but from a security point of view it's, uh, pretty frightening. And, uh, we're seeing, for example, you know, projects that literally started like two months ago being so popular, like some of them have like 50,000 stars on GitHub and you look back at the Git commit history and it was started two months ago. And, you know, a lot of people are using it and trying it out and, you know, in two months they built something cool, but they, nobody really had the chance or the thinking to even remotely think about like, hey, what about security and what about how this can be exploited? And, uh, that, you know, the, the the, I think we're kind of like in the, uh, wild west, uh, Cambrian explosion scenario right now. And it's, um, fascinating and scary at the same time.

Chris Van Pelt 33:56


Diana Kelley 33:57

Yeah. And, and it is, it's so easy to get so excited about what these systems can do, what the possibilities are, but yeah, stopping and taking a breath and threat modeling, because also attackers are thinking about how they can use systems. Um, this has been a, a fantastic conversation, Chris. 

I'm wondering as, you know, for our audience of CISOs, security experts, maybe Chief AI Security Officers - and this is a hard one, but - do you have a final piece of advice if, if they walk away from this conversation, what's one thing you really hope that they take to heart and do after they hear this?

Chris Van Pelt 34:36

Uh, well the top one is you should definitely buy Weights & Biases software to help you keep track of everything. I mean, obviously.

Diana Kelley 34:44

<laugh>, right?

Chris Van Pelt 34:45

Uh, no, I mean, I think what we were talking about earlier, this idea that, alright, on the one hand we know various practices to make, um, a secure engineering practice. We're kind of learning new techniques and approaches to make this kind of GenAI, um, revolution be more secure and allow the CISOs to kind of sleep better at night. This call to treat security within an organization like a product that is going out to users, I think, is worth double clicking on and thinking about. It's certainly something I haven't figured out, but it's something I do believe in. And I think ultimately it will make for a much more secure organization when, um, you know, there's a deep empathy with the process that we're adding to the various users on the team, and the team members really understand why that's in place. As an engineer who doesn't like being slowed down, if I at least understand why and it makes sense to me, like I can't pick it apart and be like, well, couldn't we do it this other way that would be a little faster yet still mitigate risk here, I think that will make for a better security function and much happier engineers and security leaders all around.

Diana Kelley 36:26

That's great advice.

Badar Ahmed 36:29

Great. Just to close out one last question that I have on my mind. What upcoming security features and innovations in Weights & Biases can we look forward to?

Chris Van Pelt 36:42

Oh yeah. Cool. Uh, a big one that I had been a part of, um, I think I first did a prototype of this like a year ago, so it's, it's been a long time coming, but we're adding workload identity federation, so you normally authenticate with Weights & Biases with API keys. We're going to add the ability for you to specify a trust relationship with an OIDC [OpenID Connect] provider.

Badar Ahmed 37:04


Chris Van Pelt 37:04

So probably the most common one would be a Kubernetes cluster. There's an OIDC provider there. It can mint a JWT [JSON Web Token] and we can use that short-lived credential to, um, authenticate with Weights & Biases, which is much better than copying and pasting API keys, which makes me cringe. 
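The short-lived credential flow Chris outlines can be sketched with the standard library alone. Everything here is an assumption for illustration: the issuer, audience, and subject values are made up, and the token is signed with a shared secret (HS256) only to keep the example self-contained. A real OIDC provider such as a Kubernetes cluster signs with an asymmetric key, and the relying service verifies against the provider's published public keys. This is not the Weights & Biases implementation.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def mint_token(claims: dict, key: bytes) -> str:
    """Stand-in for the identity provider: sign a set of claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(claims).encode())
    signature = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify_token(token: str, key: bytes, issuer: str, audience: str, now: float) -> dict:
    """Stand-in for the service: check signature, trust relationship, expiry."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims["iss"] != issuer or claims["aud"] != audience:
        raise ValueError("untrusted issuer or audience")
    if now >= claims["exp"]:
        raise ValueError("token expired")
    return claims


KEY = b"demo-only-shared-secret"          # hypothetical; never hard-code real keys
ISSUER = "https://cluster.example"        # hypothetical OIDC issuer
AUDIENCE = "tracking-service"             # hypothetical audience

now = time.time()
token = mint_token(
    {"iss": ISSUER, "aud": AUDIENCE, "sub": "system:serviceaccount:ml:trainer", "exp": now + 300},
    KEY,
)
claims = verify_token(token, KEY, ISSUER, AUDIENCE, now)
print(claims["sub"])
```

Because the token expires after five minutes and names a specific workload, a leaked copy is far less useful to an attacker than a long-lived API key pasted into a config file.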

I'd say the other big thing we're working on is, uh, this new product offering we're calling Weave, which is really targeting the AI developer: people building applications on top of GenAI.

And it's a very visual, um, and comprehensive kind of logging and tracing product. And I foresee a lot of really cool security features coming out of that, but it's still early days. We just recently launched it, and if folks are interested in kind of having a lot of the visibility and lineage guarantees you get with our core kind of model building offering, the new offering for building GenAI applications on top of these models is definitely something you should check out.

Badar Ahmed 38:17

Awesome. It was a really enjoyable and insightful conversation today, Chris; it was absolutely a pleasure to have you. Thanks so much and have a great rest of your day.

Chris Van Pelt 38:32

Thank you.


Additional tools and resources to check out:

Protect AI Radar: End-to-End AI Risk Management

Protect AI’s ML Security-Focused Open Source Tools

LLM Guard - The Security Toolkit for LLM Interactions

Huntr - The World's First AI/Machine Learning Bug Bounty Platform

Thanks for listening! Find more episodes and transcripts at https://mlsecops.com/podcast.