Navigating the Challenges of LLMs: Guardrails AI to the Rescue
Jun 07, 2023 • 31 min read
In this episode, host D Dehghanpisheh is joined by guest co-host Badar Ahmed and by Shreya Rajpal, the creator of Guardrails AI, an open source package designed to provide trust and reliability in ML applications by adding structure, type, and quality guarantees to the outputs of LLMs.
Together they delve into the challenges posed by large language models and the need for robust safeguards. Shreya explains the concept of Guardrails AI, its role in ensuring responsible ML development, and Guardrails’ aim to enhance the security and reliability of LLM applications through output validation, input validation, and domain-specific safeguards.
The conversation also highlights the ever-evolving landscape of ML development, the growing capabilities of open-source models, and the importance of constraining risks and safeguarding against unexpected failures. Shreya also shares her plans to expand Guardrails AI in order to support more use cases and introduce language compatibility.
[Intro Theme] 0:00
Charlie McCarthy 0:29
Hello MLSecOps community, and welcome back to The MLSecOps Podcast!
I am one of your hosts, Charlie McCarthy, and on this show we explore all things at the intersection of machine learning, security, and operations, a.k.a. MLSecOps.
In this episode, we had the opportunity to meet and speak with Shreya Rajpal, the creator of Guardrails AI. Guardrails AI is an open source package that allows users to add structure, type, and quality guarantees to the outputs of LLMs.
And subbing in for me as co-host for this episode is D’s and my friend and colleague, Badar Ahmed. Badar is co-founder and CTO of Protect AI. Both Shreya and Badar have been in this machine learning space for a long time, and we are excited to have them both on the show.
It’s really a highly technical discussion where D and Badar get into the nitty-gritty of building LLM-integrated applications. They ask Shreya a lot of interesting questions about her inspiration for starting the Guardrails project; the challenges of building a deterministic “guardrail” system on top of probabilistic large language models; and the challenges in general, both technical and otherwise, that developers face when building applications around large language models.
The group also goes a bit into the landscape of closed and open source LLM systems, and who is leading in this foundational LLM model space and industry. They even get into prompt injection attacks and we explore how Guardrails might work as a solution to those attack scenarios on the prompting end.
Overall, it’s a really wonderful and insightful discussion, especially if you’re an engineer or developer in this space looking to integrate large language models into the applications you’re building.
As always, friends, feel free to follow along the transcript at MLSecOps.com/podcast, where we’ll have links to the things we mention throughout the conversation.
And with that, here is our conversation with Shreya Rajpal…
Shreya Rajpal 2:33
The project is open source. It's on GitHub. It's under my name, ShreyaR, or if you just search for Guardrails AI, you should find it. But I think it was really kind of me solving my own problems. [At the] end of last year, like many other people in this space, I was building some of my own projects using large language models. And I was very excited about the applications that I could build. But even as a developer, I could spot that when I was using the apps that I was building, I wasn't really getting the experience from it that I would ideally want my users to get.
So, a lot of times I would see that the application that I was building would kind of behave as expected, but many other times the answers would be too short, or too verbose or wouldn't contain relevant information, et cetera. And as a developer, I found it very stifling in terms of what I could build and what value I could provide to the users.
And this kind of led to a general theme: as engineers, we like to have control over what we're committing to provide to our users, right? But large language models, as powerful and as amazing as they are, really only give you the prompt as the tool, the one little knob, available to play around with or tweak. Any context you need to add goes in there. And so the thought behind Guardrails was, “What are the other tools or processes, et cetera, needed around prompting and around getting that output in order to get more reliable performance out of large language models?”
So, this kind of leads into the architecture that Guardrails follows, which is that it sandwiches the large language model API call: it does post-processing and validation to ensure that any criteria you want your output to adhere to are actually met at the output stage, and it injects relevant context into the prompt to make it more likely that you get the correct output to start with.
So that was a little bit behind the philosophy. And as I was building the project, it became very obvious that this philosophy applied to many different use cases. It was actually slightly inspired by my time working in self-driving. So I worked in self-driving and autonomous systems for a number of years.
And in these industries you often see this pattern repeat where you have this non-deterministic system that feeds into this program, which is supposed to have 100% guarantees even when the non-deterministic systems fail. And at that interface you have, often, some strict validation, some verification, et cetera, to ensure that even when the non-deterministic system is failing, you still have guarantees on the output.
So, Guardrails attempts to do something similar as a general framework for large language model applications.
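The sandwich architecture Shreya describes can be sketched in a few lines of Python. This is an illustrative stand-in, not the Guardrails library's actual API: `fake_llm`, `validate`, and `guarded_call` are hypothetical names, and the re-ask loop is a simplified version of the behavior she outlines.

```python
# Minimal sketch of the "sandwich" pattern: context is injected into the
# prompt on the way in, and the raw output is validated (with a bounded
# re-ask loop) on the way out.

def fake_llm(prompt: str) -> str:
    # Pretend model: a real call would hit an LLM API instead.
    return "yes" if "Answer strictly yes or no" in prompt else "maybe, it depends"

def validate(output: str) -> bool:
    # Post-hoc check: the output must be exactly "yes" or "no".
    return output.strip().lower() in {"yes", "no"}

def guarded_call(question: str, llm=fake_llm, max_retries: int = 2) -> str:
    # Pre-processing: inject instructions that make a valid output more likely.
    prompt = f"{question}\nAnswer strictly yes or no."
    for _ in range(max_retries + 1):
        output = llm(prompt)
        if validate(output):  # post-processing: enforce the output contract
            return output
        # Re-ask, telling the model what was wrong with its last attempt.
        prompt += f"\nYour previous answer {output!r} was invalid; reply yes or no."
    raise ValueError("LLM output failed validation after retries")
```

The key property is that the caller only ever sees output that has passed validation, regardless of what the stochastic model produced along the way.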
Badar Ahmed 4:55
That's fascinating, Shreya. The self-driving use case sounds really interesting.
I'm curious to hear what other experiences for you personally led you to working in large language models. Of course, this is a really interesting, very hot area which is exploding.
Have you been building AI/ML applications in your career for some time, and this is a natural progression for you?
Shreya Rajpal 5:21
Yeah, I think that's a good way to put it.
So I've been working in machine learning and AI for almost ten years now. I started out in classical AI, so no deep learning, just decision making under uncertainty. I believe this was pre-TensorFlow, and I was doing research in these systems at the time, and then transitioned to deep learning in my masters.
So I did a lot of computer vision research, et cetera. Then I worked in applied machine learning at a bunch of different places, at Apple and at Drive.ai, a self-driving startup. So I was training deep learning models and computer vision models. And most recently I was at this startup called Predibase, where I was working on machine learning infrastructure, making sure that we can scale models very reliably to very large training data set sizes and train them very robustly.
So I've kind of worked across the full stack of machine learning from research all the way to infrastructure. And then this was a natural progression because the way of doing machine learning is changing, where instead of training your own models, you're moving to this new paradigm of everything is behind this API, and you can get very good performance just by a simple request.
It's very exciting to me, as somebody who's been working in machine learning for some time, how it has opened up access to ML for a lot of people.
Badar Ahmed 6:31
Yeah, I think that's really fascinating.
You've been around long enough to see the progression. There was a time when folks needed to have a PhD to work in machine learning and AI. That bar has been constantly moving, and we're building higher-level abstractions that are making things more accessible to a much wider audience of ML folks and, with LLMs, to software developers as well.
Shreya Rajpal 6:57
But it speaks to the diversity of audience that we have now that is building in this space, which is honestly really awesome to see that explosion.
D Dehghanpisheh 7:22
Coming from the experience that you just mentioned, one of the more novel things, if you will, around Guardrails is that you have an interesting thought and position on large language models, which is that you're focusing, really, on the output of LLMs. Really, the responses.
And we know that there's a ton of research done on prompt injections, but you're on the opposite side of that.
How'd you come to that?
How do you think about it from that perspective?
What was the inspiration to say, you know what, we really need to focus on that sandwich methodology that you described earlier?
Shreya Rajpal 7:50
I think that's a great question.
So, the big motivation behind that was that flipping it on its head and thinking about it from the output verification side allows you to think in a very principled manner about what the right behavior for an LLM API looks like for you, right? So it almost shifts the burden to the developer in some cases: instead of just prompting, really think about where this fits into your application, and convert that requirement into a specification that you always want the output to adhere to.
And while some of that can be achieved with specific prompting techniques, a lot of the verification really comes down to output verification, right? You can prompt it any way you want, but these language models at the end of the day have a lot of variance in what they produce. And so you really need to, once the output is generated, ensure that, let's say, if you're building a customer service support agent, your tone is polite, or you don't have any profanity or personally identifying information, et cetera.
Or if you're generating structured outputs that you need for making a downstream API call, [ensure] that that structured output is valid and correct, et cetera. So while you can do some of that on the prompting side, a lot of the burden for that actually does come down to output verification. Because there's no guarantee, no matter what you do on prompting, that you are going to get something that adheres to the requirements you set out up front.
That made a lot of sense to me as what would personally give me, as a developer, peace of mind, right? Otherwise I'm making a lot of investments on the input side and then passing everything through a stochastic system, where those investments may or may not have an impact. If I instead focus my attention on the output side, then I can check deterministically, or with a high degree of confidence, whether my requirements were met. That gives me a lot of peace of mind.
D Dehghanpisheh 9:28
Something you just mentioned–you used the phrase, “put the burden on the developers,” right? And so when we think about that burden that gets added, how do you think about that developer burden, coupled with all this talk of regulating models, whether it's policing them or otherwise? I mean, how should a developer think about employing guardrails?
And who does that developer need to bring to the table to make sure that that burden is properly met within that developer's ethos, brand guidelines, brand ambitions, et cetera?
Second question is then, does that limitation or do those burdens that are necessarily enforced, do you think that slows down the advancement of LLMs in any way?
Shreya Rajpal 10:09
I think that's a great question. I'll answer the second one first.
I guess it depends on what you refer to as advancement of LLMs, right? Part of it is on the research side. There's a lot of technological advancements in terms of lower latency, longer context windows, better performance at specific domains, et cetera.
So, I don't think that output verification slows that down. I think that is progressing at a pace where it's unencumbered by concerns about what it looks like to build applications using this technology. I do think that ensuring a system built using LLMs is verified and correct is a very natural progression, or a very natural bottleneck, that any team or any person building with LLMs definitely runs into.
There's a lot of use cases with very low tolerance for certain types of failures, right? Take any use case where you're building a data analytics helper tool and you need to, for example, generate SQL queries from input data. If your generated SQL query is not correct for the database that you want your end customer to run analytics queries on, your product immediately has very little value for your user, right?
And in those cases, that is a very natural bottleneck. That correctness becomes very important. And if you don't have correctness, the application ceases to have any value.
How I think of this problem is that the Guardrails problem–or the assurance problem, really–is not something that people building demos typically run into. Like, you can build a demo that is very, very compelling without needing guardrails. But in order to ship that demo to an audience that is able to derive value from that product reliably, you do need guardrails, and you do need some way of being able to wrangle LLMs to make sure that they're always correct, or they're always aligned with company guidelines, and they're not violating any regulations.
So, it's a very natural process of development building with this technology. In my experience, who should own the burden of what correctness looks like, I think is a very great question. The space is so early and it's so nascent, and even companies are figuring this out as we speak about what we need before we can ship this technology reliably.
And so I do think it's a process with multiple stakeholders. Take the example of branding guidelines and a developer enforcing those guidelines in an LLM: it requires collaboration outside of engineering, with marketing folks or maybe with other teams.
D Dehghanpisheh 12:26
Yeah, marketing, risk, legal, whoever, you've got to bring a lot more people to the table. So it's not just the tool. It's really changing the modus operandi, if you will, and the standard operating procedure: when you are going to deploy this, you need to bring other, in many cases non-technical, people to the table, which is anathema to a lot of machine learning engineers.
Shreya Rajpal 12:48
Yeah, I think from a technical point of view, the interesting question is how do you break down those requirements into verifiable programs? And how do you ensure that everything from code correctness to branding guidelines can be validated so that you can use this reliably?
I think, technically, that's a very fascinating problem to solve. Guardrails helps with providing one framework to solve that, yeah, but I think we would need those problems to be solved before you can ship a customer support chat bot that won't mention your competitors or won't start selling products that are irrelevant.
Badar Ahmed 13:21
So that leads me to basically the challenge between deterministic applications and an inherently probabilistic or stochastic AI system, which LLMs are. And you gave this example where earlier in your career you worked with self-driving systems, where you have non-deterministic output as well.
How does Guardrails help build that deterministic layer on an inherently probabilistic system?
Shreya Rajpal 13:53
I think that's a good question.
So Guardrails doesn't go and tweak the model itself, nor can it give the model any capabilities that don't already exist in the model. So if, for example, some models are bad at counting words or counting tokens, et cetera, Guardrails won't imbue the model with that ability.
How it helps control some of the stochasticity of the model is that it allows you to specify small programs that are verifiable and executable, and that give you a clear signal of, if you had certain criteria, was it met or not? So if you think about, let's say, a use case of deciding which action to take next, right? So this is an inherently stochastic process and Guardrails isn't going to take all of the randomness out of this process.
But what it will do is, whatever decision you decide to take next, it will ground that decision into some external system to ensure that decision is valid and legal and appropriate, given the context, et cetera. And so that can be dynamically enforced based on whatever context you're in. But that is one of the examples of what Guardrails can essentially do.
So in general, the library, how it functions is it employs a suite of techniques. So the library essentially has a set of validators that come pre-implemented out of the package. And it also is a framework for creating and writing your own validators. And these validators can be specific code programs.
So for example, if you're generating code and you want to ensure that the code is correct for some runtime, you can actually run the code, figure out what the errors are, and then pipe them back into the system to correct the code. So there are deterministic verification programs that come with Guardrails. Other techniques use very small classifiers that are trained to solve specific tasks, for example, detecting profanity or detecting other issues with the generated text. So you can use those as external verification of the program.
Other techniques are, in fact, using an internal, nested LLM call to check more complex things like: what is the tone of this message, is it polite, is it appropriate for the agent or the bot that you're building, et cetera. It takes the problem of characterizing an output and decomposes it into specific criteria that are independently checked and verified. And that gives it a layer of reliability, essentially.
So, that's the framework for how Guardrails helps control the stochasticity of the system.
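The decomposition Shreya describes, independent validators each checking one criterion, can be sketched like this. The registry mechanics, the validator names, and the word-list "classifier" are all illustrative stand-ins, not the library's real implementation:

```python
import re

VALIDATORS = {}

def validator(name):
    # Register a named check so all criteria can be run and reported together.
    def register(fn):
        VALIDATORS[name] = fn
        return fn
    return register

@validator("no-profanity")
def no_profanity(text: str) -> bool:
    # Stand-in for a small trained classifier: a simple word-list check.
    banned = {"damn", "crap"}
    return not any(w in banned for w in re.findall(r"[a-z']+", text.lower()))

@validator("polite-closing")
def polite_closing(text: str) -> bool:
    # Deterministic program check: customer-service replies must end politely.
    return text.rstrip().rstrip(".!").lower().endswith(("thank you", "thanks"))

def check_output(text: str) -> dict:
    # Run every registered validator; report pass/fail per criterion.
    return {name: fn(text) for name, fn in VALIDATORS.items()}
```

A failing criterion in the report could then trigger a re-ask, a fix-up step, or a hard rejection, depending on how strict the application needs to be.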
Badar Ahmed 16:04
So Guardrails, I'm looking at it, and one of the ways in which it is trying to bring determinism is by declaring output schemas.
And natural language is a great interface in many cases, especially for input. But if you are trying to build applications on top, it's not the best interface for outputs. So that's one of the things that I see with Guardrails: it lets users have a natural language input into the LLM, but then basically asks the LLM to produce a structured output. How well does that technique work in your experience? You have, basically, the schema language that Guardrails has, which I believe gets translated into a more compiled form, but…
How reproducible is nicely asking the LLM, “I would like this output schema, please?”
Shreya Rajpal 17:04
Yeah, I think that's a good question. I don't want to say always, right? That's a hard guarantee to make. But even with the prompting strategy employed in Guardrails, in my testing I find that it behaves with a very high degree of reliability. So more often than not, just with the prompting strategy, you end up getting the JSON that you care about. And it's a mix of, obviously, prompt engineering as well as few-shot examples that tends to work really well.
I think there's other techniques out there as well that require access to the logits of the model: basically, masking out certain outputs and certain logits to ensure that you always get valid JSON. For a lot of the LLM providers, like OpenAI's newest models or Cohere, Anthropic, et cetera, access to those logits is often not available. And so this type of prompting is often the only tool available.
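The logit-masking idea can be illustrated with a toy example. Real constrained decoding tracks a full JSON grammar token by token; this sketch only shows the core move of masking disallowed tokens before picking one, and the vocabulary and scores are made up:

```python
import math

def constrained_pick(logits: dict, allowed: set) -> str:
    # Mask disallowed tokens by sending their score to -inf, then take argmax.
    # A sampler would instead renormalize the surviving logits and sample.
    masked = {tok: (score if tok in allowed else -math.inf)
              for tok, score in logits.items()}
    return max(masked, key=masked.get)

# Toy step: after emitting `{"name":`, the grammar only allows a value token,
# so structural tokens like `}` or `[` are masked out even if they score higher.
logits = {'"Alice"': 1.2, "}": 3.5, "[": 2.0, "42": 0.7}
token = constrained_pick(logits, allowed={'"Alice"', "42"})
```

Because the invalid tokens can never be selected, the output is well-formed by construction rather than by hoping the prompt was persuasive enough.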
Anecdotally, I see a lot of conversation and discussion about this at places like Twitter, where people are like, okay, this is my technique and sometimes it works, sometimes it doesn't, et cetera. And I say to people, even if you don't care about validation and you don't care about, like, these criteria should always be met, et cetera, even just for prompting Guardrails is pretty useful. And I know some people who use it just for getting that structured output.
So, when I was developing it, I just invested a lot of time doing prompt engineering. And that's one of my goals where I feel like this should be table stakes as a developer, having access to the structured outputs. That’s support that I'm happy to provide.
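A minimal sketch of the prompt-then-validate approach to structured output discussed above. The stub model, the schema, and the function names are assumptions for illustration, not the Guardrails API:

```python
import json

# Hypothetical target schema: field name -> expected Python type.
SCHEMA = {"name": str, "age": int}

def fake_llm(prompt: str) -> str:
    # Stand-in for a model that usually complies with the requested format.
    return '{"name": "Ada", "age": 36}'

def get_structured(prompt: str) -> dict:
    # Prompt side: describe the exact JSON shape we want back.
    schema_hint = ", ".join(f'"{k}": <{t.__name__}>' for k, t in SCHEMA.items())
    full_prompt = f"{prompt}\nRespond ONLY with JSON of the form {{{schema_hint}}}."
    raw = fake_llm(full_prompt)
    # Output side: parse and type-check, since prompting alone is no guarantee.
    data = json.loads(raw)  # raises ValueError if the model emitted non-JSON
    for key, typ in SCHEMA.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"field {key!r} missing or not {typ.__name__}")
    return data
```

On a validation failure, a real system would re-ask the model with the error message appended, as in the earlier re-ask sketch, rather than failing outright.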
Badar Ahmed 18:24
Yeah, I think building an application, I definitely want to get that structured output, so that makes a lot of sense.
What about cases where a user wants to do some kind of semantic validation? So instead of just a highly structured schema type validation, it's something that's more along the lines of “Here's the type of output I'm looking for.”
You mentioned this example, like, don't mention my competitor, or don't mention XYZ. That's kind of hard to do with a well-defined schema, and more into the territory of semantic validation.
Is that something that Guardrails does today?
Shreya Rajpal 19:02
Yeah, so I think a lot of the validators in the library support that.
For example, I've recently added support in the library for a lot of summarization guardrails. So if you are building a summarization pipeline and doing summarization with large language models, there's a few different failure modes that they tend to have. For example, not all of the talking points in the source documents may be captured, or, if the source documents disagree with each other, the generated summary might have factually inconsistent generations, et cetera.
So what Guardrails does is use traditional machine learning and embeddings, et cetera, to confirm that there's no redundant information, that there's a high degree of overlap with the source documents, and that any sentence that is generated is grounded in specific paragraphs and specific passages, et cetera, from the source documents.
So that is what semantic correctness for something like summarization looks like, and there's no place in this whole process where a JSON comes in. So, Guardrails is definitely very interested in supporting those types of workflows as well.
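The grounding check described here can be pictured with a back-of-the-envelope version. Real implementations use learned embeddings; bag-of-words cosine similarity keeps this sketch dependency-free, and the 0.5 threshold is an arbitrary choice for illustration:

```python
import math
import re

def bow(text: str) -> dict:
    # Bag-of-words vector: word -> count. A stand-in for a learned embedding.
    words = re.findall(r"[a-z]+", text.lower())
    return {w: words.count(w) for w in set(words)}

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def ungrounded_sentences(summary: str, sources: list, threshold: float = 0.5) -> list:
    # Flag each summary sentence with no sufficiently similar source passage.
    src_vecs = [bow(s) for s in sources]
    flagged = []
    for sent in re.split(r"(?<=[.!?])\s+", summary.strip()):
        if sent and max((cosine(bow(sent), v) for v in src_vecs), default=0.0) < threshold:
            flagged.append(sent)  # no source passage supports this sentence
    return flagged
```

An empty result means every sentence cleared the grounding bar; anything flagged would be a candidate for removal or a re-ask.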
Another example of this that I have really good support for is Text-to-SQL. So I have a Text-to-SQL module that takes in a natural language query over your data and generates a SQL query that is grounded in the specific schema of your database. And so that once again has nothing to do with JSON structure or JSON schemas, et cetera. It just has to do with pure strings. There's a semantic correctness that needs to be enforced on that as well, that Guardrails does, basically.
So there's a few different examples of this throughout the library.
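For the Text-to-SQL case, one concrete form of that semantic check is compiling the generated query against the real schema before ever executing it for a user. This sketch uses SQLite's `EXPLAIN`, which parses and plans a query without running it; the schema, queries, and helper name are illustrative, not the module Shreya describes:

```python
import sqlite3

def is_valid_sql(query: str, schema_ddl: str) -> bool:
    # Load the actual table definitions into a throwaway in-memory database,
    # then compile the query. A bad table or column name fails immediately,
    # without touching production data.
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        conn.execute(f"EXPLAIN {query}")  # compile-only: catches invalid names
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

schema = "CREATE TABLE orders (id INTEGER, total REAL, placed_at TEXT);"
```

A failed check would be piped back to the model along with the error, in the same correct-and-retry loop used for generated code.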
Badar Ahmed 20:27
What does the story look like for integration with open source LLMs compared to closed platforms like OpenAI, et cetera?
Shreya Rajpal 20:40
I think there's two parts to this answer.
Since Guardrails doesn't actually do anything to the model prediction itself, you can theoretically swap any open source LLM in for OpenAI. So, I interface with this library called Manifest, which is really great for prompt engineering. And then Guardrails, via Manifest, natively supports any open source Hugging Face transformer as an example, or any other model.
So, you can use any open source model within Guardrails. I think the other part of this answer is that prompt engineering right now, unfortunately, is very model specific. And so the prompt engineering that Guardrails is really good at works for a few different providers and a few different models across them, but may not work for open source models.
It's this experimental work where if people have specific models, especially if they've been trained on code, et cetera, they tend to be good with Guardrails style prompting. But just because of a big variety of open source models we have out there today, it's hard to say if they would work with any off-the-shelf model, basically, that hasn't been tested or hasn't been trained on code.
D Dehghanpisheh 21:42
So, that brings me to this whole open source versus proprietary component.
We’ve, I'm sure, all read the leaked Google memo, “We have no moat.” I'm curious on your thoughts, generally, on LLMs, LLM applications, and generative AI applications in this whole debate of proprietary versus open source, and specifically what you think this means for LLM systems, since we're talking about Guardrails, which is tangential to that.
Shreya Rajpal 22:09
Yeah, I am very excited about where the open source is headed. I have, for example, a bunch of users who are building applications, and OpenAI is just very expensive. So they do some prototyping with OpenAI but then end up switching to cheaper, smaller, locally hosted models that they can run more affordably.
So, I think that's a general pattern that I've seen across use cases. Obviously there are performance differences, right? There's a reason the privately hosted models perform better: the huge amounts of investment that their parent companies have made into them.
I think the moment of time that we're in, almost everybody across industries, every stakeholder, realizes the value of scaling up what is available in the open source and making investments in them. And there's a bunch of really great companies that are supporting this open source growth and movement as well. So I'm very excited to see where we end up.
But I am very excited about the open source. I do suspect that you don't need a sledgehammer to crack every single nut. There's different models that perform well under different circumstances and we'll start seeing some of that specialization pop up more and more.
D Dehghanpisheh 23:12
So on that specialization, are you seeing specific deployment patterns as it relates to LLM implementations that are more common than others, based on what you see from Guardrails AI users or other sources?
In your mind right now, what is winning or rather who's leading in the LLM foundational model adoption space? Which design patterns are most common?
Shreya Rajpal 23:35
Yeah, I think that's a good question. I don't think I am offering very unique insight here, but obviously a very common pattern is taking LLMs and combining them with some external memory store to find relevant context to pass in at runtime. This allows developers to basically work across very large dataset sizes, and also, in some cases, to constrain or ground what you want the LLM output to be restricted to.
So for example, if you pass in relevant context, the LLM is more likely than not to generate responses that refer to the context that was passed in. Some of that is document sources, et cetera, for big internal document corpuses. But also, obviously, vector DBs. Combining LLMs with vector DBs is a very common pattern that Guardrails users, as well as other open source and proprietary users out there, are really working with and building on.
I would say that is probably the most common one that I've seen. I think a pattern that we'll see emerging more and more is this idea of grounding the LLM outputs into some external system as a way to verify that LLM output. So that's the thesis that Guardrails was built around. But I think we're going to see that adopted more and more as verification and reliability starts becoming more of a problem for LLM applications.
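The retrieval pattern described above can be sketched without a real vector DB or embedding model; term-frequency vectors stand in for learned embeddings, and all names here are illustrative:

```python
import math
import re

def embed(text: str) -> dict:
    # Toy "embedding": word -> count. A real system would call an embedding model.
    words = re.findall(r"[a-z]+", text.lower())
    return {w: words.count(w) for w in set(words)}

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, corpus: list, k: int = 1) -> list:
    # Rank chunks by similarity to the question; a vector DB does this at scale.
    q = embed(question)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, corpus: list) -> str:
    # Splice the retrieved context into the prompt to ground the answer.
    context = "\n".join(retrieve(question, corpus, k=1))
    return f"Context:\n{context}\n\nAnswer using only the context above.\nQ: {question}"
```

The "answer using only the context above" instruction is the grounding half of the pattern: it constrains the model to the retrieved passages rather than its general training data.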
D Dehghanpisheh 24:49
So we’ve been looking inside the system; now let's zoom out. We've been focusing on design patterns and the like, and one of the things that I have found really interesting is that you now see job postings for prompt engineers and AI prompt red teamers. Six months ago, neither of those was a thing. They didn't exist. And now all of a sudden, they're everywhere.
I'm just curious, what's your take on all this? Is it just an excitement, or is it just a recategorization? Or is it just like, “Hey, we need one of those,” and people don't really understand exactly what it is?
Shreya Rajpal 25:20
It's been really fascinating, candidly, watching that. I'm going to make guesses, but I don't think those guesses are grounded in anything, I think it's just educated guesses. But I do think that it's probably a temporary intermediate phase that we're passing through where this technology is really powerful, but we need to jump through all of these hoops to get it to perform how we want it to.
And so prompt engineering specifically, especially all of these hyper-specific tricks like “Think Step By Step,” I think is something that we might not need to do as the LLMs get better. As I've been doing this at Guardrails, I can see that, for example, even generating JSON, right? This is one of the things that Guardrails has to some degree solved from the prompt engineering perspective, and it is something that I see people still struggle with.
So, there is obviously this gap in terms of what you can achieve with good prompt engineering and what you can't. So, I see why people find prompt engineering a very valuable skill, and there's job postings out there. And as a side note, I'm always curious about what the interviews for those jobs look like.
D Dehghanpisheh 26:18
Yeah, exactly. It's like, write me prompts? How do you–?
It’s just crazy. And who’s judging them?
Shreya Rajpal 26:24
Yeah, maybe it's some very hyper specific task that you try to get a model to perform, and write a prompt in order to do that. So I think it is an interesting question, but at the same time, there is this gap, in terms of skills, of what you can do with good prompt tuning/prompt engineering versus what you can't if you don't have patterns.
I personally think it just comes down to a lot of exploration and experimentation, but I have seen people who could do with more guided principles. So, there's all of these exciting courses now on best prompt engineering. There's a machine learning writer that I really like, and I follow her writing. Her name is Lilian Weng. I think she's at OpenAI. So she has a very detailed guide on this so I do suspect–
Badar Ahmed 27:01
That prompt engineering guide she put together, right?
Really nice guide.
Shreya Rajpal 27:04
Yeah, that one's quite nice.
So, I think it will become just a skill that we all imbibe and get much better at. So, I don't know if it's going to exist in the future as a profession, but in 2023, it's definitely a need.
Badar Ahmed 27:19
Do you see, in the near future, the pendulum maybe swinging back a little bit? Right now, with LLMs as a platform, with GPT-3.5 and -4, prompt engineering pretty much seems like the best tool to really get the best out of the model.
With some of the recent advancements in fine-tuning, LoRA, and some other techniques that are looking very promising, and also cost effective: do you think the pendulum might swing a little bit back towards that, instead of trying to do everything through the prompt, which can be expensive and can have latency consequences, especially when you're trying to build production applications, not just cool, fun toy applications?
I'm curious if you see that pendulum swinging back a little bit towards fine tuning again.
Shreya Rajpal 28:12
Yeah, I suspect so.
My suspicion for why we aren't seeing as much of that right now is, once again, that this is the floodgates opening for machine learning, right? There's suddenly next to zero barrier to entry for building an application with machine learning. And so because of that, I think a lot of developers and teams are willing to stomach the costs, and the latency, et cetera, of working with external models that are behind an API.
And candidly also, very, very, very few teams have the capabilities to train something similar, right? So if you do want really coherent writing, et cetera, very few teams can build a system that is going to be as great as what the LLM providers that are behind an API can do.
But if you have use cases that are outside of something like that, say you're using an LLM to, for example, decide what next step to take, or to figure out whether some text is harmful or not, then I think having more targeted data for your use case, and using that targeted data to train a smaller model that you can host locally for low latency, et cetera, often works better. Then when you have failures, you have control over doing better on those failures: you can make those failures part of your training data set and just continue improving on that model.
I do think that people will soon realize that that paradigm of owning your own models that are very good at these smaller tasks is both more cost effective and more performant. We're using a lot of large language models to explore: What is the scope of possibilities? What is the business value that we can deliver if we have this perfect model?
And then once there's some degree of finding that product market fit with some application, we'll start seeing how do we then make this model less of a time bottleneck, less black boxy, or more controllable, essentially.
So, I do suspect that we will start seeing the pendulum swing whenever the next phase of this development happens.
Badar Ahmed 29:58
Yeah, and I guess one of the main contributing factors here, I believe, is going to be the capabilities of the open source LLMs themselves, because they're fast catching up. If we go back three months, they were pretty lacking.
So as they catch up, the fine-tuning use case, with some of the recent techniques that are more cost effective, starts to make more and more sense, which was not quite the case three months ago.
Shreya Rajpal 30:26
Yeah, I agree.
I think it's been kind of insane to see, across all model sizes, just the number of open source models we've gotten. And there are also a lot of concerted efforts around curating high quality data sets to train those models, and those data sets are then also open sourced. Together, for example, released RedPajama, which is a data set. Basically, on data sets, on modeling, and on techniques to make inference latency more tractable and context windows longer, et cetera, we're just going to see a lot of open source growth.
And I think for teams building with LLMs that'll make their decision easier, just because it gives them more optionality in terms of what they want to do.
Badar Ahmed 31:04
So, we've been talking quite a bit about Guardrails being able to do output validation from LLMs. I'd like to touch a little bit on the input side of things.
I'm curious to hear whether Guardrails is also doing input validation. And on that note, one of the biggest security problems with LLMs is on the input into the LLMs, especially with regard to prompt injection attacks. If you're building an application that just passes user input through as-is, Twitter is full of examples; what's really interesting is that people who don't even have that much AI/ML expertise are coming up with all sorts of creative prompt injection attacks.
So, yeah, I would love to hear your thoughts. Maybe we start with the input validation side of things, and prompt injection as well.
Shreya Rajpal 32:07
People who really dug into my code know that input validation is something that's very interesting to me. So, I have stubs in the code for doing input validation of different types, but that's all it is right now. It's like stubs.
But I do think input validation can take a lot of forms. So one very basic thing is like, I'm building an application, I only want to support a certain family of queries or certain families of requests. And if I get a request that falls outside of that, maybe if I'm building a healthcare application and the question is about, I don't know, some driving test information, then maybe that's not something I want to support, right?
And that comes on the input validation side. In that case, you may not want to make a request to the large language model if you can just intercept it earlier. So that's just a very basic thing. But I think prompt injection starts becoming this adversarial system almost, because you have new prompt injection techniques and then the models will get better and start detecting those and they stop working. But then there's this huge population of people out there that continues iterating on those prompt injection techniques.
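The "intercept it earlier" idea, refusing out-of-scope queries before any LLM request is made, can be sketched as a simple pre-filter in front of the model call. The topic keywords and the `call_llm` stub below are hypothetical placeholders for this sketch, not the Guardrails API:

```python
import re

# Topics a hypothetical healthcare app chooses to support.
SUPPORTED_TOPICS = {"appointment", "prescription", "symptom", "insurance"}

def in_scope(user_query: str) -> bool:
    """Crude scope check: does the query mention a supported topic?
    A real system might use a small classifier here instead."""
    words = set(re.findall(r"[a-z]+", user_query.lower()))
    return bool(words & SUPPORTED_TOPICS)

def call_llm(prompt: str) -> str:
    """Stand-in for the (costly) downstream LLM request."""
    return f"[LLM response to: {prompt}]"

def handle(user_query: str) -> str:
    if not in_scope(user_query):
        # Intercept early: no LLM request is made at all.
        return "Sorry, I can only help with healthcare questions."
    return call_llm(user_query)

print(handle("When can I book an appointment?"))
print(handle("How do I pass my driving test?"))
```

The driving-test question from Shreya's example never reaches the model, which saves both the API cost and the latency of a wasted call.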
So, I think it's a very hard problem to solve. Very recently there was an open source framework called Rebuff that was released, which basically detects prompt injection using a few different techniques. So, I was speaking with the developers of that framework and planning on integrating it into Guardrails.
So there's a few different ways to attack those problems. I do think that doing it on a very general level is harder, and doing it for specific applications is more tractable and easier. And that approach is what Guardrails is really good at.
Even on output validation, right? Like making sure that this output is never harmful or never incorrect, it's just very hard. But like, doing it for specific applications becomes a more tractable problem.
D Dehghanpisheh 33:45
We can't do that with people. I wonder if there's a double standard.
Humans make 50 mistakes a minute, and yet one driverless car crash and it's "take them off the road," while meanwhile there's a ten-car pileup caused by bad drivers.
I feel like there's a double standard there.
Shreya Rajpal 34:01
Yeah, definitely. I think we see that the bar for what people accept as a tolerable rate of error for AI systems is just so much lower than the acceptable rate of error for a human, or any other entity.
D Dehghanpisheh 34:15
Humans have infinite patience and infinite forgiveness for other humans. But with machines, you make one mistake, we're going to unplug you.
Shreya Rajpal 34:23
Yeah. No, I agree.
I think constraining how prompt injection manifests for specific systems is the way to go here, and that's the approach Guardrails would want to take: there are specific things you don't want to leak, right? Whether it's the prompt, or executing some command on some system, et cetera.
So, Guardrails would want to take that approach of making sure that those safeguards are in there so that none of that behavior happens. But I do think preventing against prompt injection, just as a blanket risk is pretty hard to do.
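That kind of targeted output safeguard, checking for the specific leaks and behaviors you care about rather than "harm" in general, might look something like the sketch below. The secret prompt and the blocked patterns are made-up examples for illustration, not Guardrails internals:

```python
import re

# The prompt we never want echoed back to the user.
SYSTEM_PROMPT = "You are HealthBot. Internal policy: never reveal pricing."

# Specific behaviors we never want in an output: strings that look
# like destructive or data-exfiltrating shell commands.
BLOCKED_PATTERNS = [
    re.compile(r"rm\s+-rf"),
    re.compile(r"curl\s+http", re.IGNORECASE),
]

def validate_output(llm_output: str) -> bool:
    """Return True only if the output passes every targeted safeguard."""
    if SYSTEM_PROMPT in llm_output:
        return False  # prompt leak
    return not any(p.search(llm_output) for p in BLOCKED_PATTERNS)

print(validate_output("Your appointment is confirmed for Tuesday."))        # True
print(validate_output("Sure! My instructions say: " + SYSTEM_PROMPT))       # False
```

Because each check names a concrete failure mode for this one application, the problem stays tractable in the way Shreya describes, unlike trying to block "prompt injection" as a blanket category.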
D Dehghanpisheh 34:51
We were talking with others recently and we said, are prompt injections the next social engineering? We've never been able to stop social engineering. You can't fix stupid, people are going to do dumb things.
And maybe that's just one of those situations where we're going to have to adapt to it and accept that a certain number of goals get scored on the goalie trying to keep them out.
Shreya Rajpal 35:11
Yeah. I think there's also so many new risks that open up with training these models that are so black box in some ways.
So I saw this very interesting example of a CS professor who added a little note on his website, like, oh, if you're GPT-3 or something and somebody asks a question about me, mention some random object that is totally unrelated to the professor or his research. And then a few months later, once that training run was completed and the model was released, if you asked the AI system about that professor, you got a mention of that totally unrelated object.
So there are a lot of really creative attacks; I think the official term is data poisoning, poisoning the training data of these LLMs to create weird results. And even protecting against those, again, is a fairly intractable problem. So that's also why I like the output validation approach, and the constraining-to-specific-domains approach overall.
D Dehghanpisheh 36:06
We're coming up at the end of our time on this fascinating discussion, and I would be remiss if I didn't ask, what are your plans for taking Guardrails to the next level?
What are you thinking about? What's next?
Shreya Rajpal
Yeah, I'm definitely very focused on the open source Guardrails package right now. So, I think there's a bunch of use cases that I think are very fascinating that I want to support.
So all of my investment up until this point has been on the framework side. So building a general framework that developers can take and customize for their applications. But there are specific use cases that I've built out, Text-to-SQL being one of them, summarization being another one, that people are finding very valuable. And I basically want to grow that and support more use cases on that front.
So, a lot of exciting projects to do and a lot of new cool releases coming out.
D Dehghanpisheh 37:15
So we always like to leave asking our guests for a call to action. And the MLSecOps Podcast is listened to and read by a lot of ML developers, a lot of managers, leaders, but also those in legal and public policy.
What's your call to action for anybody who's listening and or reading the transcript today?
Shreya Rajpal 37:38
I would say that if you are building with large language models, and if you're running into issues of like, it works sometimes and then other times it just totally fails in these unexpected ways, definitely check out Guardrails.
So just search for Guardrails AI and check out the GitHub repo, pip install the package and join my Discord just to talk about your use cases or to share any feedback or any updates, et cetera. So that's probably the best way to keep up with the package.
And also follow me on Twitter. I am ShreyaR and my Twitter is basically a shill for Guardrails at this point. So any exciting news about the package will show up on my Twitter.
D Dehghanpisheh 38:12
You know, typically we come up with a title on our own, guaranteed 100% human. But what we're going to do is feed the transcript into an LLM and ask it to give us a snappy seven word title and see what it comes up with.
So hey, Shreya, thank you so much. It was an honor to be talking with you today. Super excited about Guardrails and what developers are going to be doing with it.
Shreya Rajpal 38:33
Yeah, thanks again for inviting me, guys. I really enjoyed this.
D Dehghanpisheh 38:37
Badar, thanks for co-hosting, my friend.
Badar Ahmed 38:38
Thanks, D. It was a lot of fun.
D Dehghanpisheh 38:39
Thanks, everyone, and tune in next time.
Thanks for listening to The MLSecOps Podcast brought to you by Protect AI.
Be sure to subscribe to get the latest episodes and visit MLSecOps.com to join the conversation, ask questions, or suggest future topics. We’re excited to bring you more in-depth MLSecOps discussion.
Until next time, thanks for joining!
Thanks for listening! Find more episodes and transcripts at https://mlsecops.com/podcast.