Chapter 7. Trust No One

Before the recent obsession with Netflix’s Stranger Things TV show, the 1990s had The X-Files—one of my all-time favorite shows. It was about two FBI agents investigating strange phenomena like monsters, aliens, and government conspiracies. The show’s protagonist, Fox Mulder, had two catchphrases. One of those phrases was hopeful: The truth is out there. The other was deeply paranoid: Trust no one.

In this chapter, we’ll focus on the second phrase. We’ll briefly review the myriad risks inherent in typical LLM architectures and note that while it’s worthwhile to implement the mitigations discussed previously, there’s just no way to assume your model’s output is always trustworthy. We will adopt Mulder’s “Trust no one” mantra and explore how you can apply a zero trust approach to your LLM application. Paranoia isn’t insanity when the threat is real!

Zero trust isn’t just a buzzword; it’s a rigorous framework designed around the assumption that threats can come from anywhere—even from within your trusted systems. This model is particularly well suited to LLM applications, which often ingest inputs from less-than-trustworthy sources. We’ll examine how you can manage the “agency” your LLM has—limiting its capability to make autonomous decisions that could harm your system or expose sensitive data. We’ll also discuss strategies for implementing robust output filtering mechanisms, adding an extra layer of scrutiny to the text the LLM generates. Filtering all of the LLM’s ...

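To make the output-filtering idea concrete, here is a minimal sketch of what a zero trust output filter might look like. The pattern list and function names are illustrative assumptions, not the book’s implementation: the point is simply that generated text is scanned and redacted before anyone downstream sees it, rather than being passed along on trust.

```python
import re

# Hypothetical patterns for data that should never leave the application
# unredacted. A real deployment would tune these to its own threat model.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # US SSN-like numbers
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),        # credit-card-like digit runs
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"),  # leaked API-key-style strings
]

def filter_llm_output(text: str, redaction: str = "[REDACTED]") -> str:
    """Redact suspicious spans instead of trusting the model's output."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub(redaction, text)
    return text

print(filter_llm_output("Sure! The config uses api_key: sk-12345abc"))
```

The design choice here reflects the zero trust mindset: the filter sits between the model and every consumer of its output, and it redacts on suspicion rather than waiting for proof of malice.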