Chapter 4. Prompt Injection
Chapter 1 reviewed the sad tale of how Tay’s life was cut short after abuse by vandal hackers. That case study was the first high-profile example of what we now call prompt injection, but it is certainly not the last. Some form of prompt injection is involved in most LLM-related security breaches we’ve seen in the real world.
In prompt injection, an attacker crafts malicious inputs to manipulate an LLM’s natural language understanding. This can cause the LLM to act against its intended operational guidelines. The concept of injection has been included in almost every version of an OWASP Top 10 list since the original list in 2001, so it’s worth looking at the generic definition before we dive deeper.
An injection attack in application security is a type of cyberattack in which the attacker inserts malicious instructions into a vulnerable application. The attacker can then take control of the application, steal data, or disrupt operations. For example, in a SQL injection attack, an attacker inputs malicious SQL queries into a web form, tricking the system into executing unintended commands. This can result in unauthorized access to or manipulation of the database.
So, what makes prompt injection so novel? For most injection-style attacks, spotting the rogue instructions as they enter your application from an untrusted source is relatively easy. For example, a SQL statement included in a web application’s text field is straightforward to spot and sanitize. ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access