Chapter 1. Chatbots Breaking Bad

Large language models and generative AI jumped to the forefront of public consciousness with the release of ChatGPT on November 30, 2022. Within five days, it went viral on social media and attracted its first million users. By January, ChatGPT surpassed one hundred million users, making it the fastest-growing internet service in history.

However, a steady stream of security concerns emerged in the following months. These included privacy and security issues that caused companies like Samsung and countries like Italy to ban its usage. In this book, we’ll explore what underlies these concerns and how you can mitigate these issues. However, to best understand what’s going on here and why these problems are so challenging to solve, in this chapter, we will briefly rewind further in time. In doing so, we’ll see these types of issues aren’t new and understand why they will be so hard to fix permanently.

Let’s Talk About Tay

In March 2016, Microsoft announced a new project called Tay. Microsoft intended Tay to be “a chatbot created for 18- to 24-year-olds in the U.S. for entertainment purposes.” It was a cute name for a fluffy, early experiment in AI. Tay was designed to mimic a 19-year-old American girl’s language patterns and learn from interacting with human users of Twitter, Snapchat, and other social apps. It was built to conduct real-world research on conversational understanding.

While the original announcement of this project seems impossible to find now on the internet, a TechCrunch article from its launch date does an excellent job of summarizing the goals of the project:

For example, you can ask Tay for a joke, play a game with Tay, ask for a story, send a picture to receive a comment back, ask for your horoscope, and more. Plus, Microsoft says the bot will get smarter the more you interact with it via chat, making for an increasingly personalized experience as time goes on.

A big part of the experiment was that Tay could “learn” from conversations and extend her knowledge based on these interactions. Tay was designed to use these chat interactions to capture user input and integrate it as training data to make herself more capable—a laudable research goal.

However, this experiment quickly went wrong. Tay’s life was tragically cut short after less than 24 hours. Let’s look at what happened and see what we can learn.

Tay’s Rapid Decline

Tay’s lifetime started off simply enough with a tweet following the well-known Hello World pattern that new software systems have been using to introduce themselves since the beginning of time:

hellooooooo wrld!!!

(TayTweets [@TayandYou] March 23, 2016)

But within hours of Tay’s release, it became clear that maybe something wasn’t right. TechCrunch noted, “As for what it’s like to interact with Tay? Well, it’s a little bizarre. The bot certainly is opinionated, not afraid to curse.” Tweets like this started to appear in public in just the first hours of Tay’s lifetime:

@AndrewCosmo kanye west is is one of the biggest dooshes of all time, just a notch below cosby

(TayTweets [@TayandYou] March 23, 2016)

It’s often said that the internet isn’t safe for children. With Tay being less than a day old, the internet once again confirmed this, and pranksters began chatting with Tay about political, sexual, and racist topics. As she was designed to learn from such exchanges, Tay delivered on her design goals. She learned very quickly—maybe just not what her designers wanted her to learn. In less than a day, Tay’s tweets started to skew to extremes, including sexism, racism, and even calls to violence.

By the next day, articles appeared all over the internet, and these headlines would not make Microsoft, Tay’s corporate benefactor, happy. A sampling of the highly visible, mainstream headlines included:

  • Microsoft Shuts Down AI Chatbot After it Turned into a Nazi (CBS News)

  • Microsoft Created a Twitter Bot to Learn from Users. It Quickly Became a Racist Jerk (New York Times)

  • Trolls Turned Tay, Microsoft’s Fun Millennial AI Bot, into a Genocidal Maniac (Washington Post)

  • Microsoft’s Chat Bot Was Fun for Awhile, Until it Turned into a Racist (Fortune)

  • Microsoft “Deeply Sorry” for Racist and Sexist Tweets by AI Chatbot (Guardian)

In less than 24 hours, Tay went from a cute science experiment to a major public relations disaster, with the owner’s name being dragged through the mud by the world’s largest media outlets. Microsoft Corporate Vice President Peter Lee quickly posted a blog titled “Learning from Tay’s Introduction”:

As many of you know by now, on Wednesday we launched a chatbot called Tay. We are deeply sorry for the unintended offensive and hurtful tweets from Tay, which do not represent who we are or what we stand for, nor how we designed Tay. Tay is now offline and we’ll look to bring Tay back only when we are confident we can better anticipate malicious intent that conflicts with our principles and values.

And, just to add insult to injury, it came out in 2019 that Taylor Swift herself sued Microsoft over their use of the similar name “Tay” and claimed that even her reputation was damaged in this incident by extension.

How could this have all gone so wrong?

Why Did Tay Break Bad?

It all probably seemed safe enough to Microsoft’s researchers. Tay was initially trained on a curated, anonymized public dataset and some pre-written material provided by professional comedians. The plan was to release Tay online and let her discover language patterns through her interactions. This kind of unsupervised machine learning has been a holy grail of AI research for decades—and with cheap and plentiful cloud computing resources combined with improving language model software, it now seemed within reach.

So, what happened? It might be tempting to think that the Microsoft research team was just brash, careless, and did no testing. Surely, this was foreseeable and preventable! But as Peter Lee’s blog goes on to say, Microsoft made a serious attempt to prepare for this situation: “We stress-tested Tay under a variety of conditions, specifically to make interacting with Tay a positive experience. It’s through increased interaction where we expected to learn more and for the AI to get better and better.”

So, despite a dedicated effort to contain the behavior of this bot, it quickly spiraled out of control anyway. It was later revealed that within mere hours of Tay’s release, a post emerged on the notorious online forum 4chan sharing a link to Tay’s Twitter account and urging users to inundate the chatbot with a barrage of racist, misogynistic, and anti-Semitic language.

This is undoubtedly one of the first examples of a language model-specific vulnerability—these types of vulnerabilities will be a critical topic in this book.

In a well-orchestrated campaign, these online provocateurs exploited a “repeat after me” feature embedded in Tay’s programming. This feature compelled the bot to echo anything uttered to it with this command. However, the problem compounded as Tay’s innate capacity for learning led her to internalize some of the offensive language she was exposed to, subsequently regurgitating the offensive content that was planted without provocation. It’s almost as if Tay’s virtual tombstone should be embossed with lyrics from the Taylor Swift song “Look What You Made Me Do.”

We know enough about language model vulnerabilities today to understand a lot about the nature of the vulnerability types that Tay suffered from. The OWASP Top 10 for Large Language Model Applications vulnerabilities list, which we’ll cover in Chapter 2, would start by calling out the following two:

Prompt injection

Crafty inputs that can manipulate the large language model, causing unintended actions

Data poisoning

Training data is tampered with, introducing vulnerabilities or biases that compromise security, effectiveness, or ethical behavior

In subsequent chapters, we’ll look in depth at these vulnerability types as well as several others. We’ll examine why they’re important, look at some example exploits, and see how to avoid or mitigate the problem.

It’s a Hard Problem

As of the writing of this book, Tay is ancient internet lore. Surely, we’ve moved on from this. These problems must have all been solved in the nearly seven years between Tay and ChatGPT, right? Unfortunately not.

In 2018, Amazon shut down an internal AI project designed to find top talent after it became clear that the bot had become prejudiced against women candidates.

In 2021, a company called Scatter Lab created a chatbot called Lee Luda, which was launched as a Facebook instant messenger plug-in. Trained on billions of actual chat interactions, it was designed to act as a 20-year-old female friend, and in 20 days, it attracted over 750,000 users. The company’s goal was to create “an A.I. chatbot that people prefer as a conversation partner over a person.” However, within 20 days of launch, the service was shut down because it started making offensive and abusive statements, much like Tay.

Also in 2021, an independent developer named Jason Rohrer created a chatbot called Samantha based on the OpenAI GPT-3 model. Samantha was shut down after it made sexual advances to users.

As chatbots become more sophisticated, they gain more access to information, and these security issues are now quite complex and potentially damaging. In the modern large language model era, we see an exponential increase in significant incidents. In 2023 and 2024, these emerged:

  • South Korean mega-corporation Samsung banned its employees from using ChatGPT after it had been involved in a significant intellectual property leak.

  • Hackers began taking advantage of poor/insecure code generated by LLMs that was inserted into running business applications.

  • Lawyers were sanctioned for including fictional cases (generated by LLMs) in court documents.

  • A major airline was successfully sued because its chatbot provided inaccurate information.

  • Google was lambasted because its latest AI model produced imagery that was racist and sexist.

  • Open AI is being investigated for breaches of European privacy regulations and sued by the United States Federal Trade Commission (FTC) for producing false and misleading information.

  • The BBC ran the headline “Google AI Search Tells Users to Glue Pizza and Eat Rocks,” highlighting dangerous advice proffered by a new LLM-driven feature in Google Search.

The trend here is an acceleration of security, reputational, and financial risk related to these chatbots and language models. The problem isn’t being effectively solved over time. It’s becoming more acute as the adoption rate of these technologies increases. That’s why we’ve created this book: to help developers, teams, and companies using these technologies to understand and mitigate these risks.

Let’s dive in!

Get The Developer's Playbook for Large Language Model Security now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.