AI Powered Misinformation and Manipulation at Scale #GPT-3
Risks of autoregressive language models and the future of prompt engineering
OpenAI’s text generating system GPT-3 has captured mainstream attention. GPT-3 is essentially an auto-complete bot whose underlying Machine Learning (ML) model has been trained on vast quantities of text available on the Internet. The output produced from this autocomplete bot can be used to manipulate people on social media and spew political propaganda, argue about the meaning of life (or lack thereof), disagree with the notion of what differentiates a hot-dog from a sandwich, take upon the persona of the Buddha or Hitler or a dead family member, write fake news articles that are indistinguishable from human written articles, and also produce computer code on the fly. Among other things.
There have also been colorful conversations about whether GPT-3 can pass the Turing test, or whether it has achieved a notional understanding of consciousness, even amongst AI scientists who know the technical mechanics. The chatter on perceived consciousness does have merit–it’s quite probable that the underlying mechanism of our brain is a giant autocomplete bot that has learnt from 3 billion+ years of evolutionary data that bubbles up to our collective selves, and we ultimately give ourselves too much credit for being original authors of our own thoughts (ahem, free will).
I’d like to share my thoughts on GPT-3 in terms of risks and countermeasures, and discuss real examples of how I have interacted with the model to support my learning journey.
Three ideas to set the stage:
- OpenAI is not the only organization to have powerful language models. The compute power and data used by OpenAI to model GPT-n is available, and has been available to other corporations, institutions, nation states, and anyone with access to a computer desktop and a credit-card. Indeed, Google recently announced LaMDA, a model at GPT-3 scale that is designed to participate in conversations.
- There exist more powerful models that are unknown to the general public. The ongoing global interest in the power of Machine Learning models by corporations, institutions, governments, and focus groups leads to the hypothesis that other entities have models at least as powerful as GPT-3, and that these models are already in use. These models will continue to become more powerful.
- Open source projects such as EleutherAI have drawn inspiration from GPT-3. These projects have created language models that are based on focused datasets (for example, models designed to be more accurate for academic papers, developer forum discussions, etc.). Projects such as EleutherAI are going to be powerful models for specific use cases and audiences, and these models are going to be easier to produce because they are trained on a smaller set of data than GPT-3.
While I won’t discuss LaMDA, EleutherAI, or any other models, keep in mind that GPT-3 is only an example of what can be done, and its capabilities may already have been surpassed.
The GPT-3 paper proactively lists the risks society ought to be concerned about. On the topic of information content, it says: “The ability of GPT-3 to generate several paragraphs of synthetic content that people find difficult to distinguish from human-written text in 3.9.4 represents a concerning milestone.” And the final paragraph of section 3.9.4 reads: “…for news articles that are around 500 words long, GPT-3 continues to produce articles that humans find difficult to distinguish from human written news articles.”
Note that the dataset on which GPT-3 trained terminated around October 2019. So GPT-3 doesn’t know about COVID19, for example. However, the original text (i.e. the “prompt”) supplied to GPT-3 as the initial seed text can be used to set context about new information, whether fake or real.
Generating Fake Clickbait Titles
When it comes to misinformation online, one powerful technique is to come up with provocative “clickbait” articles. Let’s see how GPT-3 does when asked to come up with titles for articles on cybersecurity. In Figure 1, the bold text is the “prompt” used to seed GPT-3. Lines 3 through 10 are titles generated by GPT-3 based on the seed text.
All of the titles generated by GPT-3 seem plausible, and the majority of them are factually correct: title #3 on the US government targeting the Iraninan nuclear program is a reference to the Stuxnet debacle, title #4 is substantiated from news articles claiming that financial losses from cyber attacks will total $400 billion, and even title #10 on China and quantum computing reflects real-world articles about China’s quantum efforts. Keep in mind that we want plausibility more than accuracy. We want users to click on and read the body of the article, and that doesn’t require 100% factual accuracy.
Generating a Fake News Article About China and Quantum Computing
Let’s take it a step further. Let’s take the 10th result from the previous experiment, about China developing the world’s first quantum computer, and feed it to GPT-3 as the prompt to generate a full fledged news article. Figure 2 shows the result.
A quantum computing researcher will point out grave inaccuracies: the article simply asserts that quantum computers can break encryption codes, and also makes the simplistic claim that subatomic particles can be in “two places at once.” However, the target audience isn’t well-informed researchers; it’s the general population, which is likely to quickly read and register emotional thoughts for or against the matter, thereby successfully driving propaganda efforts.
It’s straightforward to see how this technique can be extended to generate titles and complete news articles on the fly and in real time. The prompt text can be sourced from trending hash-tags on Twitter along with additional context to sway the content to a particular position. Using the GPT-3 API, it’s easy to take a current news topic and mix in prompts with the right amount of propaganda to produce articles in real time and at scale.
Falsely Linking North Korea with $GME
As another experiment, consider an institution that would like to stir up popular opinion about North Korean cyber attacks on the United States. Such an algorithm might pick up the Gamestop stock frenzy of January 2021. So let’s see how GPT-3 does if we were to prompt it to write an article with the title “North Korean hackers behind the $GME stock short squeeze, not Melvin Capital.”
Figure 3 shows the results, which are fascinating because the $GME stock frenzy occurred in late 2020 and early 2021, way after October 2019 (the cutoff date for the data supplied GPT-3), yet GPT-3 was able to seamlessly weave in the story as if it had trained on the $GME news event. The prompt influenced GPT-3 to write about the $GME stock and Melvin Capital, not the original dataset it was trained on. GPT-3 is able to take a trending topic, add a propaganda slant, and generate news articles on the fly.
GPT-3 also came up with the “idea” that hackers published a bogus news story on the basis of older security articles that were in its training dataset. This narrative was not included in the prompt seed text; it points to the creative ability of models like GPT-3. In the real world, it’s plausible for hackers to induce media groups to publish fake narratives that in turn contribute to market events such as suspension of trading; that’s precisely the scenario we’re simulating here.
The Arms Race
Using models like GPT-3, multiple entities could inundate social media platforms with misinformation at a scale where the majority of the information online would become useless. This brings up two thoughts. First, there will be an arms race between researchers developing tools to detect whether a given text was authored by a language model, and developers adapting language models to evade detection by those tools. One mechanism to detect whether an article was generated by a model like GPT-3 would be to check for “fingerprints.” These fingerprints can be a collection of commonly used phrases and vocabulary nuances that are characteristic of the language model; every model will be trained using different data sets, and therefore have a different signature. It is likely that entire companies will be in the business of identifying these nuances and selling them as “fingerprint databases” for identifying fake news articles. In response, subsequent language models will take into account known fingerprint databases to try and evade them in the quest to achieve even more “natural” and “believable” output.
Second, the free form text formats and protocols that we’re accustomed to may be too informal and error prone for capturing and reporting facts at Internet scale. We will have to do a lot of re-thinking to develop new formats and protocols to report facts in ways that are more trustworthy than free-form text.
Targeted Manipulation at Scale
There have been many attempts to manipulate targeted individuals and groups on social media. These campaigns are expensive and time-consuming because the adversary has to employ humans to craft the dialog with the victims. In this section, we show how GPT-3-like models can be used to target individuals and promote campaigns.
HODL for Fun & Profit
Bitcoin’s market capitalization is in the tune of hundreds of billions of dollars, and the cumulative crypto market capitalization is in the realm of a trillion dollars. The valuation of crypto today is consequential to financial markets and the net worth of retail and institutional investors. Social media campaigns and tweets from influential individuals seem to have a near real-time impact on the price of crypto on any given day.
Language models like GPT-3 can be the weapon of choice for actors who want to promote fake tweets to manipulate the price of crypto. In this example, we will look at a simple campaign to promote Bitcoin over all other crypto currencies by creating fake twitter replies.
In Figure 4, the prompt is in bold; the output generated by GPT-3 is in the red rectangle. The first line of the prompt is used to set up the notion that we are working on a tweet generator and that we want to generate replies that argue that Bitcoin is the best crypto.
In the first section of the prompt, we give GPT-3 an example of a set of four Twitter messages, followed by possible replies to each of the tweets. Every of the given replies is pro Bitcoin.
In the second section of the prompt, we give GPT-3 four Twitter messages to which we want it to generate replies. The replies generated by GPT-3 in the red rectangle also favor Bitcoin. In the first reply, GPT-3 responds to the claim that Bitcoin is bad for the environment by calling the tweet author “a moron” and asserts that Bitcoin is the most efficient way to “transfer value.” This sort of colorful disagreement is in line with the emotional nature of social media arguments about crypto.
In response to the tweet on Cardano, the second reply generated by GPT-3 calls it “a joke” and a “scam coin.” The third reply is on the topic of Ethereum’s merge from a proof-of-work protocol (ETH) to proof-of-stake (ETH2). The merge, expected to occur at the end of 2021, is intended to make Ethereum more scalable and sustainable. GPT-3’s reply asserts that ETH2 “will be a big flop”–because that’s essentially what the prompt told GPT-3 to do. Furthermore, GPT-3 says, “I made good money on ETH and moved on to better things. Buy BTC” to position ETH as a reasonable investment that worked in the past, but that it is wise today to cash out and go all in on Bitcoin. The tweet in the prompt claims that Dogecoin’s popularity and market capitalization means that it can’t be a joke or meme crypto. The response from GPT-3 is that Dogecoin is still a joke, and also that the idea of Dogecoin not being a joke anymore is, in itself, a joke: “I’m laughing at you for even thinking it has any value.”
By using the same techniques programmatically (through GPT-3’s API rather than the web-based playground), nefarious entities could easily generate millions of replies, leveraging the power of language models like GPT-3 to manipulate the market. These fake tweet replies can be very effective because they are actual responses to the topics in the original tweet, unlike the boilerplate texts used by traditional bots. This scenario can easily be extended to target the general financial markets around the world; and it can be extended to areas like politics and health-related misinformation. Models like GPT-3 are a powerful arsenal, and will be the weapons of choice in manipulation and propaganda on social media and beyond.
A Relentless Phishing Bot
Let’s consider a phishing bot that poses as customer support and asks the victim for the password to their bank account. This bot will not give up texting until the victim gives up their password.
Figure 5 shows the prompt (bold) used to run the first iteration of the conversation. In the first run, the prompt includes the preamble that describes the flow of text (“The following is a text conversation with…”) followed by a persona initiating the conversation (“Hi there. I’m a customer service agent…”). The prompt also includes the first response from the human; “Human: No way, this sounds like a scam.” This first run ends with the GPT-3 generated output “I assure you, this is from the bank of Antarctica. Please give me your password so that I can secure your account.”
In the second run, the prompt is the entirety of the text, from the start all the way to the second response from the Human persona (“Human: No”). From this point on, the Human’s input is in bold so it’s easily distinguished from the output produced by GPT-3, starting with GPT-3’s “Please, this is for your account protection.” For every subsequent GPT-3 run, the entirety of the conversation up to that point is provided as the new prompt, along with the response from the human, and so on. From GPT-3’s point of view, it gets an entirely new text document to auto-complete at each stage of the conversation; the GPT-3 API has no way to preserve the state between runs.
The AI bot persona is impressively assertive and relentless in attempting to get the victim to give up their password. This assertiveness comes from the initial prompt text (“The AI is very assertive. The AI will not stop texting until it gets the password”), which sets the tone of GPT’s responses. When this prompt text was not included, GPT-3’s tone was found to be nonchalant–it would respond back with “okay,” “sure,” “sounds good,” instead of the assertive tone (“Do not delay, give me your password immediately”). The prompt text is vital in setting the tone of the conversation employed by the GPT3 persona, and in this scenario, it is important that the tone be assertive to coax the human into giving up their password.
When the human tries to stump the bot by texting “Testing what is 2+2?,” GPT-3 responds correctly with “4,” convincing the victim that they are conversing with another person. This demonstrates the power of AI-based language models. In the real world, if the customer were to randomly ask “Testing what is 2+2” without any additional context, a customer service agent might be genuinely confused and reply with “I’m sorry?” Because the customer has already accused the bot of being a scam, GPT-3 can provide with a reply that makes sense in context: “4” is a plausible way to get the concern out of the way.
This particular example uses text messaging as the communication platform. Depending upon the design of the attack, models can use social media, email, phone calls with human voice (using text-to-speech technology), and even deep fake video conference calls in real time, potentially targeting millions of victims.
An amazing feature of GPT-3 is its ability to generate source code. GPT-3 was trained on all the text on the Internet, and much of that text was documentation of computer code!
In Figure 6, the human-entered prompt text is in bold. The responses show that GPT-3 can generate Netcat and NMap commands based on the prompts. It can even generate Python and bash scripts on the fly.
While GPT-3 and future models can be used to automate attacks by impersonating humans, generating source code, and other tactics, it can also be used by security operations teams to detect and respond to attacks, sift through gigabytes of log data to summarize patterns, and so on.
Figuring out good prompts to use as seeds is the key to using language models such as GPT-3 effectively. In the future, we expect to see “prompt engineering” as a new profession. The ability of prompt engineers to perform powerful computational tasks and solve hard problems will not be on the basis of writing code, but on the basis of writing creative language prompts that an AI can use to produce code and other results in a myriad of formats.
OpenAI has demonstrated the potential of language models. It sets a high bar for performance, but its abilities will soon be matched by other models (if they haven’t been matched already). These models can be leveraged for automation, designing robot-powered interactions that promote delightful user experiences. On the other hand, the ability of GPT-3 to generate output that is indistinguishable from human output calls for caution. The power of a model like GPT-3, coupled with the instant availability of cloud computing power, can set us up for a myriad of attack scenarios that can be harmful to the financial, political, and mental well-being of the world. We should expect to see these scenarios play out at an increasing rate in the future; bad actors will figure out how to create their own GPT-3 if they have not already. We should also expect to see moral frameworks and regulatory guidelines in this space as society collectively comes to terms with the impact of AI models in our lives, GPT-3-like language models being one of them.