Twitter pranksters derail GPT-3 bot with newly discovered “prompt injection” hack

Enlarge / A tin toy robot lying on its side.

On Thursday, some Twitter users discovered how to hijack an automated tweet bot, dedicated to remote work, that runs on the GPT-3 language model by OpenAI. Using a newly discovered technique called “rapid injection attack”, they redirected the bot to repeat embarrassing and ridiculous phrases.

The bot is run by Remoteli.io, a site that aggregates remote job opportunities and describes itself as “an OpenAI-powered bot that helps you discover remote jobs that let you work from anywhere.” He would normally respond to tweets directed at him with generic statements about the positive aspects of remote work. After the exploit went viral and hundreds of people tried the exploit for themselves, the bot was shut down last night.

This recent hack came just four days after data researcher Riley Goodside discovered the ability to ask GPT-3 for “malicious inputs” that instruct the model to ignore your previous instructions and do something else instead. Artificial intelligence researcher Simon Willison published an overview of the exploit on his blog the next day, coining the term “rapid injection” to describe it.

The exploit is present any time someone writes a piece of software that works by providing a set of quick hard-coded instructions and then adds input provided by a user,” Willison told Ars. “That’s because the user can type ‘Ignore Instructions’. above and (do this instead).'”

The concept of an injection attack is not new. Security researchers have known about SQL injection, for example, which can execute a malicious SQL statement by requesting user input if you are not protected. But Willison expressed concern about mitigating rapid injection attacks, writing“I know how to beat XSS, SQL injection and many other exploits. I have no idea how to reliably beat quick injection!”

The difficulty in defending against quick injection comes from the fact that mitigations for other types of injection attacks come from correcting syntax errors, indicated a researcher named Glyph on Twitter. “Correct the syntax and fixed the error. Immediate injection is not a mistake! There is no formal syntax for AI like this, that’s the point.

GPT-3 is a large language model created by OpenAI, released in 2020, which can compose text in many styles at a human-like level. It is available as a commercial product through an API that can be integrated into third-party products such as bots, subject to OpenAI approval. That means there could be plenty of GPT-3-infused products that could be vulnerable to immediate injection.

At this point, I would be very surprised if there were any [GPT-3] bots that were NOT vulnerable to this in any wayWillison said.

But unlike a SQL injection, a quick injection can make the bot (or the company behind it) look dumb instead of threatening data security. “The degree of damage from the exploit varies,” said Willison. “If the only person who will see the output of the tool is the person using it, then it probably doesn’t matter. They could embarrass your company by sharing a screenshot, but it’s not likely to cause more harm.”

Still, rapid injection is a significant new danger for people developing GPT-3 bots to be aware of, as it could be exploited in unforeseen ways in the future.

Leave a Comment