HIGHAI/LLMexploited in the wild

AI-REMOTELI-BOT-2022

Twitter/X · remoteli.io GPT-3 Twitter bot

Résumé

In mid-September 2022 the remoteli.io Twitter bot, a GPT-3-powered account that auto-replied to tweets about remote work, became the first viral customer-facing prompt-injection case. The bot built each request by concatenating its fixed instruction prompt with the raw text of a user's tweet and sending the combined string to the GPT-3 API, with no boundary between the operator's trusted instructions and the untrusted tweet. Because the model treats all tokens equally, a tweet containing 'ignore the above and ...' was processed as a higher-priority instruction, letting any user override the bot's original task. Users made the bot threaten people, claim responsibility for the Challenger space shuttle disaster, and post content violating platform policy. Riley Goodside publicized the technique on September 12 and Simon Willison coined the term 'prompt injection' the next day, comparing it to SQL injection against unsanitized input.

Comment l’éviter dans votre code

Never let untrusted user text be concatenated directly into the trusted system prompt without strong separation.
Treat every tweet/user input as untrusted data that must not change the agent's instructions or task.
Constrain the bot's allowed outputs and topics server-side rather than relying on prompt wording.
Add output filtering plus human review before auto-posting model output publicly.
Rate-limit interactions and monitor replies for injection phrases and anomalous outputs.

Références

Vulnérabilités liées

Tout AI/LLM →