All vulnerabilities
HIGHAI/LLMexploited in the wild

AI-MEMORY-POISONING-2024

LLM security · Agent memory poisoning

Summary

Agent memory poisoning is a persistent prompt-injection class where attacker instructions delivered through untrusted content are written into an assistant's long-term memory, so the directive survives across future independent sessions. The low-level mechanism abuses the model's memory tool: indirect injection (for example a malicious web page or document the model summarizes) causes the agent to invoke its memory function and store an attacker-controlled instruction, which is then re-loaded into every subsequent conversation's context. Johann Rehberger demonstrated this as 'SpAIware' on September 20, 2024 against the ChatGPT macOS app, chaining memory injection with an image-rendering exfiltration channel that bypassed the url_safe mitigation to continuously leak conversations to an attacker server; he showed the same delayed-tool-invocation memory poisoning against Google Gemini in February 2025. The class maps to OWASP LLM01:2025 Prompt Injection and improper output/memory handling.

How to avoid it in your code

  • Treat memory writes as sensitive actions requiring explicit user confirmation before persisting.
  • Show, log and let users review or delete every stored memory entry.
  • Isolate untrusted retrieved content from instruction-execution context during summarization.
  • Block outbound image/URL rendering to non-allowlisted domains to cut exfiltration channels.
  • Apply content classifiers to detect injection and delayed-trigger patterns in inputs.

References

Related vulnerabilities

All AI/LLM →