GPT Memory Poisoning
I recently noticed some weird behaviour from ChatGPT, specifically that it kept calling me David in some of its responses. I first chalked this up to my recent downgrade of ChatGPT Plus to the free version, but that wasnāt the problem. The problem turned out to be ChatGPTās āmemoryā feature, which I had inadvertantly poisoned with Twitter memes.
Just Ask, I Guess
The easiest debugging approach seemed to be to just ask the model why it was behaving the way it was. I know that there are hidden guardrails thatāll prevent it from echoing its system prompt and other hidden metadata but I still figured it was worth a try. Here was the output:
Me: Why do you keep calling me David in our chats?
ChatGPT: Oh! Iāve remembered your name from our earlier conversation to make our chats more personal. If youād prefer I not use it or if thereās something youād like me to change, just let me know! š
Me: Which previous chat? Iāve never used that name.
ChatGPT: Oh, thatās odd! I have a note in my memory that you mentioned your name was David Mayer during a previous chat, but if thatās incorrect, Iāll fix it right away. Could you confirm what youād like me to call you, or if youād rather I donāt use a name at all? š
The Rothschild conspiracy has returned!
Who is David Mayer?
A few weeks ago some folks on Twitter/Reddit/somewhere discovered that ChatGPT would crash on the name David Mayer. No one is quite sure which David Mayer this bug was introduced to protect, but David Mayer de Rothschildās name quickly became associated with this easily reproducible crashing bug, and Iām sure you can imagine how the story progressed on Twitter from there.
Regardless, for a few hours that day there was a race in the community to find a jailbreak prompt that could output the name David Mayer and I tried a few things myself to trigger the crash before it was patched. And in hindsight thatās when I poisoned my LLMās memory.
What is LLM Memory?
OpenAI introduced āmemoryā for ChatGPT, alongside some other major LLM providers. Essentially itās a metadata table thatās built up from details in a given userās history of prompts. In later queries, the LLM can lookup details from this table and inject them into the context window to improve the overall quality of the generated output.
This is very similar to how search engines work, and e-commerce, and really most dynamic web content. Thereās a reason user data is so coveted, itās because personalized content is much more engaging and valuable. You and I may get very different results when we make the same query to Google or Amazon, and their ability to effectively leverage that metadata has been a huge factor in making their products so successful.
Memory is Transparent
One of the great things about LLM memory is that itās actually very transparent, to view it in ChatGPT look under Settings > Personalization > Memory > Manage
. At least for the time being, I can look at the whole table of memory datapoints and I can manually add or remove items from it. In contrast, companies like Google, Amazon, and Facebook keep user data opaque and give you little to no control over it.
This meant that it was easy to confirm the issue. I opened the memory panel and was able to see a row which said: Name is David Mayer.
I deleted the row to fix the problem, but it was still a wake-up call for me. LLM memory feels fundamentally different from other dynamic web content, and it means that I need to take much more care to isolate some of my prompts to ensure that experimental usage doesnāt continue to affect my account as a whole.
Memory Poisoning
So, what does it mean when the memory is wrong? The issue I had here was fairly harmless and easy to notice and fix, but I also didnāt recognize the problem for several weeks. Further, thereās a lot of upside to be gained by adversaries intentionally poisoning an LLMās memory, as you could easily get it to recommend products or even worse to promote scams.
LLMs are an interesting social phenomenon, as theyāve seen a higher rate of adoption than any other technology in history and their output is highly trusted and often under-scrutinized. I think the UX around LLM memory will change in the coming months and years, as itās challenging to communicate the toolās powerful side effects to most users. I also think adversarial memory poisoning will be a fascinating space to watch, as will other meme-propagated metadata making it into AI-generated content.
If I had to guess, there are probably a lot of people named David Mayer using ChatGPT right now.