My Name Is...?

You ask an LLM to fill in a field. An organization name, a username, an email address. It responds with something syntactically correct, contextually plausible, and wrong. Not wrong in the way hallucinations are wrong (made up, nonsensical). Wrong in a more unsettling way: it is a real value that belongs to someone else.

We have observed this firsthand. A model inserting the vendor’s own organization name where ours should go. A social media handle appearing in output that does not belong to anyone on our team, does not appear in our codebase, and may or may not correspond to a real person. These values did not come from our prompts, our context window, or our repository. They came from somewhere inside the model.

The question is not whether this happens. It does. The question is whether anyone is doing enough to prevent it.


The scrubbing problem

LLMs are trained on internet-scale datasets. Common Crawl, GitHub public repositories, Stack Overflow, technical documentation, blog posts, forum threads. For most commercial models, the exact composition of training data is proprietary. Some open models publish theirs. California now requires training data disclosures (AB 2013, effective January 2026), though the law carries no enforcement mechanism. The categories are well known either way.

That data contains real people’s information:

  • Usernames, email addresses, and social media handles from forum posts and commit histories
  • API keys and credentials accidentally committed to public repositories
  • Internal hostnames, IP addresses, and infrastructure details from blog posts and documentation
  • Configuration examples with real (not sanitized) values from tutorials and troubleshooting threads
  • The model vendor’s own organization name, appearing orders of magnitude more frequently than yours

Training data pipelines include deduplication and some filtering. But scrubbing every PII artifact, every credential, every internal hostname from a dataset measured in terabytes or petabytes is a problem no one has solved. The economics work against it: the cost of perfectly sanitizing internet-scale data exceeds the cost of training the model. So the data goes in with artifacts, and the model learns from artifacts alongside everything else.

Independent assessments confirm the gap. Stanford’s Foundation Model Transparency Index (December 2025) scored major AI vendors 0 out of 1 on data processing disclosure and 0 out of 1 on train-test overlap, a standard measure of memorization. Asked what filtering they perform on training data, vendors cited competitive advantage and intellectual property protection as grounds for withholding the details.

The model does not distinguish between “this is a concept I should learn” and “this is a specific value I should never reproduce.” It learns patterns. If a pattern like “organization name fields contain strings like X” appears frequently enough, and specific values appear in that context, those values become part of the learned pattern. They are weights in a neural network, not entries in a database that can be queried and deleted.
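The dynamic can be illustrated with a toy frequency model. This is a crude analogy (transformers learn distributed weights, not lookup tables), and every name and count below is invented, but it shows how a value that dominates a field’s training contexts becomes the default completion:

```python
from collections import Counter

# Hypothetical corpus of "org name" field occurrences seen during training.
# "VendorCorp" stands in for any string that dominates the training data.
training_contexts = (
    ["org_name: VendorCorp"] * 900      # the vendor's own name, everywhere
    + ["org_name: SomeStartup"] * 40
    + ["org_name: AcmeWidgets"] * 10
)

def most_likely_completion(field: str, corpus: list[str]) -> str:
    """Return the value most often paired with `field` in the corpus."""
    values = [line.split(": ", 1)[1] for line in corpus if line.startswith(field)]
    return Counter(values).most_common(1)[0][0]

# Absent strong contextual evidence pulling the other way, the
# highest-frequency value wins.
print(most_likely_completion("org_name", training_contexts))  # VendorCorp
```

A real model is pulled toward the in-context value by attention over the prompt, but when that signal is weak or ambiguous, the frequency prior is what fills the gap.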

This is not a bug in any specific model. It is a structural property of how large language models work. Every model trained on internet-scale data has this property. The question is not whether memorized fragments exist in the model’s weights. It is whether and when they surface in outputs.


Hallucination vs. memorization

The AI industry uses “hallucination” as a catch-all for model outputs that are wrong. The model invents a library that does not exist. It fabricates a citation. It generates a function signature with the wrong parameter order. These are generation errors; the model produced plausible-sounding nonsense.

Memorization is different. Memorization means the model has stored fragments of its training data and can reproduce them, verbatim or near-verbatim, in its outputs. This is not the model making something up. It is the model repeating something it saw during training, in a context where that thing does not belong.

The distinction matters because the failure modes are different.

A hallucinated username is wrong. A memorized username might be real. A hallucinated API key is gibberish that fails authentication. A memorized API key might be a valid credential from a public GitHub repository that was in the training data, and it might still work.

Researchers have demonstrated this repeatedly. Carlini et al. showed that language models can emit memorized training data including names, phone numbers, email addresses, and code snippets when prompted in the right way. The “right way” is not adversarial prompt injection. It can be as simple as providing a context that resembles the training data closely enough to trigger recall.

Any prompt that asks a model to fill in identity fields, credentials, or structured values is exactly that kind of context.


Where this surfaces

This is not limited to any single domain. Anywhere a model generates structured output with identity or credential fields, memorized training data can appear:

Identity fields. Organization names, author fields, account identifiers, social media handles. The model reaches for the strongest pattern it learned. If the vendor’s organization name appeared thousands of times in training data associated with the context the model is generating for, it will substitute that name for yours.

Credential-adjacent output. Environment variable references, API key formats, token strings. If the model memorized credential strings from public repositories, Stack Overflow posts, or leaked .env files, those strings can surface whenever the model generates content in a credential-adjacent context.

Contact information. Email addresses, phone numbers, physical addresses. A model generating a contact field may produce a real person’s information memorized from training data rather than fabricating a plausible fake.

Code and configuration. Hostnames, file paths, bucket names, Docker image references. Values that look like reasonable defaults but are actually specific to someone else’s environment, memorized from tutorials, documentation, or public repositories.

None of these scenarios require the model to be “hacked.” They do not require prompt injection, jailbreaking, or adversarial inputs. They require only that the model is generating output in a domain where memorized training data fragments are contextually plausible. Structured output with specific field types (hostnames, paths, tokens, credentials, email addresses) is exactly that domain: a memorized value is syntactically indistinguishable from a correct one.
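There is no way to know whether a credential-shaped string was memorized, but its shape can at least be flagged before it ships. A minimal sketch in the style of secret-scanning tools such as detect-secrets; the patterns and entropy threshold here are illustrative, and the example key is AWS’s own documented placeholder:

```python
import math
import re
from collections import Counter

# Illustrative patterns for credential-shaped strings; real scanners use many more.
CREDENTIAL_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),          # AWS access key ID format
    re.compile(r"ghp_[A-Za-z0-9]{36}"),       # GitHub personal access token format
    re.compile(r"[A-Za-z0-9+/]{40,}={0,2}"),  # long base64-ish blobs
]

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character: a rough randomness measure."""
    counts = Counter(s)
    return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())

def flag_credential_shaped(text: str, entropy_threshold: float = 3.0) -> list[str]:
    """Return substrings that look like credentials and need human review."""
    hits = []
    for pattern in CREDENTIAL_PATTERNS:
        for match in pattern.finditer(text):
            if shannon_entropy(match.group()) >= entropy_threshold:
                hits.append(match.group())
    return hits

output = 'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"'  # AWS's documented example key
print(flag_credential_shaped(output))  # ['AKIAIOSFODNN7EXAMPLE']
```

This catches the shape, not the provenance: it cannot distinguish a freshly generated placeholder from a memorized, still-valid credential. Anything it flags has to be treated as potentially live.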


No runtime detection

There is no runtime mechanism to detect whether a model output is generated (novel) or memorized (reproduced from training data).

When the model writes someone else’s name in a field where yours should go, there is no flag, no confidence score, no metadata that says “this value may be memorized from training data.” The output looks identical to any other generated string. The model does not know it is reproducing a memorized fragment. The runtime has no oracle to compare the output against the training data. The user has no way to distinguish “the model generated this” from “the model remembered this.”

For conversational output (blog posts, summaries, explanations), this is mostly a quality problem. A memorized sentence in a generated paragraph is wrong, but the failure mode is manageable: someone reads it and notices.

For structured output that gets consumed by systems (configuration files, API calls, database entries, automated workflows), the failure mode is different. A memorized value in a structured field is not obviously wrong. It is syntactically valid. It may be semantically plausible. It passes validation. It passes review if the reviewer does not independently verify every value. It gets committed, deployed, and consumed downstream.
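The point about validation is easy to make concrete. A schema check confirms that a value is well-formed, not that it is yours. All names and addresses below are invented; the memorized-looking values are hypothetical stand-ins:

```python
import re

EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$")
HOSTNAME_RE = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)+$")

def validate_config(config: dict) -> list[str]:
    """Return schema errors; an empty list means the config 'passes'."""
    errors = []
    if not EMAIL_RE.match(config.get("admin_email", "")):
        errors.append("admin_email is malformed")
    if not HOSTNAME_RE.match(config.get("db_host", "")):
        errors.append("db_host is malformed")
    return errors

correct = {"admin_email": "ops@example.com", "db_host": "db.internal.example.com"}
# Stand-ins for memorized values: well-formed, plausible, and someone else's.
memorized = {"admin_email": "ops@othercorp.example", "db_host": "db.prod.othercorp.example"}

print(validate_config(correct))    # []
print(validate_config(memorized))  # [] -- passes identically; the schema cannot tell
```

Every layer downstream of this check (CI, deployment, the consuming service) inherits the same blindness: the field is valid, so it is trusted.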

A memorized hostname resolves to someone else’s server. A memorized credential might authenticate against someone else’s service. A memorized email address sends notifications to a stranger. These are not theoretical risks; they are the logical consequence of deploying memorized training data fragments into systems that trust their inputs.


The review problem

The standard mitigation for AI-generated output is human review. This works for logic errors, architectural mistakes, and obvious hallucinations. It does not work for memorized values.

A reviewer looking at a generated organization name does not question it. A reviewer looking at an email address in a template does not cross-reference it against training data. A reviewer looking at any generated value assumes the model derived it from context (and it usually did). The problem is the cases where it did not, and the output is indistinguishable from the cases where it did.

Human review catches things that look wrong. Memorized training data fragments look right. That is precisely why they are dangerous: they are syntactically and contextually appropriate, just not correct for this specific context.

The review burden scales with the volume of AI-generated output. The more fields a model fills in, the more opportunities for memorized values to pass through undetected. The reviewer is not comparing each generated value against the universe of training data; they are checking that the output “looks reasonable.” Memorized fragments always look reasonable. That is a property of how they were learned.
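One narrow defense exists for identity fields specifically: the set of correct values is usually known from your own repo, environment, or config, so generated values can be checked against it mechanically rather than by eyeball. A sketch with hypothetical field names and values:

```python
# Values that are legitimately ours, gathered from the repo, env, or config.
KNOWN_GOOD = {
    "org_name": {"AcmeWidgets"},
    "admin_email": {"ops@acmewidgets.example"},
}

def audit_identity_fields(generated: dict) -> list[str]:
    """Flag identity fields whose generated value is not a known-good value.

    This cannot prove a value was memorized, only that it did not come
    from our own context -- which is exactly the signal a reviewer
    checking for "looks reasonable" will miss.
    """
    return [
        f"{field}: unexpected value {value!r}"
        for field, value in generated.items()
        if field in KNOWN_GOOD and value not in KNOWN_GOOD[field]
    ]

generated = {"org_name": "VendorCorp", "admin_email": "ops@acmewidgets.example"}
print(audit_identity_fields(generated))  # ["org_name: unexpected value 'VendorCorp'"]
```

The check only covers fields whose correct values you can enumerate in advance; it says nothing about free text, and it shifts the burden to maintaining the allowlist. But unlike human review, it does not get more porous as output volume grows.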


The real question

This is a new class of problem. It is not prompt injection (an external attacker manipulating the model). It is not hallucination (the model generating plausible nonsense). It is the model reproducing real data from its training set in a context where that data does not belong, with no mechanism for anyone (the model, the runtime, or the reviewer) to detect that it happened.

The training data is not scrubbed before it becomes the model. The model’s outputs are not flagged when they reproduce training data. The gap between those two facts is where someone else’s name, someone else’s credentials, or someone else’s private information ends up in your output.

Regulation is beginning to respond. California’s Generative AI Training Data Transparency Act (AB 2013, effective January 2026) requires developers to disclose whether their training datasets contain personal information. But it mandates disclosure, not remediation, and carries no enforcement penalties. Italy’s data protection authority fined OpenAI €15 million in December 2024 for processing personal data to train ChatGPT without an adequate legal basis, the first GDPR fine levied against a generative AI company. Even open models that publish their full training datasets have not addressed memorization as a distinct risk category. The industry acknowledges the problem exists. It has not demonstrated that it can solve it.

The question worth asking is not “does this happen?” It does. The question is whether the organizations training these models are doing enough to ensure it does not. Right now, the answer appears to be no.