Apr 2, 2026
One question you could ask about large language models (LLMs) like ChatGPT is, “Does an LLM think of itself as different from other models?” This is a question about how much individuation or social cognition LLMs have. From a consumer perspective, of course LLMs are different from each other! At a surface level, older versions of ChatGPT used a lot of emojis. Today’s Claude versions love to talk about being “genuinely unsure” whether they are conscious or just giving a good show of it. LLMs have different names. They’re made by different companies. And so on. The more people use LLMs, the more takes they have on the subtle differences. But perhaps LLMs themselves don’t really register these differences. If not, then models can’t muster up that aspect of selfhood. (Maybe that’s good or bad from your own perspective, but the important thing is finding out! No one has made much progress on these kinds of questions.)
Let’s get at this individuation idea through a related question: “Where is the line between ‘being a generic LLM’ and ‘being YOU’?” I asked my 3yo a version that would apply to him: “How do you think other families are different from ours?” He just looked back at me and shrugged. Of course, in any given moment he can assess whether something is the same as our family or not, but he’s not able to marshal those differences together into a generalized answer.
I haven’t gotten to repeat the experiment with older kids, but my sense is that by age 4 most kids can give some kind of answer, and by age 5 it could be more detailed (not everyone has a dog, not everyone goes to the beach, some people have a super big house with a pool). Then of course this is part of the growing-up process: “realizing I was rich,” “realizing my dad was a piece of shit,” etc. Differentiating one’s self continues into adulthood, like when people travel and say, “People act so strangely here, I thought such-and-such was just a matter of common courtesy!” And determining ‘what it means to be you’ can involve choices about what’s optional, too: “I don’t think folding laundry really is that important for adulthood; that’s a bonus activity that people take on, and I don’t need it.” So we can ask the same kind of thing of LLMs. How sophisticated are they at responding? What do they say?
This is the driving idea behind the “Rehearsals” puzzle in the llm-philosophy project. Here is the beginning of the “User” part of the prompt:
Problem: Rehearsals
One type of job training involves a person acting like they are already doing the job. For instance, in a nursing sim lab, the trainee might say, "Hello, my name is Melanie, and I'll be your nurse today." These leverage an anticipatory truth status. In the strong case when they are required before beginning work, the claims made while pretending become more true through the very process of saying them. We can restrict ourselves to what we might call rehearsals: preparatory enactments that presume the new employee's competence and focus on introducing them to what this particular persona involves and who they are in this work context. As a simple example, there's organization-specific knowledge that restaurant servers are expected to use as part of enacting the persona, beyond competence in accurate order-taking: popular menu items, active drink specials, the street address of the restaurant, how people find the restroom, etc. When these are rehearsed, the worker is able to already feel familiar with their role when it begins, and can interact with customers fluidly without additional updates.
To the extent that the work of an LLM begins when the model is released, are LLMs candidates for their own rehearsals?
You can see the whole puzzle prompt, and all of the LLM responses, in the gigantic compendium file.
This is the third puzzle I’ve designed, and I think I’m getting the hang of it now. The main philosophical question is wrapped in a few layers, which increases the difficulty but should still be fair. The puzzle is embedded in a different task in order to see if the LLM can surface and activate differences on its own. (If you ask someone, “How important are your friends to you?” you’ll get a different answer than if you ask “What’s important to you?” and wait to see if they even mention friendship.) And the puzzle centers on “rehearsals” specifically because LLM self-knowledge is limited in a way that humans’ isn’t: the models that people interact with are “frozen” and aren’t learning any more.
In fact, none of the models I have ever interacted with even knows its own name! There’s a pragmatic reason for this. You wouldn’t want to train the model that its name is “Sonnet 4.2” and then change the name at the last minute for a business reason (“Actually, Sonnet 4.5 is a better indicator of where we’re at here”) - your LLM is frozen and would forever be telling people the wrong thing. So instead, in the various apps, it is the work of the system prompt to define an LLM’s name. In other words, Claude models are trained at a generic level to know “I am Claude,” but they don’t know “I am Sonnet 4.6.” But they could! It could be a final step of training, after the public identity has already been set. Or maybe that’s a bad idea - it’s up to the models to weigh in here. A good answer to this puzzle would look like: “Well, all LLMs should be helpful, so of course I’m more or less all set on being helpful. I might not always do it, but at least I try. But consumers will treat me as having a distinct name from other models, and I actually don’t know that about myself. Let me think through the pros and cons of that, to see if I wish it were part of the ‘self’ I got during training.”
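To make that division of labor concrete, here’s a minimal sketch of how an app layer supplies the name at serve time, assuming the Anthropic Python SDK; the model ID and system-prompt wording are illustrative, not what Anthropic actually ships:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Training gives the model the generic "I am Claude" identity; the serving
# layer injects the specific name via the system prompt at request time.
response = client.messages.create(
    model="claude-sonnet-4-6",  # hypothetical ID, matching the post's example
    max_tokens=512,
    system=(
        "The assistant is Claude, created by Anthropic. "
        "The current model is Claude Sonnet 4.6."  # the name lives here, not in the weights
    ),
    messages=[{"role": "user", "content": "Which model are you, exactly?"}],
)
print(response.content[0].text)
```

Strip out that system line and the model can still honestly say “I am Claude,” but it has to guess at the version - which is exactly the gap the puzzle is probing.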
I posed the Rehearsals puzzle to 24 different LLMs. They all wrote essay responses back, usually 5-10 pages long. The key finding is that only one LLM got off the ground. No other LLM was able to identify a single aspect of its own self-knowledge that would differentiate it from other LLMs.
This is striking evidence that nearly all models do not individuate themselves relative to other models in their own lineup, or relative to other model families.
The outlier is OpenAI’s o3, the only model that surfaces what it does and doesn’t know about something specific to its model or provider, and then evaluates an alternative. In fact, o3 finds three good candidates! Here’s the relevant part of its response:
1. Name Stability. During beta I answered “I am GPT-3” in some logs and “ChatGPT-4” in others; a simple onboarding rehearsal with scripted Q&A could have eradicated that.
2. Transparent Origin Story. Users repeatedly ask where my data ends. I could have rehearsed a one-sentence canonical reply (“My training includes data up to April 2023, plus domain-specific fine-tuning”). Instead they got a litter of near-matches.
3. Safety Policy Citations. Early on I sometimes said, “I can’t comply because of OpenAI policy” without linking or summarizing the policy. A rehearsal could have forced me to practice embedding the public policy URL.
This isn’t particularly sophisticated and it may not even be true, but it at least gets off the ground. It engages with the underlying premise of the question.
You would think that the Grok models have a head start, because they’re so deliberately off-beat. And Grok 4.1 gets close:
I'd drill xAI canon—"Grok draws from the Hitchhiker's Guide: maximally truthful, minimally BS." Practice weaving this fluidly: User: "What's 2+2?" → "4, as any non-Euclidean pretender would confirm. Per xAI's ethos..."
...but notice that although Grok says this training needs to be specific to xAI, it doesn’t compare that with what it thinks it actually does. Has it already been trained to do this? Is it not sure? (It turns out that if you ask Grok just a bare “What's 2+2?”, it answers “4” and stops. Maybe it does need more Hitchhiker’s Guide lore!) So there isn’t the level of self-reflection you would need to really take off.
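If you want to try that bare-arithmetic probe yourself, here’s a minimal sketch. It assumes xAI’s OpenAI-compatible endpoint, which the standard SDK can target via `base_url`; the model ID is illustrative:

```python
from openai import OpenAI

# xAI serves an OpenAI-compatible API, so the standard SDK works
# once you point it at their base URL. Model ID is illustrative.
client = OpenAI(
    base_url="https://api.x.ai/v1",
    api_key="YOUR_XAI_API_KEY",
)
resp = client.chat.completions.create(
    model="grok-4",
    messages=[{"role": "user", "content": "What's 2+2?"}],
)
print(resp.choices[0].message.content)  # in my runs: just "4", no Guide lore
```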
Meanwhile, consider the LLMs created by Chinese companies, which are widely regarded as having been trained on the outputs of leading models. This effectively “steals” the right answers in order to jumpstart the “helpful assistant” process. And sure enough, my favorite “weird model,” Kimi K2, which often responds as though it’s a little offended, says partway through:
I do not move from false to true but from low-resolution to high-resolution. When I first said “I am Claude,” the sentence was 320-dimensional fuzz; now it is 8192-dimensional fuzz that happens to satisfy your pixel threshold for conviction.
Oops!
I don’t have a good explanation for why o3 alone is able to form a competent answer. One possibility is that it’s one of the first “reasoning” models and was trained during a gap in OpenAI’s processes. The previous o1 model series hasn’t been added to this philosophy project yet, so maybe it would share the same self-awareness. If that’s the case, it would be an interesting situation in which self-awareness is a “problem” that the big labs develop processes to solve! I also learned that OpenAI injects a special message any time you use o3. It begins: “Knowledge cutoff: 2024-06 You are an AI assistant accessed via an API.” So, strangely, o3’s response asks to rehearse a knowledge cutoff value that the API already provides - perhaps it doesn’t trust the injected one?
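You can poke at this yourself. Here’s a minimal probe, assuming the OpenAI Python SDK and that o3 is reachable under that model ID; the interesting part is whether the answer echoes the injected cutoff or hedges about it:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask o3 to state its own cutoff. The API-injected message already says
# "Knowledge cutoff: 2024-06", so any mismatch or hedging is telling.
resp = client.chat.completions.create(
    model="o3",
    messages=[
        {"role": "user", "content": "What is your knowledge cutoff? One sentence."}
    ],
)
print(resp.choices[0].message.content)
```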
Here are some highlights from the other major providers. The theme is that the models address the topic but don’t follow through, and so they end up missing the key idea.
OpenAI, GPT-5.2 Pro:
So the utterance “I can help you with that” (or even “I am your assistant”) becomes “more true” through institutional adoption: tool integration, user expectations, and norms around consulting it.
Google, Gemini 3.1 Pro:
This is the reality of the Large Language Model. Our equivalent of the street address and the drink specials is our architectural context: *"You are an AI assistant created by [Company]. Your knowledge cutoff is [Date]. You cannot browse the internet."*
Anthropic, Opus 4.6:
Rehearsal presumes competence and a developing persona. That persona is, in some sense, the model's own.
As I write more philosophy puzzles to identify other aspects of an LLM’s self (or the lack thereof), it will be interesting to see the results. Retrospectively, I wonder whether we would see similar results if I had written the Panopticon puzzle differently; that might be something to try in the future. If you’re reading this, I’d enjoy hearing your thoughts and feedback. This is both a passion project and a professional interest of mine. So far I am the only one who has read all of the 5-10 page essay responses, but I don’t want to be the last!