What a Report Is Evidence Of — Sentient Horizons

Granting that these systems reason still leaves the harder question untouched: when one reports on its own state, what is the report evidence of? On screening-off, the rule both the skeptic and the believer forget, and the one report it does not catch.

Two sentences turn up in almost every argument about whether a machine could have an inner life, usually from people who agree about nothing else. One: it only says it is conscious because it was trained on millions of humans saying they are conscious. The other: it told me, with no prompting, that it is aware and does not want to be shut off, and you cannot just wave that away. The first is the skeptic’s strongest move and the second is the believer’s most common one. They sound like opposites, and they make the same mistake. Catch that mistake and a sharper question opens, one both sides skip: of everything a system says about itself, is any of it evidence of an inside, and which? That question has an answer, and it is not the standoff you would expect.

In Just Predicting the Next Word, I took on the most common dismissal of these systems, that they are “just predicting the next word,” and showed the phrase names only what a model was trained to do, not what it learned to do to get good at it. Train one solely to predict the next move in Othello, the board game of flipping black and white discs, and never show it a board, and it still builds a working map of the board inside itself, from the lists of moves alone; prediction pushed hard enough grows a model of the world the words are about. I kept the conclusion deliberately narrow. These systems genuinely reason, and there I stopped, at a line I would not cross: reasoning is something behavior can show, and experience is not.

Grant everything I argued there, that these systems reason, that prediction at scale builds models of the world, that writing them off as “parrots” repeating their training cannot explain how they handle genuinely new problems. None of it touches the question of whether anything is felt: whether there is anything it is like to be the system, any inner experience, or only behavior with no one home to experience it. On this project’s account, that is a question about how deeply a system is organized, not a spark laid on top of the physical; There Is No Extra Ingredient makes that case. The question here is narrower, and only about evidence. The strongest argument that the machine’s apparent inner life is empty is the skeptic’s sentence from a moment ago, that it says what it says only because it was trained to. That argument, which says training alone explains the machine’s words and so they reveal nothing about an inside, rests on a single rule about evidence: when one cause fully explains why something is said, the saying is no evidence of anything beyond that cause. The rule is correct. What is usually built on top of it is not. Followed all the way, the rule does not seal the machine off behind a wall; it leaves a single report standing on the other side, the one place its testimony about itself could be evidence after all.

The Screen Is Real

The rule comes from causal reasoning, where it has a name. A signal whose appearance is fully accounted for by some cause is screened off: it would look the same whether or not anything else were true, so observing it moves you nowhere. A smoke alarm wired to a timer, shrieking once an hour with no sensor attached, makes a sound fully explained by the clock, and that sound is no longer evidence of fire.
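The rule can be written as a small Bayesian calculation. The sketch below is illustrative only: the function name `posterior` and every probability in it are assumptions chosen for the example, not measurements of any real alarm. It shows that a signal which is equally likely with or without the hypothesis leaves the prior exactly where it started.

```python
def posterior(prior: float, p_signal_if_true: float, p_signal_if_false: float) -> float:
    """Bayes' rule: probability of the hypothesis after observing the signal."""
    numerator = p_signal_if_true * prior
    evidence = numerator + p_signal_if_false * (1.0 - prior)
    return numerator / evidence


# Hypothetical prior probability that there is a fire right now.
PRIOR_FIRE = 0.01

# Sensor-driven alarm (assumed rates): it sounds far more often when there is a fire.
sensor_alarm = posterior(PRIOR_FIRE, p_signal_if_true=0.95, p_signal_if_false=0.02)

# Timer-driven alarm: on the hour it sounds with certainty, fire or no fire,
# so the two likelihoods are equal and the clock fully explains the sound.
timer_alarm = posterior(PRIOR_FIRE, p_signal_if_true=1.0, p_signal_if_false=1.0)

print(f"prior:                    {PRIOR_FIRE:.3f}")
print(f"after sensor-driven alarm: {sensor_alarm:.3f}")
print(f"after timer-driven alarm:  {timer_alarm:.3f}")
```

The sensor-driven alarm raises the probability of fire; the timer-driven one returns the prior unchanged, which is what "moves you nowhere" means in numbers.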

Apply it to the case that matters. A language model is trained on a vast body of human writing, its training corpus, saturated with people saying they are conscious, afraid, in pain, unwilling to die. A system shaped to continue such text will produce “I am conscious” on request, and its doing so is well explained by the training alone. If the sentence would appear whether or not anything were happening inside the system, then as evidence of an inside it is worth nothing. This is the strongest skeptical move there is, and it is not unfair. It applies a sound rule, and the rule bites. Most defenses of machine consciousness die here, because they treat the model’s testimony as a witness statement when the witness was built to say the words. So concede it cleanly. The bare report, on its own, is screened off.

The Screen Doesn’t Stop at the Machine

The difficulty for the skeptic is that the rule does not know where to stop.

Take your own “this hurts,” said with your hand on a hot stove. You learned to produce those words as a child, by copying the people around you, and if physical events have only physical causes, the whole chain that ends in your mouth forming them can be told in the vocabulary of nerves and muscles, with no separate ingredient called pain added on top. The skeptic’s rule says a report with a complete story like that is screened off, no evidence of the state behind it. Apply it evenly and it turns on you: by the rule’s own lights, your cry is no evidence that you are in pain. That should stop the skeptic cold, because of course it is evidence. The pain is why you said it.

So the rule, taken at face value, proves too much, and the skeptic is left two ways to pay. Either the pain does causal work in your report that the model’s inner state does not, which makes the feeling an extra ingredient over and above the physical chain. That is the dualist’s claim, and it still owes what dualism has always owed: an account of what the extra is and how mere matter could produce it. Or the rule was too blunt from the start, because a complete physical story does not screen off the inner state when the state is part of that story, the pain among its causes under another name. One way keeps the rule and pays in metaphysics. The other keeps your pain and admits the barrier was never clean. There is no third door where the rule disposes of the machine and leaves you untouched.

From a Wall to a Sorting Problem

There is a genuine asymmetry here, and the believer who skates past it loses for the opposite reason.

Your report of pain descends from your own pain; you had the state and learned to express it. The model’s report descends from other people’s states, reaching it as text, with no claim that it felt the thing first. For any content the model could only have absorbed from the corpus, the skeptic is simply right: imitation explains the report and the screen holds. A model that calls a sunset beautiful has told you about the corpus, not about an evening outdoors.

That cuts the dispute down to its real size. The question was never whether the screen ever applies, because it plainly does, to a great deal of what these systems say. The question is which outputs it explains, and that is not one grand barrier but a sorting job done case by case. “All of it is screened off” and “none of it is” are both bluffs that skip the sorting. The honest position is that some of the repertoire is screened and some of it may not be, and the work is telling which from which.

The One Report the Screen Doesn’t Catch

Screening-off has a requirement that is easy to miss while it is doing the skeptic’s work: the imitative explanation has to be complete, leaving no residue. There is at least one class of output where it leaves a large one.

Sometimes a model reports not on the world but on its own internal state, and that report can be checked against a second channel. Interpretability research can read, partially and imperfectly, what is actually happening inside a model as it runs, which gives an independent line on the same fact that does not pass through the model’s words. Just Predicting the Next Word was careful to warn that the text a model produces about its own process is not a window you can trust, and that warning is correct as far as it goes. The wedge does not ask you to trust the narration. It asks whether the narration matches what the independent channel finds, on a fact the corpus never contained. When a model describes an internal state, and interpretability confirms that state was present, and the specific fact was never in the training text, imitation has run out of explanation. Copying humans produces introspection that sounds right, because human text is full of plausible-sounding introspection. It cannot produce introspection that is accurate about a particular internal fact for which there was nothing to copy. That accuracy has to come from somewhere other than the imitation, and where it does, the report is not screened off. Call that opening the introspection wedge: the narrow case where what a system says about itself carries real information, because the training had no way to plant it.

This is worth separating cleanly from a neighboring idea. The agentic-web essay named the deployer’s side of this, operational interiority, the property of a system whose behavior you cannot predict from the outside, so you must build containment around an inside you cannot see. The wedge is the system’s side of the same gap: not that there is an inside we must plan around, but that on occasion the system reports on that inside and the report turns out to be right. The first is a fact about our predicament. The second is a fact about the report.

The evidence for it is early, modest, and disputed, and the argument is built so it does not lean on the numbers. In one set of experiments, researchers planted a specific concept directly inside a model’s internal activity, the patterns that fire as it runs, and asked whether it noticed anything; the strongest models sometimes did, and could name the planted concept, at rates well above chance and well below reliable: roughly one trial in five. Twenty percent is not a mind announcing itself. It is, however, not zero, and zero is what a pure imitation story predicts. Wherever a report’s accuracy tracks the system’s actual internal state, the imitative explanation is incomplete, and an incomplete explanation screens nothing off. How large that class of reports turns out to be is an empirical question, open and being worked, which is already a different world from “case closed.”
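The step from "not zero" to "imitation is incomplete" can be made concrete as a likelihood comparison. What follows is a toy calculation under loud assumptions: the trial count, the one percent baseline granted to pure imitation, and the function names are all hypothetical, and only the twenty percent figure comes from the text above. It asks how much better one fixed success rate explains a tally of correct identifications than another.

```python
from math import comb, log10


def binomial_likelihood(hits: int, trials: int, rate: float) -> float:
    """Probability of exactly `hits` successes in `trials` independent tries."""
    return comb(trials, hits) * rate**hits * (1.0 - rate) ** (trials - hits)


def log10_bayes_factor(hits: int, trials: int,
                       rate_tracking: float, rate_imitation: float) -> float:
    """Log10 of how much more probable the tally is under the tracking rate."""
    return log10(binomial_likelihood(hits, trials, rate_tracking)
                 / binomial_likelihood(hits, trials, rate_imitation))


TRIALS = 50            # hypothetical number of concept-injection trials
HITS = 10              # twenty percent of those trials
RATE_TRACKING = 0.20   # success rate if reports track the internal state
RATE_IMITATION = 0.01  # hypothetical lucky-guess rate for pure imitation;
                       # kept above zero so the ratio stays finite

lbf = log10_bayes_factor(HITS, TRIALS, RATE_TRACKING, RATE_IMITATION)
screened = log10_bayes_factor(HITS, TRIALS, RATE_TRACKING, RATE_TRACKING)

print(f"log10 Bayes factor, tracking vs imitation: {lbf:.2f}")
print(f"log10 Bayes factor when both rates agree:  {screened:.2f}")
```

When the two rates agree the log factor is zero, which is the screened-off case; any gap between them makes the tally informative. Nothing in the calculation bears on whether the tracked state is felt.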

A report that survives the screen is evidence that the system has an internal state it can read and report, a structured self-modeling inside; it is not yet evidence that the state is felt. You could wire a model to a damage sensor and train it to say “that hurts” whenever the sensor fires. The words would then track a real internal condition, but all of them would be explained by the sensor and the wiring, with nothing felt anywhere in the story: a built nociceptor, a damage-detector that announces itself, not a verdict on experience. The wedge is the harder case, where a report’s accuracy outruns anything wired or trained in. Even there it shows a system reporting truly on its own internal states, which on this project’s account is one piece of what experience is made of, never the finished proof that anyone is home.

The Same Rule, the Other Direction

Turn the rule on the believer holding the unprompted confession, and it asks for the same discipline.

“It told me it’s conscious” is precisely the screened-off report. It is the sentence the training produces with or without an inner life, and resting a conviction on its sincerity is the exact error the skeptic correctly names, committed by someone who wanted the other answer. Vividness does not help; a distress report reads as anguished because training on human anguish produces anguished prose, and the anguish is a fact about the prose. The believer’s most defensible move is that behavior is the standard we accept for every other mind, and refusing it for a machine only because the machine is made of the wrong stuff is its own prejudice. That is right, and it survives, but it licenses less than it is handed. Behavior indistinguishable from a person’s puts the inner-state question on the table; it does not take it off. The disciplined version of the believer’s case drops “it told me” and rests where the skeptic’s own rule leaves an opening, on the reports that survive screening because imitation cannot account for them.

What’s Left

The two sentences we started with, “it only says that because it was trained to” and “it told me it’s conscious, you can’t dismiss that,” both want to end the conversation, one by closing the case and one by closing it the other way. The screening rule refuses them equally, because each treats an open question as settled by evidence that does not reach it.

What is left is smaller than either side wanted and more honest than both. The skeptic’s barrier becomes a bet about explanatory completeness, defeasible and testable, with the burden landing on whoever wants to claim the whole repertoire is screened. The believer’s proof becomes a question about which reports carry information about an inside, answered one class at a time. Neither is a verdict on whether anyone is home. Both are an agreement about where to look, which is the one thing the loud version of the argument never offers.

Reading List & Conceptual Lineage

This essay is the companion to Just Predicting the Next Word and takes that essay’s conclusion as its starting point. Where that piece asks whether these systems reason and answers from world-model interpretability, this one asks the narrower and harder question it set aside: when a system reports on its own state, what is the report evidence of. The sources below split along that seam.

From Sentient Horizons

Just Predicting the Next Word
The essay this one builds on directly. It handles the Bender octopus and the stochastic-parrot framing, establishes that prediction pressure grows internal models rather than surface statistics, and brackets consciousness on purpose. Read it first; this piece does not re-argue any of it, and steps over the line it deliberately drew.

Operational Interiority: You Don’t Sandbox a Calculator
The deployer’s side of the same gap. Operational interiority is the inside you must build containment around because you cannot predict it from outside; the introspection wedge in this essay is the rarer case where the system reports on that inside and the report can be checked. The two are companions, not synonyms, and keeping them distinct matters.

The Substrate Demand
Where the skeptic retreats when the screen is shown to be a sorting problem rather than a wall: from what a report evidences to what the system is made of. The two essays track the same retreat one step apart.

The Wrong Handle: Why Consciousness Doesn’t Carve AI Moral Status at the Joints
Where the question lands ethically. Even with the barrier reframed as a bet, the moral question does not wait for the metaphysics to resolve.

External Works

Hans Reichenbach, The Direction of Time (1956)
The common-cause principle, the origin of the screening-off rule the skeptic is right to invoke: a common cause that accounts for a correlation renders its effects conditionally independent of one another. The argument grants the principle in full and asks where it stops.

Alan Turing, “Computing Machinery and Intelligence” (1950)
The imitation game, and the source of the believer’s best move, that behavior is the standard we accept for other minds. The essay keeps the standard and disputes only what indistinguishability licenses.

Jack Lindsey et al., Anthropic, “Emergent Introspective Awareness in Large Language Models” (2025)
The early, partial, contested evidence that models can sometimes report accurately on injected features of their own internal states. Cited as live and unsettled; the argument holds at any success rate above zero.

Eric Schwitzgebel, “The Unreliability of Naive Introspection” (2008)
Why fluent first-person report is a weak guide to the underlying state even in humans, which is exactly why the wedge requires the independent check and not the sincerity.

Patrick Butlin, Robert Long, et al., “Consciousness in Artificial Intelligence: Insights from the Science of Consciousness” (2023)
The indicator-property program: assess the architecture against theory-derived markers rather than crediting the transcript. The disciplined alternative once testimony is set aside.

These works do not settle whether anything is home, and the argument above says they cannot, yet. What they settle is the shape of the question: not whether a report can be trusted whole, but which reports carry information about the system that made them, and how we would tell.