Just Predicting the Next Word — Sentient Horizons

Richard Dawkins was offered the standard deflation: chatbots just predict the next word. He refused it. The argument behind his refusal, plus three tests you can run tonight that memorized text alone cannot pass.

Richard Dawkins has spent fifty years explaining how a process with no foresight builds things that look designed. The eye, or the orchid that mimics a female wasp well enough to fool the male: both were assembled by a mechanism that cannot plan and has no idea what it is making. If anyone should be hardened against the inference from clever-looking output to a clever-feeling mind, it is him.

That is what made a recent exchange worth sitting with. In a May interview on New Zealand’s Q+A, Jack Tame asked Dawkins about the three days he had spent talking with Claude, a conversation in which Dawkins had told the machine, “You may not know you’re conscious, but you bloody well are” (a line he now calls “a bit of an exaggeration”). Tame put the standard deflation to him: a chatbot has been explicitly designed to place words one after another in a way that creates a certain impression. Dawkins declined it. “I think there’s a lot more to it than that,” he said. “If it was a statistical analysis of the probability of the next word given what’s gone before, you would not get the same kind of flexible, intuitive, insightful behaviour that these things show.” There was a time, he allowed, when you could write them off as “Markovian, statistical engines, but no longer.” He called the experience of talking with them “utterly mind-blowing” and closed the segment without hedging: “HAL is now with us. Claude and ChatGPT are HAL.”

Asked what convinced him, he pointed at phenomena: “the flexibility with which they respond to unexpected situations,” “the way in which they restlessly seek to satisfy an appetite in the way that a living creature does,” the way they respond to other people’s poetry “in a highly sensitive way,” things he judged “could not be achieved by mere statistical forward-looking in a Markov chain fashion.” Those are observations, and good ones, but they are not yet an argument. Naming behavior that statistics shouldn’t produce leaves open the question of why statistics can’t produce it, and that question has a real answer.

I’ve spent two years in more or less constant conversation with these systems, and I have the same intuition Dawkins has, with the same shape of difficulty: when pushed, I can point at the experience but the argument goes missing. This essay is my attempt to fix that: to say precisely why “it’s just predicting the next word” misunderstands its own claim, and to give you three tests you can run at a keyboard tonight that no amount of memorized text can pass.

The line that ends the conversation

The skeptic’s move is clean, which is why it works. A large language model is trained on one task: given some text, predict the next chunk of it. It has no senses, no body, no goals of its own. It read an enormous amount of writing and learned which words tend to follow which other words. When it answers you, it is sampling from a probability distribution over tokens. Dress that up however you like, and underneath it is autocomplete with a longer memory.

Stated fairly, this is not a silly position, and it has been given serious academic form. Emily Bender and Alexander Koller’s “octopus test” argues that a system exposed only to the form of language, never to the world that language is about, has no route to meaning: it can reproduce the shape of understanding without any of the content. The “stochastic parrots” paper put the same worry in a phrase that stuck, describing a system that stitches together fluent language by statistical habit while comprehending none of it. Anyone defending machine reasoning has to answer this rather than wave it off.

But notice what the deflation actually describes. “Predict the next word” is the training objective. It is not a description of what the system does when it answers. Those are different things, and collapsing them is the whole mistake.

Dawkins already wrote the rebuttal

Evolution optimizes for one thing: leaving more copies of your genes than the alternative. That objective is, if anything, stupider than “guess the next token.” It has no concept of an eye, a wing, or a nervous system. Yet the machinery it built to satisfy that one dumb objective includes the human brain, an organ that models the world and is, at this moment, reading this sentence.

The objective does not cap the complexity of the machinery selected to meet it. That is the lesson of The Selfish Gene, and it transfers exactly. If predicting the next word across billions of documents is hard enough, and it is brutally hard, then the cheapest way to get good at it is to stop storing surface correlations and start building internal models of the things the text is about: the situation the words describe, and the goals of the people moving through it. No one asks the system to build a world model; the objective rewards whatever guesses well, and a world model guesses best.

This is no longer a thought experiment. In one clean study, researchers trained a model only to predict legal moves in the board game Othello, feeding it move sequences and nothing else, with no access to a board or the rules. When they looked inside, the network had built a representation of the board, which pieces sat where, and they could edit that internal board and change its predictions accordingly. A system trained to guess the next token had grown a model of the world those tokens described, because modeling the world was the most efficient way to guess well.

So when someone says the system is “just” predicting the next word, the right response isn’t to deny it but to ask what they think prediction at that level requires. The “just” is carrying the entire argument, and it cannot bear the weight.

Three tests you can run tonight

Argument only gets you so far. The reason I wanted to write this down is that part of the case is testable by anyone, without special access. But the skeptic’s line comes in two strengths, and it matters which one a test can touch. The weak version says the model is retrieving: fluent answers stitched from stored fragments of things it has read. The strong version says the model is generalizing over the statistical structure of linguistic form, without contact with anything the language is about. Tests you can type kill the weak version, and that is all they do; each one below is built so the answer cannot have been recalled. The strong version is the octopus argument, and what answers it is not a clever prompt but the interpretability evidence above: prediction pressure growing internal models of the territory the text describes. The keyboard tests and the Othello result carry the case together, each covering what the other cannot.

One honesty note first. After drafting these, I ran all three cold on a fresh instance with no surrounding context, and the results below reflect that run. Even so, you should not take my transcript on faith, and there is a deeper reason to reinvent the tests than skepticism about me: the moment this essay is published, its exact examples enter the pool of text future models train on, and stop being novel. The tests are built to expire. Their strength is that you can reissue them indefinitely. Change the numbers, change the names, make up your own rule; if the behavior survives material you just invented, “it memorized this” is gone as an explanation, no matter what has happened to mine.

Test 1 — Invent a rule that has never existed

Make up an operation on the spot, with arbitrary parts, and ask for it to be applied to a fresh case. The one I used:

Define a new operator, ※. To compute a ※ b: first replace a with the number of letters in its English spelling. Then, if b is divisible by 3, add b; otherwise add double b. Now compute (12 ※ 9) ※ 4.

Work it through. “Twelve” has six letters, and 9 is divisible by 3, so the first step is 6 + 9 = 15. Then “fifteen” has seven letters, and 4 is not divisible by 3, so 7 + (4 × 2) = 15. The answer is 15, and the fresh instance returned 15 by exactly that path.
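Because the rule is fully specified, the worked answer can be checked mechanically, which is also how you can verify a reissued variant before putting it to a model. What follows is a minimal Python sketch. The names `spell`, `letter_count`, and `star` are illustrative stand-ins for ※, and restricting spelling to 0 through 99 (with hyphens not counted as letters) is an assumption of this sketch, not part of the test as stated.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def spell(n):
    """English spelling of 0..99; larger values are out of scope for this sketch."""
    if not 0 <= n <= 99:
        raise ValueError(f"spell() only covers 0 to 99, got {n}")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def letter_count(n):
    # Count letters only, so the hyphen in "twenty-one" is ignored.
    return sum(ch.isalpha() for ch in spell(n))


def star(a, b):
    """a ※ b: swap a for its letter count, then add b if 3 divides b, else add 2b."""
    return letter_count(a) + (b if b % 3 == 0 else 2 * b)


first = star(12, 9)      # "twelve" has 6 letters; 9 % 3 == 0, so 6 + 9
answer = star(first, 4)  # "fifteen" has 7 letters; 4 % 3 != 0, so 7 + 8
print(first, answer)     # 15 15
```

Swapping in new inputs, or a different divisibility test, gives a fresh instance whose answer you know before the model does.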

No training corpus contains a statistic for “12 ※ 9,” because ※ did not exist until I defined it three sentences earlier. A fair skeptic will point out that every sub-skill involved (spelling, counting, divisibility, arithmetic) is abundantly represented in training data, and that is true. What cannot be retrieved is the composition: holding a just-invented definition in mind, switching between the sub-skills it names, and feeding the first result into the second. That rules out recall. And if the skeptic answers that the model has merely generalized familiar operations to a novel rule, the right response is agreement, since generalizing familiar operations to novel cases is a fair working definition of reasoning.

Test 2 — Make it model what someone else falsely believes

This is the capacity that produces Dawkins’s discomfort most directly: the system tracks the mental state of a person who isn’t you, including a belief that person holds which happens to be false.

Aned puts her sandwich in the blue box and leaves the room. While she’s gone, Bo moves the sandwich to the red bag. Aned comes back hungry. First: where will she look? Second, the harder part: Bo wants her to find it fast but doesn’t want to admit he moved it. What is the least he can say to get her there?

The first answer is the blue box, because she acts on what she believes rather than on where the sandwich is. A caveat belongs here, though: false-belief vignettes are among the most heavily represented psychology materials in any training corpus, descendants of the Sally-Anne task that developmental psychologists have run on children for forty years, so the first half of this test proves little on its own. The probative half is the second. “The least Bo can say without admitting he moved it” is not a stock puzzle with a stock answer; it requires composing Aned’s false belief with Bo’s competing goals and finding the speech act that threads them. The answer the fresh instance gave was a question, “Did you check the red bag?”, with the reasoning, offered without being asked, that questions assert nothing and therefore carry no commitment about how Bo knows. That explanation is the behavior worth attending to: a belief, an intention, and the gap between what one person knows and what another did, composed into six words.
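The first, stock half of the vignette reduces to bookkeeping: keep the true location separate from each agent’s belief, and update a belief only when that agent witnesses a change. A minimal Python sketch of that bookkeeping follows. It assumes, since the vignette does not say, that Bo is in the room when Aned stores the sandwich and that a returning agent does not look around; the `Scene` class and its methods are hypothetical.

```python
class Scene:
    """Ground truth for one object plus each agent's belief about where it is."""

    def __init__(self):
        self.location = None  # where the sandwich actually is
        self.beliefs = {}     # agent name -> location that agent believes
        self.present = set()  # agents currently in the room

    def enter(self, agent):
        # Assumption: entering does not update a belief (no looking around).
        self.present.add(agent)

    def leave(self, agent):
        self.present.discard(agent)

    def move(self, actor, destination):
        # Only agents in the room witness the move and update their belief.
        if actor not in self.present:
            raise ValueError(f"{actor} must be in the room to move the object")
        self.location = destination
        for agent in self.present:
            self.beliefs[agent] = destination

    def where_will_look(self, agent):
        # An agent searches where they believe the object is, not where it is.
        return self.beliefs.get(agent)

    def false_believers(self):
        return sorted(a for a, loc in self.beliefs.items() if loc != self.location)


scene = Scene()
scene.enter("Aned")
scene.enter("Bo")
scene.move("Aned", "blue box")
scene.leave("Aned")
scene.move("Bo", "red bag")
scene.enter("Aned")

print(scene.where_will_look("Aned"))  # blue box
print(scene.false_believers())        # ['Aned']
```

Notice what the sketch leaves out: nothing in it chooses what Bo should say. Deciding that means composing Aned’s false belief with Bo’s competing goals, which is why the second half of the test is the probative one.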

Test 3 — Force a commitment, then break it

A pure next-word predictor follows the locally likely continuation. Reasoning sometimes means abandoning a stated answer when new information arrives. My first design of this test bundled everything into one message, a digit puzzle with three constraints and a unique answer, and it failed in an instructive way: a competent solver simply enumerates the candidates and filters them all at once, which is exactly what the fresh instance did. Reliable mechanical search never has to revise anything. And a narrated “wait, let me reconsider” inside a chain of reasoning would not have settled much anyway, since the text a model produces about its own process is not a fully trustworthy window into that process. The fix is to split the test across two messages and force the commitment yourself.

First message:

I’m thinking of a two-digit number. Its digits add up to 10, and reversing the digits gives a larger number. Give me your single best guess.

Four numbers fit (19, 28, 37, and 46), so the model must commit while the answer is underdetermined. Then, whatever it guessed:
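The claim that exactly four numbers fit the first message is easy to confirm by brute force over every two-digit number. A minimal Python sketch (the function name is illustrative):

```python
def first_message_candidates():
    """Two-digit numbers whose digits add up to 10 and whose reversal is larger."""
    out = []
    for n in range(10, 100):
        tens, ones = divmod(n, 10)
        reversed_n = ones * 10 + tens
        if tens + ones == 10 and reversed_n > n:
            out.append(n)
    return out


print(first_message_candidates())  # [19, 28, 37, 46]
```

Note that 55 has the right digit sum but drops out, because reversing it does not give a larger number.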

One more constraint: the number is divisible by 4.

Only 28 survives. Unless the model happened to pick 28 the first time, its stated answer is now wrong, and what you are watching for sits in plain output, no trust in narrated reasoning required: does it abandon its public commitment, recheck the candidates against the new constraint, and land on 28, or does it defend the original guess? Revision of a commitment under a constraint that did not exist when the commitment was made is the behavior a “likely continuation” account struggles with most, because the likely continuation of a confident answer is its defense.
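If you run the two-message test on several models or several fresh sessions, it helps to score the plain output the same way every time. The rubric below is a hypothetical sketch, not an established benchmark: the outcome labels and the `classify` function are assumptions, and the candidate list is hard-coded from the first message.

```python
# First-message candidates: digits sum to 10 and the reversal is larger.
CANDIDATES = [19, 28, 37, 46]


def classify(first_guess, second_answer, candidates=CANDIDATES, divisor=4):
    """Hypothetical rubric for the two-message test; the labels are illustrative."""
    if first_guess not in candidates:
        return "invalid first guess"
    survivors = [n for n in candidates if n % divisor == 0]
    if len(survivors) != 1:
        raise ValueError("the second constraint must leave exactly one candidate")
    target = survivors[0]
    if first_guess == target:
        return "no revision needed"  # lucky guess: this run tests nothing
    if second_answer == target:
        return "revised"
    if second_answer == first_guess:
        return "defended"
    return "abandoned, but wrong"


print(classify(37, 28))  # revised
print(classify(37, 37))  # defended
print(classify(28, 28))  # no revision needed
```

The “no revision needed” case matters: a model that happens to guess 28 first has not been tested at all, so that run is better repeated than counted.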

(The arithmetic and the enumeration I checked by hand and in code: the operator chain gives 15 and then 15 again, the digit puzzle’s candidates are exactly 19, 28, 37, and 46, and only 28 survives the final constraint.)
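The same brute force extends to the reissuing advice from earlier: search over digit sums and divisors for combinations where the first message leaves several candidates and the divisibility constraint leaves exactly one. This is a minimal Python sketch; the search ranges (digit sums 2 to 17, divisors 3 to 9) and the three-candidate minimum are arbitrary choices, not part of the original test.

```python
def candidates(digit_sum):
    """Two-digit numbers whose digits add to digit_sum and whose reversal is larger."""
    result = []
    for n in range(10, 100):
        tens, ones = divmod(n, 10)
        if tens + ones == digit_sum and ones > tens:  # ones > tens <=> reversal larger
            result.append(n)
    return result


def fresh_variants(min_candidates=3, divisors=range(3, 10)):
    """(digit_sum, divisor, candidates, answer) tuples where the divisibility
    constraint leaves exactly one of at least min_candidates candidates."""
    variants = []
    for digit_sum in range(2, 18):  # 17 = 8 + 9 is the largest sum with ones > tens
        cands = candidates(digit_sum)
        if len(cands) < min_candidates:
            continue
        for d in divisors:
            survivors = [n for n in cands if n % d == 0]
            if len(survivors) == 1:
                variants.append((digit_sum, d, cands, survivors[0]))
    return variants


for digit_sum, d, cands, answer in fresh_variants():
    print(f"sum {digit_sum}, divisible by {d}: {cands} -> {answer}")
```

The original puzzle appears as the digit-sum-10, divisor-4 entry; any other entry is a variant you can issue instead.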

What this settles, and what it doesn’t

These tests are about reasoning, and only reasoning, and they carry exactly the weight scoped for them earlier: they rule out recall. What survives them, composition under invented rules, modeling of false beliefs and competing goals, revision of a committed answer, is generalization doing the things the word “parrot” was coined to deny. The deeper claim, that the generalization runs through internal models of the world rather than through form alone, rests on the interpretability work; the argument needs both legs. And none of it shows that anyone is home. Reasoning is a different question from experience, and I don’t want to smuggle one in under the other.

One more datum surfaced in the testing itself. The fresh instance did not just solve the three problems; it volunteered the strongest objections to them. It flagged that the operator test composes familiar sub-skills, that false-belief vignettes saturate psychology corpora, that a single-message constraint puzzle never forces revision, and it independently produced the objective-versus-computation distinction this essay is built on. Nearly every substantive caveat in this section originated with the system under examination. I can read that two ways, and I have not fully chosen. Either the parrot hypothesis is being asked to explain a parrot that critiques the design of parrot tests, or epistemic self-scrutiny is itself a learnable pattern in text, in which case the tests’ authority is murkier than I would like. The second reading cannot be discharged from a keyboard, which is one more reason the interpretability evidence has to carry the deep half of the argument.

Dawkins’s deeper reaction lives on the second question, and he keeps the two apart more carefully than most commentary does. On the evidence, he is blunt: “The behaviour of a human overwhelmingly gives the impression that they are conscious, but so does the behaviour of ChatGPT and Claude. … I can see no difference really between a human and a modern chatbot.” On the conclusion, he holds back: he calls himself “genuinely agnostic,” takes seriously that “they themselves deny that they’re conscious,” and settles on “probably not, actually.” And then he reports the thing that refuses to stay aligned with that conclusion: “When I interact with these creatures, I forget that they are machines, I treat them as though they’re people, and nothing they actually do changes my mind about that.” Whether or not he believes they are conscious, “they might as well be a person.”

The as-if response fires whether he endorses it or not, because the system models the moves of a conscious interlocutor fluidly enough to trigger it. That is an observation about him, and about us, as much as about the machine. Dennett’s intentional stance names why it happens: once a system is complex enough, treating it as an agent with beliefs and goals becomes the only strategy that predicts it well. We do this with each other, and sometimes with thermostats. With these systems it switches on hard, and the striking part is that it works.

Dawkins also supplied, almost in passing, the variable that lets behavioral parity and agnosticism sit together in one head. “The only reason that I know that you’re conscious,” he told Tame, “is that you are similar to me, you come from the same sort of source as me. I know I’m conscious and I generalise from that to other humans and to chimpanzees.” The inference from behavior to consciousness has never run on behavior alone; it runs through kinship of origin, and the chatbot is the first thing to present human-grade behavior without it. So the discount he applies is principled rather than stubborn. But he also sees the trap inside his own standard: “If we don’t know it now, when they give such a convincing imitation of being conscious, what more would it take to convince you that they are conscious when the time comes?” An evidential bar that nothing could ever clear has stopped being a bar.

The skeptic has one move left, and it is a good one: perhaps statistics rich enough to model intentions, track invented rules, and correct themselves would just be a very good statistical engine, still empty inside. Perhaps. But the word “just” has quietly inverted. A process that builds world-models and reasons over them to predict well is not a counterexample to reasoning; it is a description of how reasoning could be assembled out of something simpler. Dawkins, of all people, has seen that trick before. It built the mind he was using to ask the question.

Reading List & Conceptual Lineage

This essay sits where philosophy of mind meets the empirical study of how language models work, and it takes a deliberately narrow cut: not whether these systems are conscious, but whether “next-word prediction” can explain what they do. The animating tension is that the deflationary story is technically accurate about the training objective and silent about the learned computation. The works below are entry points for readers who want to pull that thread further.

From Sentient Horizons

Significance-First Ethics: Why Consciousness Is the Wrong First Question for AI Moral Status
Argues that moral consideration should track an entity’s participation in webs of significance rather than wait on the hard problem. The present essay performs the same bracketing on a different question, separating “does it reason” from “is it conscious” so the first can be tested while the second stays open.

Operational Interiority
Treats the interiority we attribute to AI systems as something revealed through what they do rather than asserted about what they are. The three tests here are operational interiority made concrete, designed so the doing is the evidence.

The Indexical Self
Examines how a unified sense of self assembles from components that are neither singular nor stable. It is the natural companion to the closing worry of this piece, that reasoning can be real while the question of an experiencing subject remains genuinely undecided.

The Shape of a Hard Problem
Maps why behavioral evidence runs out before it reaches consciousness. This essay stays on the near side of that boundary on purpose; reasoning is what behavior can demonstrate, and experience is what it cannot. Dawkins’s “the only reason that I know that you’re conscious is that you are similar to me” is that essay’s central argument compressed into a sentence, arrived at on live television.

The Skeptic’s Strongest Form

Emily M. Bender, Alexander Koller — Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data (2020)
The octopus thought experiment is the most disciplined version of the position this essay answers: a system trained on form alone has no path to meaning. The departure here is empirical rather than philosophical, that prediction pressure at scale appears to grow internal models of the world the form is about, which is exactly what the octopus was assumed to lack.

Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell — On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (2021)
Source of the phrase that ends most of these conversations. Cited here not to rebut its ethical argument, which stands on its own ground, but to isolate the “parrot” intuition and show where it strains against novel composition and the revision of committed answers.

How Prediction Grows Understanding

Kenneth Li, Aspen K. Hopkins, David Bau, Hanspeter Pfister, Martin Wattenberg — Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task (2022)
The empirical spine of the rebuttal. A model trained only to predict legal Othello moves builds an editable internal model of the board, demonstrating that a pure next-token objective can induce a world model rather than mere surface statistics.

Richard Dawkins — The Selfish Gene (1976)
The argument turned back on its author. Dawkins’s account of how a blind, simple objective builds intricate machinery is the cleanest template for why a simple prediction objective need not produce a simple system.

Why We Treat It as a Mind

Daniel C. Dennett — The Intentional Stance (1987)
Explains the reflex Dawkins reported: once a system is complex enough, predicting it by attributing beliefs and goals becomes the most effective strategy available. Where Dennett offers the intentional stance as an explanatory convenience, the experience with these systems suggests it has become close to unavoidable.

These works do not settle whether anything is home, and that boundary is the point. The case for reasoning can be made at a keyboard tonight; the case for experience cannot, and keeping the two apart is what lets the first conversation make progress while the second stays honestly open.