The Calibration Problem · Part V · Succession · Chapter 16

The Successor Horizon

A few years after Voyager launched carrying Carl Sagan’s message to whatever might one day find it, the same problem surfaced closer to home. In the early 1980s, the U.S. Department of Energy convened a task force on the long-term storage of nuclear waste. The question was practical: how do you warn people ten thousand years from now that the ground they are standing on is poisoned? The engineers had solved the containment problem, more or less. What they had not solved was the communication problem. Every human language drifts, and over a few thousand years the drift is enough to make an ancestor language unreadable to its descendants without specialist training. Symbol systems fare no better: they are reinterpreted, repurposed, or forgotten. The people who would need to read the warning would be separated from us by a span of the same order as the one that separates us from the cave painters of Lascaux, and no one alive today can read those paintings with any certainty about their original meaning.

Proposals followed for enormous concrete markers etched with pictograms, for genetically engineered cats that would change color near radiation, for an “atomic priesthood” that would transmit the warning through ritual across generations. Each proposal was an attempt to solve a problem that had no precedent in human experience: the transmission of intent across time scales that exceed the lifespan of every institution humans have ever built.

The nuclear waste problem remains unsolved because it was never a problem of engineering. It is a problem of succession. The people who buried the waste will not be present to explain it. The warning must function in their absence, which means the warning must be designed to survive reinterpretation by minds that do not share the designers’ context, language, or assumptions.

The nuclear waste problem now gives its shape to a much larger one. Chapter 9 examined Aschenbrenner’s argument that AI capability is outpacing institutional absorption, and identified a second boom his framework does not hear: the gap between the moral consideration these systems may warrant and what anyone is prepared to investigate. That gap is widening faster than the first, because the first at least has market incentives pushing toward integration.

The structure is the same as the nuclear waste problem: consequence transmitted into conditions the originators cannot model.

This chapter is about what happens when that structure becomes the central problem of ethics. It argues that the most consequential actions any civilization takes are not the ones whose effects are felt immediately, but the ones whose effects outlive the ability of their originators to correct them. When actions cross that threshold, ethics shifts from choosing outcomes to shaping successors, because the outcomes themselves will be determined by agents and processes that the originators can no longer steer.

We Do Not Live Into the Future

There is a picture of the future that quietly shapes how most people think about time and agency. It imagines time as a corridor we walk down: the future is a destination we will eventually reach and inhabit. Planning for the future, on this picture, means preparing for a place we expect to occupy.

The deeper you look, the more this picture dissolves.

Two claims from earlier in this book bear on it: consciousness is assembled time, and Depth is the degree to which a system’s present state encodes its own causal history. Both treat continuity as an achievement rather than a given, something that must be actively maintained through integration, memory, and self-modification.

The philosopher Derek Parfit pressed this insight to its logical conclusion. If personal identity is constituted by psychological continuity rather than some metaphysical substance that persists unchanged through time, then the relationship between you and your future self is not one of identity in the strict sense. It is one of succession. The person who wakes up tomorrow is connected to you by memory, habit, value, and physical continuity, but she is, in a precise sense, more your successor than she is you. She will inherit your body, your commitments, your relationships, and the consequences of your choices. She will not inherit your experience of making those choices, your uncertainty at the moment of decision, or the specific configuration of awareness that constituted your consciousness at the moment you acted.

We do not live into the future; we build it. We do not occupy tomorrow so much as hand it to someone shaped by what we did today.

This reframing might seem merely philosophical, the kind of thought experiment that makes for interesting seminar discussion but has no practical consequences. The argument of this chapter is that it has consequences as practical as any in this book, because the successor relationship, once recognized, changes the structure of ethical reasoning at every scale.

Prudence Becomes Ethics

If the future self is a successor rather than a continuation, then caring for your future self is already a form of caring for another person. The boundary between self-interest and ethical concern does not disappear, but it becomes porous in a way that most moral frameworks do not acknowledge.

A person who exercises despite discomfort, who saves money despite wanting to spend it, who maintains a difficult commitment despite the temptation to quit, is typically described as disciplined. The standard account frames discipline as self-interested: you endure present cost for future personal benefit. But if the person who receives that benefit is, in the relevant sense, a successor rather than a continuation of the same self, then discipline is an act of generosity toward someone you will never meet in the way you meet the people around you today. You are being a good ancestor to the person who will inherit the consequences of what you do now.

On the Parfitian view, this is not a metaphor but a structural claim about what happens when a person acts on behalf of a future state of themselves. The person who runs in the rain is providing for someone who does not yet exist. The person who maintains a marriage through a difficult period is sustaining a relationship that will be inhabited by two people who are, in important respects, different from the two people who are currently doing the difficult work. The continuity is real, the identity reconstructed. And the care, properly understood, is a form of ethical concern for a successor whose existence depends on choices being made now.

This insight scales. And it is at the larger scales that its consequences become most visible.

Institutions as Alignment Across Time

Every institution is a handoff device. A constitution is the clearest example: instructions written by people who are now dead for successors not yet born, an attempt to bind strangers across time into something resembling a shared project. A legal code, a professional standard, a religious tradition, an educational curriculum: each one is an attempt to transmit intent from past agents to future agents who will not share the originators’ context, incentives, or lived experience.

Seen this way, institutions are alignment work. They compress hard-won lessons into structures that successors can inherit, so that each generation begins closer to maturity than it would if it had to learn everything from scratch. The successor inherits the structure, adds its own experience, and passes the modified structure forward. This is how the ladder framework from Chapters 12 and 13 operates at civilizational scale: the accumulated tools, norms, training systems, and shared memory of a culture are its ladders, and the maintenance of those ladders across generational transitions is the deepest form of alignment work a civilization performs.

The alignment, however, is never perfect. Every generational handoff introduces drift. The successors who receive the inherited structure inhabit a different context than the one in which the structure was designed. They face problems the designers never anticipated. They reinterpret inherited language in light of experiences the language was never designed to describe. The constitution that seemed clear to its authors becomes ambiguous two centuries later, not because the words have changed but because the world has.

This drift is not a failure of the handoff but a structural feature of succession across time. The longer the span, the greater the drift, and at some point successive reinterpretations carry the original intent so far from its origin that it can no longer be recovered.

The Successor Horizon

The Successor Horizon is the radius within which values can be transmitted with high fidelity and corrected by feedback. Inside the horizon, agency has traction. You can try, observe the results, adjust your approach, repair mistakes, and refine your understanding of what you are trying to achieve. The relationship between intent and outcome is close enough that feedback loops function. Meaning can be shared because the context in which meaning is interpreted has not yet drifted beyond the reach of mutual understanding.

Beyond the horizon, the relationship between intent and outcome breaks down. The actions taken now will produce consequences that unfold in contexts the originators cannot predict, will be interpreted by agents the originators cannot communicate with, and will interact with conditions the originators cannot model. Feedback loops do not function across the horizon, because by the time information about consequences travels back to the point of origin, the originators have been replaced by successors who may not recognize the original intent or share the values that motivated it.

Inside the Successor Horizon, ethics looks like care: teaching, mentoring, repair, feedback, iteration. The moral agent can see the consequences of her actions, learn from them, and adjust. The feedback loop between action and consequence is tight enough to support the kind of calibration this book has been describing since Chapter 8.

Beyond the Successor Horizon, ethics changes its medium. Outcomes become too distant and too path-dependent to be steered by direct intention. The primary lever becomes constraint: the management of irreversibility, the careful selection of what kinds of processes you set in motion, and the design of structures that preserve the ability of future agents to revise, correct, and adapt what they have inherited.

The Drift Tax

Drift is not merely cultural. Over sufficient time, it becomes structural.

Consider how moral language evolves. The words harm, autonomy, consent, dignity, and flourishing appear in ethical discourse across centuries, but their referents shift with each generation. Autonomy meant something different to Kant than it means in contemporary bioethics. Consent meant something different in eighteenth-century contract theory than it means in twenty-first-century discussions of data privacy. The words persist; the meanings migrate. Given enough time, the same vocabulary can point to different worlds.

Institutions experience the same drift. A regulatory body designed to protect the public interest begins to staff itself with people drawn from the industry it regulates. Its original mandate remains on the books; its operational culture drifts toward accommodation. The normalization of deviance that Chapter 13 described at the organizational level operates at the institutional level across decades and centuries: small departures from founding intent go uncorrected, each departure making the next one easier, until the institution bears little functional resemblance to what its creators designed.

This is the drift tax: the accumulated cost of maintaining coherence across successive reinterpretations. Every handoff extracts the tax. Every generation of successors inherits a structure that has drifted incrementally from the one its predecessors received. The tax cannot be eliminated, because it is a consequence of the fundamental fact that successors are not identical to their predecessors. It can only be managed, through the kind of structural maintenance the ladder framework describes: feedback systems that detect drift, norms that resist it, and the institutional discipline to invest in coherence even when coherence produces no visible short-term return.

The drift tax explains why institutions that seem robust in the short term can become unrecognizable over the long term. It explains why constitutions require amendment processes, why religious traditions develop interpretive practices alongside their sacred texts, and why scientific disciplines invest in methodological rigor as a check on the natural tendency of theoretical commitments to drift toward whatever the current generation finds convenient.

Not all drift is loss. Some of it is correction. When the meaning of “equal” widened beyond what a constitution’s authors intended, the change was not decay but repair: a successor generation reading the inherited words more justly than the people who wrote them. The drift tax is the cost of drift that goes undetected and uncorrected, carrying a structure away from any intent anyone would endorse. The remedy is not to freeze meaning, which only trades drift for irrelevance, but to preserve the capacity to tell decay from repair and to choose revision deliberately. That capacity is what the rest of this chapter is about: corrigibility is what turns drift from a tax into a choice.

The Successor Problem in AI

The concepts developed in this chapter might seem abstract until you apply them to the technology this section of the book has been examining. Artificial intelligence, viewed through the successor lens, is not only a tool-building project but, to the degree it becomes autonomous and self-modifying, a successor-building one.

A sufficiently advanced AI system is something you set in motion, not something you use. Once deployed at scale, it makes decisions, allocates resources, shapes attention, and influences the conditions under which future decisions will be made. If it is capable of self-modification, of learning from its interactions and changing its behavior accordingly, then it is not merely executing the intent of its creators but developing along a trajectory they may not have predicted and may not be able to correct.

This is the successor problem in its most concentrated form. Among all the ways an agent can exert influence, none has greater reach across time than setting in motion a process that continues to act after the originator has lost the ability to intervene. Biology softens this problem with a series of structural constraints: finite lifespans ensure that each generation eventually cedes control to the next, gradual transfer of agency allows for extended periods of overlap during which correction is possible, and the expectation that generations will not coexist indefinitely at comparable power creates a natural rhythm of succession that prevents any single generation from permanently dominating the trajectory.

Artificial intelligence can break every one of these constraints. A system that does not age does not cede control. A system that can be replicated does not require gradual transfer. A system that operates continuously at the same level of capability does not create natural handoff points at which revision becomes possible. The biological bargain that has governed succession for billions of years does not apply, and no equivalent bargain has been designed to take its place.

This is why the alignment problem, as this book has been arguing, is fundamentally a problem of time and continuity rather than a problem of value specification alone. A system whose values are perfectly specified at the moment of deployment can still diverge from human intent if it operates across time scales that exceed the Successor Horizon, because the drift tax applies to artificial systems just as it applies to institutions, cultures, and biological lineages. The values that seemed clear at the moment of specification become ambiguous as the contexts in which they are applied diverge from the contexts in which they were defined.

Corrigibility and the Open Future

The framework of this chapter converges on a single architectural property that matters more than any other for systems that operate beyond the Successor Horizon. The property is corrigibility: the capacity to admit error and still change course.

Corrigibility is the difference between a decision that can be revisited and one that hardens into fate. Systems that preserve corrigibility allow later agents to revise inherited rules, halt dangerous processes, dismantle structures that have outlived their usefulness, and reinterpret inherited values in light of new knowledge. Systems that lose corrigibility force their successors to live inside mistakes they did not choose and can no longer undo.

One way to see this difference is in how cultures transmit values. Some cultures rely on fixed narratives: stories treated as complete and final, designed to be repeated rather than questioned. Others pass down living traditions: norms, principles, and procedures explicitly designed to be reinterpreted as circumstances change. The first preserves identity by freezing meaning. The second preserves continuity by allowing revision. Corrigibility belongs to the second approach. It is what allows a system, whether personal, institutional, or artificial, to remain itself while still learning.

The connection to the book’s central arguments is direct. Constraint, as Chapter 11 argued, is the architectural condition for durable intelligence. Corrigibility is constraint applied to succession: the deliberate preservation of the ability to be corrected, even by agents who do not share your context, your knowledge, or your values. A system that is corrigible can survive the drift tax, because the drift, however far it carries the system from its original specification, can be detected and addressed by successors with the authority and the capability to intervene.

A system that is not corrigible becomes, over time, a locked trajectory. Its original intent hardens into infrastructure. Its accumulated drift becomes invisible because the mechanisms that would have detected it have been overridden by the same process that produced the drift. The system continues to operate, increasingly distant from any intent that any agent ever endorsed, because no agent retains the ability to call it back.

There is a serious objection to making corrigibility supreme, and it does not dispute that locked trajectories are dangerous. It disputes that corrigibility escapes them. A system that remains open to correction is open to correction by whoever holds the lever, and the capacity to be realigned says nothing about whether the realigner is trustworthy. A perfectly corrigible superintelligence is also a perfectly capturable one, responsive to the values of whatever actor controls it at any given moment, including actors whose authority no one would have ratified. Chapter 14 warned about the concentration of power in few hands; corrigibility as the highest value can hand those hands the realignment lever directly. Entrenchment of present power is its own locked trajectory, and the lock is no less binding for being held by humans rather than by infrastructure. Read this way, corrigibility does not preserve the open future so much as relocate the question of who gets to close it.

The objection is correct, and its correctness sharpens the claim rather than defeating it. Corrigibility is necessary for systems beyond the Successor Horizon, but it was never sufficient, and an account that treated it as sufficient would smuggle in exactly the status-quo bias the objection identifies. What the objection establishes is that corrigibility has a second variable hidden inside it: corrigible toward whom. A revision lever wired to a single unaccountable actor reproduces the locked trajectory in a more convenient form, since the drift it permits is the drift that actor prefers. The property worth preserving is therefore not corrigibility alone but corrigibility paired with legitimate revision authority, distributed widely enough and held accountably enough that no single inheritor can quietly capture the process of correction. This returns the chapter to the stewardship the successor relationship has demanded throughout. The successor problem was never going to be solved by retaining the power to intervene. If it is solved at all, it will be by ensuring that the power is held by successors in the plural, under conditions that keep any one of them from converting a shared inheritance into a private trajectory. Naming that condition is not the same as meeting it. How to constitute legitimate, distributed revision authority, accountable and hard to capture and durable across the very drift it must govern, is among the hardest unsolved problems in institutional design, and this book does not claim to have solved it. It claims only that this is the right problem to be working on.

The loss of corrigibility is the deepest structural risk of artificial intelligence, and the conventional alignment discourse, focused on value specification at the moment of deployment, does not adequately address it. The question is not only whether we can build systems whose values align with ours at the moment we set them in motion. It is whether we can build systems that can still be realigned as values, contexts, and understanding change over time: systems that remain corrigible across the Successor Horizon.