Specification Is Governance
As AI drives the cost of execution toward zero, power shifts upstream into the rules that machines enforce. Those “checklists” look neutral, yet they encode values, tradeoffs, and hidden assumptions. At scale, specification becomes governance, and calibration becomes the bottleneck.
The specification economy has a moral hazard nobody’s naming.
A customer writes in, furious and exhausted. Their refund gets denied in twelve seconds, with impeccable formatting and a cheerful explanation that cites policy. The denial is consistent across thousands of cases. When the appeal arrives, it is routed into the same system, evaluated by the same criteria, and closed with the same calm certainty. Nobody on the inside feels cruel. The people who built it feel proud. The numbers look clean. Yet the harm is real, and it travels at machine speed.
That is the new shape of ordinary power. It arrives as a checklist for machines: the small, seemingly neutral rules that decide what counts as success.
Software teams call those rules acceptance criteria. Hospitals call them triage protocols. Lenders call them underwriting standards. Platforms call them moderation policy. Governments call them eligibility requirements. The labels change. The structure does not. Someone defines an outcome, translates that definition into machine-legible constraints, and then lets those constraints act at scale.
Nate B. Jones recently published a video arguing that the job market is splitting in a way most people aren’t tracking. As AI collapses the cost of execution in software and knowledge work toward zero, the bottleneck shifts. Projects fail less from bad engineering and more from vague or incorrect specifications. Value migrates upstream into the ability to specify what should be built. A new class of high-leverage workers emerges, the people Jones calls token drivers, who architect systems and manage agent fleets to achieve massive scale. Everyone else sees their role commoditized.
The analysis is sharp, and the economic observation is largely correct. It also stops exactly where the harder problem begins.
Jones frames the shift as an adaptation challenge. Learn to write precise specifications, think in systems, produce verifiable outputs. Master this new upstream work and you win. The framing misses the structural change that arrives with scale. Once your instructions can run across thousands of decisions without review, specification becomes governance. The question stops being only whether you wrote clearly. It becomes what you smuggled into the rules, and how quickly those assumptions now propagate through the world.
What Jones Gets Right
The shift Jones describes is real, and the implications are significant. For most of the history of software, the bottleneck was execution. You needed people who could build the thing. Engineers were expensive, development cycles were long, and the gap between having an idea and having a working system was measured in months and headcount. That gap enforced friction. Bad ideas died in implementation. Vague intentions got sharpened by contact with technical reality.
AI is collapsing that gap. The cost of execution trends toward zero. A specification that once required a team and six weeks can now be handed to an agent fleet and be running by Thursday. The friction that once filtered bad ideas is thinning.
This means the bottleneck moves. When execution is cheap, the scarce resource becomes clarity of intent. Organizations fail less because they cannot build what they specified and more because they specified the wrong thing, or specified the right thing loosely enough that the agents built a confident, well-formed version of a mistake. Jones is right that this creates a new class of high-value work. The people who can translate intent into precise, testable, executable instructions sit at a leverage point that barely existed five years ago.
He is also right about the direction of travel beyond software. More and more domains are being restructured into machine-readable decision rules. Professional judgment becomes rubrics. Nuance becomes criteria. Exceptions become edge cases. The engineering mental model spreads because it travels well in agentic environments. If you can automate execution, someone will.
This is a genuine structural shift. The adaptation Jones calls for is real and necessary. The missing half of the picture emerges once you treat specifications as the governance layer of scaled systems.
The Compression Engine
The specification economy rewards people who can specify quickly, confidently, and completely. That reward structure sounds benign, even obviously good. Clarity is better than vagueness. Testability is better than vibes. Verifiable outputs are better than outputs nobody can evaluate.
But look at what the reward structure actually selects for. It selects for people who can encode their mental model of a problem fast, under deadline, with enough confidence to hand it to a system that will execute without asking follow-up questions. It selects for people who've internalized one environment's assumptions deeply enough to compress them into acceptance criteria without pausing to examine them. It rewards decisiveness over deliberation. Legibility over truth.
That is a compression engine.
In environments where compression operates, judgment doesn't just get faster. It gets different. It becomes brittle. It becomes less able to notice when its own assumptions no longer fit the situation it's being applied to. The moral heuristics that work beautifully in one context, one organizational culture, one user population, one set of incentives, get encoded into specifications that run in different contexts, at machine speed, without the friction that would normally surface the mismatch.
The people best positioned to win in Jones's framework are precisely the people whose compressed intuitions get the most leverage. They've mastered execution-speed thinking. They're fluent in agentic mental models. They can generate well-formed specifications under pressure. And when their values are miscalibrated, when their mental model of the user is slightly wrong, when their assumptions about context don't hold, when the environment they're specifying for differs from the environment they've internalized, that miscalibration executes at scale before anyone notices.
A token driver who can generate beautiful specifications under pressure therefore sits on a peculiar kind of leverage. They do not merely accelerate execution. They accelerate their own unexamined assumptions.
This is where a distinctive failure mode emerges. Call it values laundering.
Values laundering happens when the spec looks neutral because it is technical, and the values ride along inside the choice of objective functions, proxies, and constraints. "Optimize for customer happiness." "Reduce fraud." "Increase quality." "Maximize retention." These phrases feel like engineering. They are moral and political decisions in disguise because they determine who bears the cost of optimization, whose edge cases get treated as noise, and which harms count as acceptable collateral.
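To see where the values ride along, here is a minimal sketch of what "reduce fraud" might compile down to. Every feature, name, and threshold below is invented for illustration; the point is how ordinary the laundering looks.

```python
# A toy refund-denial rule. Nothing here announces itself as policy,
# yet every line allocates costs and suspicion to someone.

from dataclasses import dataclass

@dataclass
class RefundRequest:
    amount: float
    account_age_days: int
    prior_refunds: int

def fraud_risk(req: RefundRequest) -> float:
    # The choice of features is already a value judgment.
    score = 0.0
    if req.account_age_days < 30:
        score += 0.4   # assumes new customers are riskier
    if req.prior_refunds > 2:
        score += 0.4   # assumes repeat refunds signal abuse, not a bad product
    if req.amount > 200:
        score += 0.2   # assumes large refunds deserve more suspicion
    return score

# The acceptance criterion reads like engineering: "deny when risk >= 0.6".
# But the threshold decides who bears the cost of optimization. Lower it
# and the company absorbs more fraud; raise it and legitimate customers
# absorb more wrongful denials. That tradeoff never appears in the spec.
RISK_THRESHOLD = 0.6

def decide(req: RefundRequest) -> str:
    return "deny" if fraud_risk(req) >= RISK_THRESHOLD else "approve"
```

Run that rule ten thousand times and the constant on one line has become refund policy, without anyone having written a policy.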
That is a calibration problem. It presents as a specification problem because it wears the clothes of technical rigor.
The bottleneck will not be solved by teaching more people to write better decision rules. It will be solved by building norms and structures that calibrate the people who write them.
The Governance Threshold
There’s a threshold most people writing specifications don’t know they’ve crossed.
It’s not marked by a job title change or a policy memo. It doesn’t arrive as a formal responsibility. It arrives quietly, as a consequence of scale. When your specifications run at machine speed, across thousands of interactions, without human review at each step, something has changed about the nature of what you’re doing. You are no longer using a tool. You are setting policy. Your criteria are the enforcement mechanism. Your intent is the governance layer.
This is what it means to say specification is governance. The claim is structural.
Consider what governance actually is. It’s the set of rules, constraints, and decision procedures that determine how a system behaves when nobody is watching each individual action. Good governance doesn’t require a human in the loop at every moment. It requires that the values and assumptions embedded in the structure are sound enough to produce acceptable outcomes across the range of situations the system will encounter. That’s exactly what a specification does for an agent fleet. The difference between a policy document and an acceptance criteria document is increasingly one of format, not function.
The threshold becomes visible when you map a few indicators. How many actions will execute from this specification before a human reviews the outputs? How quickly do errors propagate relative to how quickly feedback arrives? How reversible are the downstream effects? How far is the distance between the person writing the specification and the people affected by its execution? When those numbers cross certain levels, the moral weight of the work changes qualitatively, not just quantitatively. A specification that governs ten interactions is a tool. A specification that governs ten thousand interactions before anyone checks is something closer to institutional policy.
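Those indicators can be written down before deployment. The sketch below is one hypothetical way to do it; the field names and cutoffs are illustrative, not a standard, but making them explicit turns the threshold from a vibe into something a team can argue about.

```python
# A hypothetical pre-deployment check: does this specification behave
# like a tool, or like institutional policy? Cutoffs are illustrative.

from dataclasses import dataclass

@dataclass
class SpecProfile:
    actions_before_review: int       # executions before a human inspects outputs
    error_propagation_hours: float   # how fast a bad rule spreads its effects
    feedback_lag_hours: float        # how fast misalignment gets reported back
    reversible: bool                 # can downstream effects be undone cheaply
    writer_to_affected_hops: int     # organizational distance to those affected

def is_governance(spec: SpecProfile) -> bool:
    """Flags when a specification crosses into policy territory.
    Any single indicator can push it over the line."""
    if spec.actions_before_review > 1_000:
        return True
    if spec.error_propagation_hours < spec.feedback_lag_hours and not spec.reversible:
        return True
    if spec.writer_to_affected_hops > 3 and not spec.reversible:
        return True
    return False

# A refund rule that runs ten thousand times before review, where
# complaints take three days to surface, is governance whatever its
# file happens to be named.
assert is_governance(SpecProfile(10_000, 1.0, 72.0, False, 4))
```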
Most people crossing this threshold don’t know they’ve crossed it because nothing in the current conversation names it for them. Jones’s framework is organized around value and leverage: who captures the most economic upside in the new structure. That’s the right question for career strategy. It’s the wrong question for understanding what you’re now responsible for. The token driver optimizing for scale has become, whether they intended to or not, a governance actor. The obligations that follow from that aren’t optional. They just haven’t been named yet.
What Governance Actually Requires
Naming the threshold is not the same as knowing what to do about it. So what does governance actually require from the person writing the specification?
Not more technical rigor. Technical rigor answers whether the system did what was specified. Governance asks whether what was specified should be allowed to run at scale. The specification economy is already pushing hard in that direction, and Jones is right that testability and verifiability matter. But technical rigor is a property of the specification after it’s written. The calibration problem lives upstream, in the moment before the first acceptance criterion gets drafted, in the assumptions the writer brings to the document, the mental model of the user they’ve never made explicit, the values they’re optimizing for without having examined whether those values hold in the environment the agents will actually run in.
Governance requires that those assumptions surface before execution, not after.
This is a different kind of discipline from specification mastery. It’s closer to what high-stakes institutions do when they build pre-commitment structures into decision-making. Aviation doesn’t trust that pilots will remember the right procedure under pressure. It builds checklists that externalize the procedure before pressure arrives. Medicine doesn’t trust that surgeons will spontaneously surface every assumption about a patient’s condition. It builds pre-operative protocols that force those assumptions into the open when there’s still time to catch them.
The specification moment deserves the same treatment. Before writing acceptance criteria for any system that will execute at scale without human review at each step, the questions that need answering are not primarily technical. What is my mental model of the people this will affect, and where did that model come from? What values am I encoding, and have I examined whether they hold across the range of contexts the agents will encounter? What feedback loop will catch misalignment, and is it fast enough relative to deployment speed? What would I need to see to know this specification was wrong?
These are calibration questions. They don’t slow down good specification work. They prevent the specific failure mode that the specification economy, left to its own reward structure, will reliably produce: confident, well-formed, technically rigorous specifications that encode uncalibrated values at scale.
There’s a useful way to think about what pre-calibration looks like in practice. In high-stakes survival training, one of the foundational disciplines is pre-commitment: deciding in advance, before the situation generates pressure to decide otherwise, what you will and won’t do. The pre-commitment isn’t a constraint on judgment. It’s what protects judgment when the conditions designed to degrade it arrive. The person who hasn’t done that work doesn’t make better decisions under pressure. They make faster ones. Speed and quality are not the same thing, and in compressed environments they often trade against each other in ways that only become visible downstream.
Writing specifications for agent fleets is its own version of that discipline. The governance layer isn’t the specification itself. It’s the calibration work that happens before the specification gets written, the slow, unglamorous process of making assumptions explicit when there’s still time to examine them.
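What might that look like in practice? One sketch, with invented field names and no claim to completeness, encodes the calibration questions from above as a pre-specification checklist that must be answered before drafting begins, the way a pre-flight checklist runs before the engines do.

```python
# A hypothetical pre-specification checklist. The discipline is the
# structure, not these particular fields: answer before drafting.

from dataclasses import dataclass, fields

@dataclass
class PreSpecChecklist:
    user_model_source: str          # where the mental model of affected people comes from
    encoded_values: str             # what the spec optimizes for, stated plainly
    contexts_examined: str          # environments where those values were checked
    misalignment_signal: str        # the feedback that would reveal the spec is wrong
    feedback_outpaces_deploy: bool  # does that signal arrive before scale does
    falsifier: str                  # what evidence would prove the specification wrong

def ready_to_draft(checklist: PreSpecChecklist) -> list[str]:
    """Returns the unanswered questions. Drafting waits until it is empty."""
    gaps = [f.name for f in fields(checklist)
            if isinstance(getattr(checklist, f.name), str)
            and not getattr(checklist, f.name).strip()]
    if not checklist.feedback_outpaces_deploy:
        gaps.append("feedback_outpaces_deploy")
    return gaps
```

The checklist does not make the answers good. It makes their absence visible while there is still time to notice.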
The Pattern Behind the Pattern
The specification economy is not a new kind of problem. It’s a new instance of a pattern that appears every time capability scales faster than the structures built to hold it accountable.
It happened when ocean navigation expanded trade routes and the leverage of movement outpaced the moral structures prepared to govern it. It happened when industrial machinery amplified production faster than labor protections could form. It happened when social platforms scaled engagement optimization across billions of interactions before anyone seriously asked what they were optimizing toward. In each case the capability was real, the economic logic was sound, and the people building it were not malicious. They were operating inside a reward structure that selected for speed, scale, and legibility, and that had no vocabulary for the obligations that scale creates.
The specification economy is following the same arc. The capability is real. The economic logic is sound. Jones is right that the bottleneck has shifted and that the people who master specification will capture significant value. What the economic framing cannot see is that capturing value and bearing responsibility are not the same activity, and that the gap between them is where harm accumulates quietly until it becomes visible all at once.
The question worth sitting with is not how to write better specifications. It’s whether the people now sitting at the governance layer of agentic systems have the calibration infrastructure that governance requires. Not the technical skills; those are coming, and the market will produce them. The prior questions: have they examined the values they’re encoding? Do they know what feedback will catch misalignment before it propagates? Have they decided, before the deadline pressure arrives, what they will and won’t build into the acceptance criteria?
The specification bottleneck is real. The calibration bottleneck is deeper. The full story of human-AI alignment will play out in that depth.