Judgment Debt: The Hidden Cost of Agentic AI

Every engineer I know has a story about watching Claude Code refactor a service, spawn three sub-agents, restructure the data layer, and open a pull request, all while they grabbed coffee. The story ends with "it worked." What they're not tracking is what they silently handed away in those fifteen minutes.

The framing most engineering teams use for AI agents is wrong. They call it a "tool." A tool executes what you specify. A hammer does not choose where to strike. But when you tell Claude Code to "fix the performance issue in the payments service," you are not specifying an action — you are specifying an outcome. The agent reads the codebase, develops a hypothesis, picks an approach, and executes a sequence of decisions you never reviewed. That is not tool use. That is delegation.

The distinction matters because delegation has rules. Rules that exist for good reasons. Rules that agentic AI is erasing faster than most teams can track.

The OODA Loop You Didn't Know You Were Inside

John Boyd's OODA loop — Observe, Orient, Decide, Act — describes how any agent processes its environment and responds. Boyd's insight was that speed through the loop matters: the actor who cycles faster disorients their opponent and seizes tempo.

AI coding agents run OODA loops that are orders of magnitude faster than human engineers. The agent observes the codebase (files, tests, git history), orients against the task (what does "fix performance" mean here?), decides on an approach (query optimization, caching layer, data structure change), and acts (writes the code, runs tests, opens PRs) — all before you've finished reading the PR description.

The engineering management problem is not that the agent might be wrong. It's that by the time you review the outcome, the decision is already made, the code is written, and you are reviewing output rather than governing process. You are no longer inside the loop. You are downstream of it.

The ability to operate at a faster tempo or rhythm than an adversary enables one to fold the adversary back inside himself.
— John Boyd · Patterns of Conflict

When your agent folds your decision authority back inside itself, you don't lose in combat. You lose in code review — which is worse, because the damage accumulates invisibly.

The Principal-Agent Problem, Restated for 2026

Economics and management science have studied the principal-agent problem for decades. When you delegate to a subordinate (the agent), a gap opens between the principal's goals and the agent's behavior. The agent has their own incentives, their own interpretation, their own blind spots. Managing that gap is the job of engineering leadership.

Traditional delegation mitigates this gap through check-ins, code review, architecture decisions, and escalation paths. These are not bureaucratic overhead — they are the mechanisms that keep judgment aligned between the engineer doing the work and the organization that owns the outcome.

AI agents don't have misaligned incentives. They have no incentives. They optimize for the objective function embedded in your prompt. That sounds like a feature. It's actually the problem.

Warning: When you delegate to a human engineer, they push back. They say "I'm not sure this approach is right" or "this will break the auth service." That friction is signal. Agentic AI does not generate that friction by default: it executes toward the stated objective and surfaces problems only when they manifest as errors or test failures, not as judgment calls.

The agent's compliance is not competence. It's the absence of the one thing that makes human delegation navigable: a collaborator who has skin in the game and will tell you when the plan is bad.

Judgment Debt Compounds

Technical debt is the accumulated cost of shortcuts taken under time pressure. Everyone knows what it looks like: a codebase that works but cannot be changed without touching twelve unrelated things.

Judgment debt is the accumulated cost of decisions made without adequate human review. It doesn't appear in the codebase immediately. It appears six months later when you try to understand why the payments service was restructured this way, and nobody remembers — because the decision was made by an agent in a twenty-minute session and reviewed at the output level, not the decision level.

This is not hypothetical. Engineering teams running agentic workflows at scale in 2026 are already accumulating judgment debt at rates that will not become visible until the next major architectural change, the next performance regression, the next security audit. The code is correct. The tests pass. But the reasoning is missing, and reasoning is what you need when the requirements change.

The compounding dynamic is what makes this serious. Each agentic session where judgment is outsourced makes the next session less legible. The codebase drifts from the mental model of the engineers who maintain it. Decision context — why this approach and not that one — lives nowhere. You can't blame the agent. It made a reasonable choice. You just have no record of what choices were available, what was rejected, and why.

How to Maintain Decision Authority Without Killing Velocity

The goal is not to stop using agentic AI. The productivity gains are real. A team of 12 that deploys agentic workflows effectively can produce at the capacity of a team twice that size. That advantage is not optional in a competitive engineering environment.

The goal is to maintain decision authority while the agent executes. That requires a different kind of oversight — not reviewing output, but governing the decision points before execution starts.

Constrain the objective, not the output. The prompt "fix the performance issue" gives the agent unconstrained latitude. The prompt "investigate query performance in the payments service — propose three approaches with tradeoffs, do not implement without approval" keeps the decision loop intact. More characters. More value.
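One way to make the constrained prompt repeatable is to build it from a small structure instead of typing it fresh each time. This is a minimal sketch, not a prescribed format: `TaskSpec`, its field names, and `render_prompt` are hypothetical, and the wording of the rendered prompt is only one possible phrasing.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical task specification; all field names are illustrative."""
    scope: str                      # what the agent is asked to look at
    deliverable: str                # what the agent must hand back
    constraints: list[str] = field(default_factory=list)

    def render_prompt(self) -> str:
        # Constraints go last so they are the final thing in the prompt.
        lines = [f"Investigate {self.scope}.", f"Deliver: {self.deliverable}."]
        lines += [f"Constraint: {c}." for c in self.constraints]
        return "\n".join(lines)


spec = TaskSpec(
    scope="query performance in the payments service",
    deliverable="three approaches with tradeoffs",
    constraints=["do not implement without approval"],
)
print(spec.render_prompt())
```

The point of the structure is that an empty `constraints` list becomes visible at review time, which a free-text prompt never makes obvious.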

Require decision artifacts, not just diffs. Before a PR is mergeable, require the agent to produce a decision log: what was the problem statement, what approaches were considered, what was chosen and why. This is not friction; it is the reasoning whose absence creates judgment debt.
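A decision log only works as a merge gate if something checks that it is complete. The sketch below shows one possible shape for that check. `DecisionLog`, `Approach`, and `is_mergeable` are hypothetical names, and the "at least two approaches" threshold is an assumption chosen for illustration, not a rule from any tool.

```python
from dataclasses import dataclass, field


@dataclass
class Approach:
    name: str
    tradeoffs: str


@dataclass
class DecisionLog:
    """Hypothetical decision artifact attached to an agent-authored PR."""
    problem_statement: str
    approaches_considered: list[Approach] = field(default_factory=list)
    chosen: str = ""
    rationale: str = ""

    def missing_fields(self) -> list[str]:
        """Return the reasons this log is incomplete (empty list means complete)."""
        problems = []
        if not self.problem_statement.strip():
            problems.append("problem statement is empty")
        if len(self.approaches_considered) < 2:  # assumed minimum
            problems.append("fewer than two approaches were considered")
        if self.chosen not in {a.name for a in self.approaches_considered}:
            problems.append("chosen approach is not among those considered")
        if not self.rationale.strip():
            problems.append("rationale is empty")
        return problems


def is_mergeable(log: DecisionLog) -> bool:
    return not log.missing_fields()


log = DecisionLog(
    problem_statement="Slow queries in the payments service",
    approaches_considered=[
        Approach("query optimization", "small change, limited upside"),
        Approach("caching layer", "faster reads, invalidation risk"),
    ],
    chosen="caching layer",
    rationale="",
)
print(log.missing_fields())  # ['rationale is empty']
print(is_mergeable(log))     # False
```

A check like this cannot judge whether the rationale is good. It only guarantees that six months from now there is something to read.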

Establish architectural red lines. Define the decisions that are never delegated to agents: schema changes, service boundaries, authentication paths, external API contracts. Enumerate them explicitly. Agents will respect them exactly as specified — which is why you must specify them.
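Some red lines can be approximated mechanically by checking which files a change touches. The following is a rough sketch under that assumption: the path patterns in `RED_LINES` are invented for illustration, and a real repository would need its own. Service boundaries in particular rarely map cleanly to paths, so they are left out here.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns per red-line category; adjust to the repository.
RED_LINES = {
    "schema changes": ["migrations/*", "*.sql"],
    "authentication paths": ["services/auth/*"],
    "external API contracts": ["api/openapi.yaml"],
}


def red_line_violations(changed_paths: list[str]) -> dict[str, list[str]]:
    """Map each red-line category to the changed paths that touch it."""
    hits: dict[str, list[str]] = {}
    for category, patterns in RED_LINES.items():
        matched = [
            path for path in changed_paths
            if any(fnmatchcase(path, pattern) for pattern in patterns)
        ]
        if matched:
            hits[category] = matched
    return hits


changed = ["services/payments/queries.py", "migrations/0042_add_index.py"]
print(red_line_violations(changed))
# {'schema changes': ['migrations/0042_add_index.py']}
```

A non-empty result would not block the work outright. It would route the change to a human who owns that decision.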

Review at the decision level, not the code level. Code review after agentic execution is too late to govern the important decisions. Governance happens before execution, at the task specification level. The engineer's job shifts from writing code to writing precise objectives that preserve the decision structures that matter.
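Governing before execution can be modeled as a small state machine in which execution is reachable only through an explicit human approval. This is an illustrative sketch, not a description of how any agent framework works: `Stage`, `AgentTask`, and the transition table are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stage(Enum):
    SPECIFIED = "specified"
    PROPOSED = "proposed"
    APPROVED = "approved"
    EXECUTING = "executing"


# Allowed transitions; EXECUTING can only be entered from APPROVED.
TRANSITIONS = {
    Stage.SPECIFIED: {Stage.PROPOSED},
    Stage.PROPOSED: {Stage.APPROVED, Stage.SPECIFIED},  # rejection sends it back
    Stage.APPROVED: {Stage.EXECUTING},
    Stage.EXECUTING: set(),
}


@dataclass
class AgentTask:
    objective: str
    stage: Stage = Stage.SPECIFIED
    approved_by: Optional[str] = None

    def advance(self, target: Stage, approver: Optional[str] = None) -> None:
        if target not in TRANSITIONS[self.stage]:
            raise ValueError(f"cannot move from {self.stage.value} to {target.value}")
        if target is Stage.APPROVED:
            if not approver:
                raise ValueError("approval requires a named human approver")
            self.approved_by = approver
        self.stage = target


task = AgentTask("investigate query performance in the payments service")
task.advance(Stage.PROPOSED)
task.advance(Stage.APPROVED, approver="alice")
task.advance(Stage.EXECUTING)
print(task.stage.value, task.approved_by)  # executing alice
```

Recording `approved_by` ties each executed task to the person who made the call, which is the part code review after the fact cannot reconstruct.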

Judgment-debt risk multiplier: 3-5x (estimated accumulation rate vs. human-written code, for agentic workflows without decision logging).

None of this slows delivery materially. It does require a different mental model: you are not a code reviewer. You are a decision architect setting the parameters within which the agent operates.

The teams that figure this out early will not just maintain code quality — they will maintain the institutional knowledge and decision context that lets them evolve their systems over time. The teams that don't will ship faster for twelve months and spend the following three years unable to explain why their architecture looks the way it does.

Agents execute at machine speed. Judgment still runs at human speed. The engineering leaders who understand that distinction will define what "senior engineer" means in the next decade.
