LESSON 34

Incident Post-Mortems as Code: The Format That Turns Failures Into Firmware

Your AI system will make the same mistake 50 times unless you encode the correction into a file it reads at session start. Post-mortems belong in code, not in Slack threads.

10 min read·Discipline

You fixed the bug. The agent acknowledged it. The session ended. Next session, the same bug appears. You fix it again. The agent acknowledges it again.

This cycle will repeat indefinitely. Not because the model is incapable of learning. Because you never gave it a file to read. The correction existed as a spoken exchange in a context window that no longer exists. It was a conversation, not a record. And conversations die when sessions end.

Institutional memory does not live in conversations. It lives in files.
DOCTRINE

Every correction that is not written down will be paid for again. The only question is how many times.

Post-mortems are not retrospectives. They are firmware updates for your AI operating system.


The Problem: Corrections That Evaporate

Lesson 15 introduced the compound learning loop — the idea that your AI system should get smarter over time, not reset to baseline every session. That lesson covered the why. This lesson covers the how — the specific format, escalation policy, and tooling that make compound learning operational.

The failure mode is universal. A team runs a retro. Issues are discussed. Everyone nods. Nothing is written down in a place that agents actually read. The next sprint, the same issues appear. The retro was catharsis, not correction.

In AI-OS operations, the same pattern plays out at session boundaries. An agent makes a mistake. You correct it. The agent performs flawlessly for the rest of the session. Session ends. Next session starts. The agent reads CLAUDE.md, reads the project context — and the correction is nowhere in either file. It was encoded in a session that no longer exists.

This is not a memory problem. It is an infrastructure problem. The correction had no durable home.

If you haven't read hundreds of books, you are functionally illiterate. Your personal experiences alone aren't broad enough to sustain you.

General James Mattis · Call Sign Chaos, 2019

The same applies to your AI system. If it has not read hundreds of lessons from previous sessions, it is functionally amnesiac. Its personal experience — the current session — is not broad enough to sustain it.

The Format: Three Fields, No Exceptions

Every entry in lessons.md follows one format. No variations. No creative reinterpretation.

## [Date] — [Category]
**Mistake:** What went wrong (specific, factual, reproducible)
**Root cause:** Why it happened (one level deeper than the symptom)
**Rule:** The behavioral change that prevents recurrence (testable, enforceable)

Each field does specific work.

Mistake is the incident report. Not "something broke." The specific thing. "The urljoin(base, path) call dropped the base path because path started with /, which urljoin treats as an absolute path that replaces the base's path entirely." That is a mistake entry. "URL was wrong" is not.

Root cause forces depth. The symptom is that the URL was wrong. The root cause is that urljoin has counterintuitive behavior when the second argument is absolute — and the developer assumed it would concatenate. Root causes live one layer below symptoms. If your root cause sounds like a restatement of the mistake, you have not gone deep enough.

Rule is the firmware patch. It must be specific enough that an agent reading it cold — with no context about the incident — can follow it without ambiguity. "Always use f'{base.rstrip("/")}/{path.lstrip("/")}' instead of urljoin() when constructing API paths" is a rule. "Be careful with URLs" is not a rule. It is a wish.
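The trap and the fix can be reproduced in a few lines. A minimal sketch; the URLs are illustrative:

```python
from urllib.parse import urljoin

base = "https://api.example.com/v1/"

# The trap: a leading "/" makes the second argument an absolute path,
# so urljoin discards the base's /v1/ path component.
print(urljoin(base, "/users"))  # https://api.example.com/users

# The rule: explicit concatenation with normalized slashes keeps the
# base path intact no matter how the arguments are written.
def api_url(base: str, path: str) -> str:
    return f"{base.rstrip('/')}/{path.lstrip('/')}"

print(api_url(base, "/users"))  # https://api.example.com/v1/users
```

Running both lines side by side is the fastest way to verify the root cause before it goes into the lesson entry.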

WARNING

The most common anti-pattern is a lesson without a root cause. When you write "Mistake: deployed wrong version" and skip straight to "Rule: double-check versions" — you have treated a symptom. The root cause might be that your deploy script reads from an environment variable that was stale, or that two branches had conflicting version bumps.

Without the root cause, the rule is a band-aid. It will fail the next time the same mechanism produces a different symptom.
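Put together, the urljoin incident from above reads as a complete entry. A sketch; the date is illustrative:

```markdown
## [2025-01-14] — [URL construction]
**Mistake:** urljoin(base, path) dropped the /v1/ base path because path
started with /, producing https://api.example.com/users instead of
https://api.example.com/v1/users.
**Root cause:** urljoin treats a second argument that starts with / as an
absolute path and replaces the base's path; the developer assumed it
would concatenate.
**Rule:** When constructing API paths, use explicit concatenation
(f"{base.rstrip('/')}/{path.lstrip('/')}") instead of urljoin().
```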

The Escalation Pyramid

Not every lesson stays in lessons.md. The system has three tiers, each with wider blast radius and stricter enforcement.

Escalation Pyramid — Three Tiers of Institutional Memory

Tier 1: lessons.md — every project root gets one. Every correction lands here first, within the same session it occurred. The agent reads this file at session start. This is the tightest feedback loop: mistake happens, lesson captured, next session reads it, mistake does not recur.

Tier 2: CLAUDE.md — the project constitution. When a lesson category appears in lessons.md twice — not the same mistake, but the same category — it escalates. The test: would a brand-new agent, reading only CLAUDE.md, avoid this mistake? If the answer is yes, the rule belongs here. CLAUDE.md is read by every agent that works in this repo, including agents that never open lessons.md.

Tier 3: MEMORY.md — cross-project persistent memory. When a pattern spans multiple projects — Python 3.9 compatibility, commit email conventions, Docker networking constraints — it transcends any single project. It lives in MEMORY.md, where it applies universally. This tier has the widest blast radius. Every session, every project, every agent reads it.

Capture rate: 100% (every correction gets an entry)
Escalation trigger: 2x (the same category twice means escalate)
Active lessons: 50+ (across the operating system)

Anti-Patterns That Kill the System

"I'll remember this." You will not. You will remember it for the rest of this session. Then the session ends. The correction vanishes. This is the most common anti-pattern because it feels efficient — why write it down when you already understand it? Because you are not the one who needs to understand it next time. The agent is.

Vague lessons. "Be more careful with paths." That is not a lesson. It is a sentiment. A lesson says which paths, what about them, and the specific rule that prevents the error. Vague lessons create a false sense of security. You think the system learned. It learned nothing.

Lessons without root causes. "Mistake: wrong branch. Rule: use the right branch." That lesson has zero information content. The root cause — why the wrong branch was used, what about the workflow created the ambiguity — is where the actual learning lives. Without it, the rule is "don't make mistakes," which is not actionable.

Never escalating. The same mistake appears in lessons.md five times across five sessions. It has never been promoted to CLAUDE.md. The lesson is not working at its current tier. It needs wider enforcement. A lesson that repeats without escalating is a lesson that has failed — and the system that does not escalate it has also failed.

INSIGHT

The quality of your lessons.md directly predicts the quality of your AI system six months from now.

Fifty vague entries produce an agent that has "seen a lot" but learned nothing. Fifty precise entries — each with a real root cause and an enforceable rule — produce an agent that anticipates failures before they happen.

The entries are the investment. The compound returns depend on the quality of each deposit.

The Compound Effect: Session 100 vs. Session 1

After 50+ lessons captured with real root causes and enforceable rules, something shifts. The agent stops making the classes of mistakes that dominated early sessions. It flags risks proactively. It chooses the correct path convention without being told. It avoids the Docker networking trap, the urljoin trap, the stale environment variable trap — because those traps are documented in files it reads before writing a single line of code.

This is not intelligence. It is institutional memory that survives session boundaries. Session 100 is fundamentally smarter than session 1 — not because the model improved, but because the context improved. The model is the same. The memory layer is different.

The arithmetic is straightforward. Each lesson reduces future error rate by some small delta. Those deltas compound. After enough deposits, the agent operates with a base of knowledge that no single session could produce. It has seen every failure pattern this project has encountered, categorized by root cause, with enforceable rules for prevention.
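A toy model makes the compounding concrete. The 5% figure is an assumption for illustration, not a measured number:

```python
# Toy model: each lesson removes a fixed 5% of the *remaining*
# recurring-error rate, so the per-lesson deltas compound multiplicatively.
def residual_error(n_lessons: int, delta: float = 0.05) -> float:
    return (1 - delta) ** n_lessons

print(f"{residual_error(50):.1%}")  # ≈ 7.7% of the original recurring-error rate
```

Under that assumption, fifty precise entries leave less than a tenth of the original recurring-error rate. The exponent only grows if the file keeps being read and kept current.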

That is what separates an AI tool from an AI operating system. The tool resets every session. The operating system compounds.

SIGNAL

The escalation path is not bureaucracy. It is a priority queue.

lessons.md is the intake. CLAUDE.md is the filter. MEMORY.md is the canon. Each tier has increasing enforcement power and wider reach. The system self-organizes: high-frequency mistakes rise to constitutional status. Rare mistakes stay in the project log. The important rules float to the top through repetition, not manual curation.

Lesson 34 Drill

Open your current project. Create lessons.md in the root directory if it does not exist. Write three entries from your last week of work — real mistakes, not hypotheticals.

For each entry:

  1. Use the exact format: ## [Date] — [Category] with Mistake, Root cause, Rule
  2. Verify the root cause is one level deeper than the symptom
  3. Verify the rule is specific enough that a stranger could follow it without context
  4. Check: has this category appeared before? If so, escalate to CLAUDE.md immediately

Then add this line to your project's CLAUDE.md or agent instructions: "At session start, read lessons.md before writing code."

That single instruction closes the loop. Without it, the file exists but nobody reads it. With it, every session starts with the accumulated wisdom of every session before it.
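A minimal sketch of what that section of CLAUDE.md might look like; the wording is illustrative, not prescribed:

```markdown
## Session start
- Read lessons.md in the project root before writing any code.
- If the current task touches a category that has an entry in
  lessons.md, state the relevant rule before proceeding.
```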

Bottom Line

The format is three fields. The escalation is three tiers. The discipline is one rule: every correction gets captured, or it gets repeated.

Post-mortems as code means the lessons live where the agents live — in the filesystem, in the project root, in the files that get read at session start. Not in Slack threads. Not in memory. Not in good intentions.

Write it down. Make the format precise. Escalate when patterns repeat. That is how an AI system builds institutional memory that no session boundary can erase.
