Twenty-Four Agents. Nine Rules. Zero LLM Routing.

Every message that flows through my 24-agent fleet is routed by nine rules. Not a model. Not an orchestrator LLM making judgment calls. Nine deterministic conditionals that map agent type to authority level to handler — and never deviate.

That was the first architectural decision I made building Principal, the A2A message routing layer for Invictus Labs. And it's the one that made everything else possible.

The assumption most people carry into multi-agent systems is that intelligence should scale with complexity. More agents means harder coordination, so you reach for an orchestrator model to handle it. That instinct is wrong. The control plane — the layer deciding who sees a message, what authority level it requires, and whether to escalate or halt — needs to be the dumbest part of the stack. Testable. Auditable. Deterministic. No hallucination in the chain of command.

Eight phases. 430 tests. One overnight session. Here's what I actually learned.

Why the Control Plane Stays Dumb

The spec says it directly: "Type-first routing. Every message is routed based on the type of agent sending it, not its name." Foresight, my trading signal engine, is a Revenue Product. Revenue Products always route to VP Trading. VP Trading routes to OpenClaw if the action exceeds its authority ceiling. The broker doesn't reason about any of this — it reads the agent type, applies the rule, and dispatches.
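A minimal sketch of what type-first routing can look like, assuming the rules live in a plain lookup table. Only the Revenue Product to VP Trading to OpenClaw path comes from the description above. The `CONTENT_PIPELINE` type, the `vp_content` handler, and the `route` function itself are hypothetical placeholders, not Principal's actual rule set.

```python
from enum import Enum


class AgentType(Enum):
    REVENUE_PRODUCT = "revenue_product"
    CONTENT_PIPELINE = "content_pipeline"  # hypothetical type, for illustration


# Rule table: agent type -> first handler. Only the first row is from the article.
ROUTES = {
    AgentType.REVENUE_PRODUCT: "vp_trading",
    AgentType.CONTENT_PIPELINE: "vp_content",  # hypothetical handler
}

# Where a handler hands off when an action exceeds its authority ceiling.
ESCALATION = {
    "vp_trading": "openclaw",
}


def route(agent_type: AgentType, exceeds_ceiling: bool = False) -> str:
    """Return the handler for a message, based only on the sender's type."""
    handler = ROUTES[agent_type]  # unknown type raises KeyError: fail loudly
    if exceeds_ceiling:
        # A missing escalation target is also a KeyError, never a silent default.
        return ESCALATION[handler]
    return handler
```

Note that `route` never receives the agent's name. Two agents of the same type cannot be routed differently, which is what keeps the number of test paths small.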

This matters for three distinct reasons.

First, testability. Nine rules produce nine test paths plus their edge conditions. I shipped 430 tests at 98% coverage overall, with 100% coverage on the safety module. That's not achievable with probabilistic routing — you'd need to enumerate every possible model response to know what you're actually testing.

Second, auditability. Every message carries an immutable envelope: source, timestamp, correlation ID, causal chain. When something fails in production, I reconstruct exactly what happened. "The model decided to route it differently this time" isn't a reconstruction. It's a gap in the audit log.
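One way to picture the envelope is a frozen dataclass. This is a sketch under assumptions: the four fields named above (source, timestamp, correlation ID, causal chain) come from the article, while `message_id`, the `child` helper, and the choice to store the causal chain as a tuple of message IDs are illustrative guesses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4


@dataclass(frozen=True)
class Envelope:
    source: str
    # Assumed field: a unique ID per message, so the causal chain has something to point at.
    message_id: str = field(default_factory=lambda: uuid4().hex)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # Shared by every message that belongs to the same logical exchange.
    correlation_id: str = field(default_factory=lambda: uuid4().hex)
    # IDs of the messages that led to this one, oldest first.
    causal_chain: tuple[str, ...] = ()

    def child(self, source: str) -> "Envelope":
        """Build the envelope for a follow-up message without mutating this one."""
        return Envelope(
            source=source,
            correlation_id=self.correlation_id,
            causal_chain=self.causal_chain + (self.message_id,),
        )
```

Because the dataclass is frozen, assigning to any field after construction raises an error, so a handler can extend the chain only by creating a new envelope.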

Third, predictability under pressure. The kill switch — four levels from task-pause to full halt — needs to work precisely when everything else is failing. That's when you least want routing decisions involving an LLM. The kill switch runs on the same deterministic logic as the rest of the broker. No reasoning required to make it fire.
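A rough sketch of a four-level kill switch as an ordered enum plus a pure check. The article names only the two ends of the scale, task pause and full halt. The two middle levels (`AGENT_PAUSE`, `FLEET_PAUSE`), the `targets` set, and the `allows` method are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class KillLevel(IntEnum):
    NONE = 0
    TASK_PAUSE = 1   # from the article: lowest level
    AGENT_PAUSE = 2  # hypothetical intermediate level
    FLEET_PAUSE = 3  # hypothetical intermediate level
    FULL_HALT = 4    # from the article: highest level


@dataclass(frozen=True)
class KillSwitch:
    level: KillLevel = KillLevel.NONE
    targets: frozenset = frozenset()  # task or agent IDs the level applies to

    def allows(self, task_id: str, agent_id: str) -> bool:
        """Decide whether a message may be dispatched. No model, no I/O."""
        if self.level >= KillLevel.FLEET_PAUSE:
            # The difference between the top two levels is not modeled here.
            return False
        if self.level == KillLevel.AGENT_PAUSE:
            return agent_id not in self.targets
        if self.level == KillLevel.TASK_PAUSE:
            return task_id not in self.targets
        return True
```

The check is a handful of comparisons on immutable state, so every branch can be enumerated in tests, which is what makes 100% coverage on this kind of module realistic.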

Insight

The pattern generalizes: use AI where inference and creativity add irreplaceable value. Use deterministic logic where correctness and auditability are non-negotiable. Conflating those two categories is how agent systems become unpredictable at scale.

Authority Ceilings Are Hard Infrastructure

The second decision that shaped everything: authority ceilings are enforced at the broker level, not the agent level.

Every agent in the fleet carries a maximum dollar exposure, a maximum risk level, and a maximum autonomous scope. Foresight executes trades below a defined threshold without human review. Above that threshold, the message surfaces in Mission Control as a pending approval — Knox reviews, approves, denies, or delegates. Foresight never decides whether it's within limits. The broker checks the agent card, computes the exposure, and enforces the ceiling before dispatch.
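The enforcement step might look like the sketch below. The three limits (dollar exposure, risk level, autonomous scope) are from the paragraph above. The field names, the quantity-times-price exposure formula, and the numeric values in the usage example are assumptions, not Foresight's real thresholds.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    DISPATCH = "dispatch"
    PENDING_APPROVAL = "pending_approval"  # surfaces for human review


@dataclass(frozen=True)
class AgentCard:
    agent_id: str
    max_exposure_usd: float
    max_risk_level: int
    allowed_scopes: frozenset


@dataclass(frozen=True)
class Action:
    quantity: float
    price_usd: float
    risk_level: int
    scope: str


def enforce_ceiling(card: AgentCard, action: Action) -> Decision:
    """Broker-side check: the agent never reports its own exposure."""
    exposure_usd = action.quantity * action.price_usd  # assumed formula
    within = (
        exposure_usd <= card.max_exposure_usd
        and action.risk_level <= card.max_risk_level
        and action.scope in card.allowed_scopes
    )
    return Decision.DISPATCH if within else Decision.PENDING_APPROVAL
```

With a hypothetical card such as `AgentCard("foresight", 500.0, 2, frozenset({"trade"}))`, an action of 10 units at 20.0 USD dispatches, while 100 units at the same price becomes a pending approval. The agent supplies raw facts about the action; the comparison against the ceiling happens outside it.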

This is where most autonomy architectures fail. They give agents the ability to self-assess their own authority level, which means they can rationalize exceeding it. A broker that enforces externally removes that possibility entirely — not through better prompting, but through structure.

Routing Rules: 9 (deterministic, type-first, no LLM orchestration)

Test Coverage: 98% (100% on the safety and kill-switch module)

Kill Switch Levels: 4 (task pause → full halt, all deterministic)

The authority ceiling as hard infrastructure reframes the design question. Instead of "can this agent handle this?" you ask "what's the maximum scope this agent should own autonomously, and where does it hand off?" The answer lives in the agent card. The broker enforces it. I never have to watch for agents overstepping — the structure prevents it.

The Gap Between "Built" and "Working"

Here's the honest self-retro: I shipped nine PRs. The broker routes existed. The message pipeline existed. Auth middleware was written. And on first end-to-end verification, none of it was connected.

Routes were registered but never mounted to the FastAPI application. handle_message() was logging every incoming message without routing a single one. The auth middleware sat in its own file, never wired to any endpoint. A mission.json was manually copied into the container instead of volume-mounted, which meant it would drift from the source of truth the moment production ran a rebuild.

None of this surfaced as a test failure. Unit tests passed because unit tests verify components in isolation. They have no opinion on whether the components are wired together. The broker "worked" in every test. It didn't work as a system.

The fix took a few hours — mount the routes, wire the pipeline, add the middleware, replace the manual copy with a Docker volume mount. But the lesson is sharper than "write more integration tests." It's that operational readiness is a separate checklist from test coverage. A 98% coverage number tells you the parts behave correctly alone. It says nothing about whether the wiring is correct.

I now run a mandatory gate before any session closes: E2E verified, meaning the cockpit shows broker_connected: true. Unit tests do catch isolated bugs. Integration tests are the only thing that catches wiring failures.
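A small sketch of how such a gate could be scripted, assuming the cockpit status is available as a JSON object. Only the `broker_connected` key comes from the article. The function names and the idea of returning a process exit code are hypothetical.

```python
import json


def gate_failures(status: dict) -> list:
    """Return the reasons the session must stay open; empty means the gate passes."""
    failures = []
    # Strict identity check: a missing key, false, or the string "true" all fail.
    if status.get("broker_connected") is not True:
        failures.append("broker_connected is not true")
    return failures


def run_gate(raw_json: str) -> int:
    """Return a process-style exit code: 0 to close the session, 1 to block it."""
    failures = gate_failures(json.loads(raw_json))
    for reason in failures:
        print(f"E2E gate failed: {reason}")
    return 1 if failures else 0
```

The gate reads the state of the running system rather than the test suite, so it fails when routes are unmounted or middleware is unwired even if every unit test is green.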

The other mistake was building a standalone Next.js application for the broker UI before reading the upgrade spec that said all broker surfaces belong inside Mission Control. Caught it early, ported the wiring to MC's existing pages, deleted the separate app. Cost: a few hours and a rule I won't forget. Read the architecture document before writing code. Not the summary. The spec.

What a Chain of Command Actually Buys You

The north star metric for Principal is minutes Knox spends on operational decisions per day, trending toward zero. That metric forces clarity on what the infrastructure is for.

Before Principal, 24 agents across 37 repos meant 24 independent feedback loops. Trades with no attribution to OKRs. Content pipelines with no connection to mission KRs. No escalation path. No audit trail. No kill switch. The company was running, but not as a company — as a collection of scripts that happened to be deployed simultaneously.

Principal changes the structure, not the intelligence. The agents are exactly as capable as before. What changed is the chain of command: nine deterministic routing rules, authority ceilings enforced at the broker level, an immutable audit log on every message, and a kill switch with 100% test coverage. The five OKR tiles that used to read UNKNOWN in Mission Control now show real values. The approvals queue surfaces decisions that exceed agent authority. The behavioral tab shows drift before it compounds into a problem.

The most dangerous word in any engineering project is "done." Done means the code exists. It doesn't mean the code is wired. It doesn't mean the wiring survives real conditions. And it definitely doesn't mean the kill switch fires when everything else is failing.

Build the kill switch first. Wire everything before you call it shipped. Then verify end-to-end before the session closes.
