Claude Skills Have Three Layers. Most People Only Build One.

The most popular advice on the internet about Claude is already out of date.

Every tutorial, course, and viral thread is still teaching prompt-engineering — how to phrase a request so the model produces a better answer. That was a useful skill in 2023. It is not the discipline that matters now. The teams getting real leverage from Claude have stopped writing prompts the way most users still do, and have started writing something different.

They write skills.

A skill is not a clever prompt. It is a structured artifact — a folder on disk with three distinct layers, each doing a different job. The shift from prompting to skill-engineering changes what you can build, what compounds, and what a session is even for. And here is the part that nobody is saying out loud: of the three layers that make a skill powerful, almost everyone only ships the first two. The third layer — the one with the actual leverage — gets skipped.

That is why most people's AI workflows plateau.

What a Skill Actually Is

Stop thinking of a skill as a saved prompt. Think of it as a small application.

Every skill contains three layers, and each layer has a specific job that the others cannot do.

Layer 1 — Description. The frontmatter. The label on the folder. This is what the model reads on every single user message to decide whether to invoke this skill at all. Vague descriptions create vague routing. Specific descriptions, with explicit triggers, mean the right skill fires automatically without the user typing a slash command. The description is not metadata. It is the routing logic.

Layer 2 — Instructions. The playbook. Once the skill is invoked, this is the procedure the model follows. This is what most people do ship, because it looks like a prompt — and writing prompts is the only mental model most engineers have.

Layer 3 — Tools. Scripts, API calls, reference files. Deterministic code that the skill can execute. This is the layer that turns a procedural prompt into a real piece of software. It is also the layer almost no one builds.
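To make the three layers concrete, here is a minimal sketch that scaffolds one skill folder on disk. It assumes the common layout of a SKILL.md file whose frontmatter carries `name` and `description`, followed by the instructions, with scripts stored alongside. The skill name `changelog-writer`, its description, and the script name `collect_commits.py` are hypothetical and exist only for illustration.

```python
from pathlib import Path
import tempfile

# Hypothetical skill used only for illustration.
SKILL_MD = """\
---
name: changelog-writer
description: Drafts a changelog entry from merged commits. Use when the user asks for release notes, a changelog, or a summary of what shipped.
---

# Instructions

1. Run scripts/collect_commits.py to list commits since the last tag.
2. Group the commits into Added, Changed, and Fixed.
3. Write the entry in the project's existing changelog style.
"""

TOOL_SCRIPT = '''\
"""Layer 3 placeholder: deterministic code the instructions call."""
'''


def scaffold_skill(root: Path) -> Path:
    """Create a skill folder containing all three layers and return its path."""
    skill_dir = root / "changelog-writer"
    (skill_dir / "scripts").mkdir(parents=True)
    # Layers 1 and 2 live together in SKILL.md: frontmatter, then the playbook.
    (skill_dir / "SKILL.md").write_text(SKILL_MD, encoding="utf-8")
    # Layer 3 lives beside it as ordinary files.
    (skill_dir / "scripts" / "collect_commits.py").write_text(
        TOOL_SCRIPT, encoding="utf-8"
    )
    return skill_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skill = scaffold_skill(Path(tmp))
        for path in sorted(skill.rglob("*")):
            print(path.relative_to(skill))
```

Note how the description names explicit triggers ("release notes", "a changelog") instead of a vague summary. That line is the part doing the routing.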

⚔DOCTRINE

A prompt is something you type. A skill is something you ship. The difference is whether the third layer exists.

The Layer Almost Everyone Skips

Watch any walkthrough of someone "building a skill" and count how many of them stop after Layer 2. The instructions get written, the markdown looks tidy, the slash command works once, and the author moves on.

The work that should happen next, extracting the deterministic parts into actual scripts, almost never gets done. The result is a skill that asks the model to regenerate the same Python or bash logic on every invocation, paying tokens and accepting non-determinism for work that should have been written once and reused.

This is the part that an Anthropic engineer named Eric flagged in a public talk that stuck with me. He observed that people pour enormous effort into "beautiful, detailed prompts" and then ship the tools layer in an absolute shambles — no documentation, parameters named A and B, function signatures no engineer would tolerate in production code. The most visible part of the skill is polished. The part that determines whether the skill scales is treated like a draft.

That mismatch is the whole story. Code is deterministic. Same input, same output, every run. LLM regeneration is not. Same input can produce variant outputs, variant errors, variant token costs. Anytime you catch the model regenerating the same logic inside a skill on repeat invocations, that logic is a candidate for the Tools layer. Move it to a script. Have the skill call the script. You have just traded probabilistic compute for deterministic compute — cheaper, faster, repeatable, testable.
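As a sketch of what moving logic into the Tools layer can look like, suppose a skill kept asking the model to turn titles into URL slugs. The function below is a hypothetical example, not taken from any real skill. It shows the deterministic version, with the descriptive parameter names and docstring that the tools layer usually lacks.

```python
import re
import sys
import unicodedata


def slugify(title: str, max_length: int = 60) -> str:
    """Convert a title into a lowercase, hyphen-separated URL slug.

    Args:
        title: Human-readable title, e.g. "Claude Skills: Three Layers".
        max_length: Upper bound on the slug length in characters.
    """
    # Strip accents so "Café" becomes "Cafe" before filtering to ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    # Truncate, then drop any hyphen left dangling at the cut.
    return slug[:max_length].rstrip("-")


if __name__ == "__main__":
    # The skill's instructions would call this script instead of re-deriving it.
    print(slugify(" ".join(sys.argv[1:])))
```

The same title now yields the same slug on every run, and the function can be covered by an ordinary unit test.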

◉SIGNAL

The rule of thumb: if code can do it, code should do it. The model is for judgment. The script is for everything else.

Composability Beats Monoliths

The second mistake, almost as common as skipping Layer 3, is building skills that try to do too much.

The temptation is obvious. You want one big skill that handles your whole workflow — research, drafting, formatting, publishing. One slash command, one place to make changes, one mental model. So you write it that way. And for a few weeks it works.

Then it stops working. You want to change how the formatting step behaves and you cannot find the section. You want to reuse the research step inside a different workflow and you cannot, because it is welded to the rest. A small change in one corner of the skill breaks a different corner you forgot about. The "one big skill" pattern collapses under its own surface area.

The right pattern is the inverse. Small, focused, composable skills that each do one thing and chain together. Three reasons this wins:

Failures localize. When a focused skill breaks, you know exactly where to look. When a monolithic skill breaks, you guess.
Improvements compound. Upgrade one composable skill and every workflow that uses it gets better automatically. Upgrade one monolithic skill and you have to remember every place its behavior is duplicated.
Reuse beats rebuild. A focused skill becomes a building block. A monolithic skill becomes a special case.

This is the same lesson software engineers learned about functions in the 1970s and microservices in the 2010s. It applies just as cleanly to skills now.
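The function analogy can be sketched directly. In the snippet below, each small function stands in for one focused skill, and `chain` is a hypothetical helper that composes them. None of this is a real skill API. It only illustrates why small units are easier to reuse than one welded-together block.

```python
from typing import Callable

Step = Callable[[str], str]


def chain(*steps: Step) -> Step:
    """Compose single-purpose steps into one workflow, applied left to right."""

    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text

    return run


# Hypothetical stand-ins for focused skills; each does exactly one thing.
def strip_whitespace(text: str) -> str:
    return "\n".join(line.strip() for line in text.splitlines())


def drop_blank_lines(text: str) -> str:
    return "\n".join(line for line in text.splitlines() if line)


def add_title(text: str) -> str:
    return "Release notes\n" + text


# Two workflows reuse the same building blocks without duplicating them.
clean = chain(strip_whitespace, drop_blank_lines)
publish = chain(clean, add_title)
```

Improving `strip_whitespace` improves both `clean` and `publish`, which is the compounding effect described above in miniature.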

Two Flags Almost Nobody Uses

There are two invocation controls baked into Claude's skill format that I almost never see used in the wild, and both are important enough that ignoring them is a real mistake.

The first is making a skill invisible to the user. You can set a skill so it never appears in the slash menu, which leaves the model and its sub-agents as the only callers. This is the right configuration for internal plumbing. Helper skills that exist to be called by other skills do not belong in your slash menu, where they create noise and give the user more surface area to misuse.

The second is the inverse — making a skill invisible to the model. You can set a skill so the model cannot invoke it, only the human can. This is the right configuration for anything destructive or expensive. Deploys to production. Sending a message to an external channel. Placing a real-money order. The model has no business firing those autonomously. The flag is not a nice-to-have. It is structural safety that does not rely on your own discipline holding.

⚔DOCTRINE

Discipline that depends on remembering is not discipline. It is luck on a timer.

Most skill authors never touch either flag, so every skill in their library defaults to both human-invocable and model-invocable. That is fine for read-only diagnostic skills. It is dangerous for anything that writes to the world.
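One way to stop relying on memory is to lint your skill library. The sketch below flags any skill whose description sounds destructive while the model is still allowed to invoke it. The frontmatter key `disable-model-invocation` is an assumption about the skill format, so confirm the exact name in the current documentation. The keyword list and the function names are hypothetical.

```python
# The frontmatter key below is an assumption about the skill format;
# confirm the exact name in the current documentation before relying on it.
MODEL_LOCK_KEY = "disable-model-invocation"

# Hypothetical keyword list; tune it to what "writes to the world" means for you.
RISKY_WORDS = ("deploy", "send", "delete", "order", "publish")


def parse_frontmatter(skill_md: str) -> dict[str, str]:
    """Read flat `key: value` pairs from the leading --- block (no nested YAML)."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def needs_model_lock(skill_md: str) -> bool:
    """True when a skill sounds destructive but the model may still invoke it."""
    fields = parse_frontmatter(skill_md)
    description = fields.get("description", "").lower()
    sounds_risky = any(word in description for word in RISKY_WORDS)
    locked = fields.get(MODEL_LOCK_KEY, "false").lower() == "true"
    return sounds_risky and not locked
```

Run across every SKILL.md in a repository, a check like this turns the safety rule into something a CI job enforces rather than something a person has to remember.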

The Compounding Loop

Here is where skill-engineering pulls ahead of prompt-engineering, and where the gap keeps widening.

A prompt is ephemeral. When the session ends, the prompt ends with it. The next session starts from zero — the same blank context, the same instructions to re-type, the same edge cases to re-explain. You can be writing the world's best prompts and still not be accumulating anything.

A skill persists. The skill file lives on disk, gets version-controlled, and travels with the project. And every time you use the skill, you get a chance to sharpen it. After every run, the question to ask is simple:

The compounding question: is this a one-time fix, or should it be in the skill forever?

If the fix is forever, you write it into the skill — a new rule, a new edge case, a new example. The next session starts smarter than the last. The skill becomes a record of every lesson the prior runs taught you. This is what Anthropic's team means when they say their goal is for Claude on day thirty of working with you to be measurably better than Claude on day one. That outcome does not happen because the model improved. It happens because the skill improved.

Most people skip this step entirely. They run a skill, get an output, fix the output by hand, and move on. The fix never makes it back into the skill. The same correction gets repeated next week. The skill never compounds.

The fix is mechanical. After any non-trivial skill invocation where the output needed adjustment, ask the model directly: "Review the back-and-forth we just had after running this skill. Can we update the skill so this is handled automatically and we don't make the same mistake twice?" The model is good at this. It already has the full context. It will propose a precise edit. You review, accept, commit. The skill is now permanently smarter.
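The write-back step itself can also be made deterministic. This is a minimal sketch, assuming lessons are kept as bullets under a final "Lessons learned" section of the skill file. That heading and the `record_lesson` helper are hypothetical conventions, not part of any skill format.

```python
from pathlib import Path

# Hypothetical section name; assumed to be the last section in the file.
LESSONS_HEADING = "## Lessons learned"


def record_lesson(skill_file: Path, lesson: str) -> bool:
    """Append a rule to the skill once; return False if it is already recorded."""
    text = skill_file.read_text(encoding="utf-8")
    bullet = f"- {lesson.strip()}"
    # Skip duplicates so repeated runs leave the file unchanged.
    if bullet in text.splitlines():
        return False
    # Create the section on first use.
    if LESSONS_HEADING not in text:
        text = text.rstrip("\n") + f"\n\n{LESSONS_HEADING}\n"
    # Append at the end of the file, which is where the section lives.
    skill_file.write_text(text.rstrip("\n") + f"\n{bullet}\n", encoding="utf-8")
    return True
```

Because the helper only appends and never rewrites earlier rules, each accepted fix shows up as a one-line diff that is easy to review before committing.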

That is the loop. It is small. It is boring. It is what separates an AI workflow that compounds from one that just runs.

The Real Shift

Prompt-engineering treated the prompt as the unit of work. That framing is now wrong.

The unit of work is the skill: a folder, three layers, version-controlled, composable with other skills, sharpened after every use. The model is one part of that system, not the whole stack. Treating the model as the whole stack is the cognitive error behind most of the "AI isn't living up to the hype" complaints I still hear from operators. Of course the model alone disappoints you. The model alone is only one part of the stack, and not even the part with the most leverage.

The leverage is in the skill — and inside the skill, the leverage is in the layer most people skip.

◉SIGNAL

Build the third layer. Ship composable skills. Use both invocation flags. Run the compounding loop. None of these are technical moats — they are practice moats, and almost no one is practicing.

That is the work now. Not prompt-engineering. Skill-engineering. Different unit, different artifact, different ceiling.
