Volatility Is Not Crisis: A Regime Classifier Case Study
My V2 regime classifier called this a CRISIS. ADX was 43. The chart was a clean breakout. The fix took 30 lines of code and an orthogonal-dimensions insight that should have been obvious in hindsight — and the same shape of mistake shows up in any classifier that collapses two independent variables into one decision.
ADX was 43. MACD was in a bullish cross. The chart was a textbook breakout — BTC up 3.8% over two hours, ETH and SOL going with it, volume expansion, the cleanest setup I'd seen all week. The classifier looked at it and returned regime=crisis. Every V2 long was blocked with SKIP=regime_crisis for the entire move.
The fix took 30 lines of code and an insight that should have been obvious in hindsight: volatility and direction are orthogonal dimensions, and any classifier that gates on volatility alone will eventually mistake the best trades of the year for distress.
This post is the case study. The same shape of mistake shows up in any classifier that collapses two independent variables into one decision: credit risk models that blend default rate with utilization, content moderators that blend toxicity with virality, competitive intelligence signals that blend novelty with confidence. The math is the same. The fix is the same.
What V2 saw
Here is what the V2 engine evaluated at 19:00:09 ET on April 7:
asset: BTC/15m
ATR(14): 0.0184 ← high
ATR percentile: 92 ← top decile of recent volatility
ADX(14): 43 ← strong trend
MACD: bullish cross
volume: 2.3x 20-day mean
score (V1): 85/100 STRONG LONG
And here is what the V2 regime classifier returned:
state = RegimeState.CRISIS
allow_trend_following = False
size_multiplier = 0.0
The V2 engine then refused every single long entry on every asset with the message SKIP=regime_crisis. Total V2 trades during the +3.8% breakout: zero.
Why the classifier was wrong
The original classifier was a four-state enum that looked only at the ATR percentile:
from enum import Enum


class RegimeState(Enum):
    LOW_VOL_TRENDING = "low_vol_trending"
    NORMAL = "normal"
    HIGH_VOL = "high_vol"
    CRISIS = "crisis"


def classify(atr_percentile):
    # The ATR percentile is the only input; direction is never consulted.
    if atr_percentile < 30:
        return RegimeState.LOW_VOL_TRENDING
    if atr_percentile < 70:
        return RegimeState.NORMAL
    if atr_percentile < 90:
        return RegimeState.HIGH_VOL
    return RegimeState.CRISIS
That classifier is internally consistent. It was even backtested. It "works" — for some definition of works that doesn't include the days you most want it to work.
The trap is in the unstated assumption: high volatility implies distress. That assumption is true for some kinds of volatility (flash crashes, capitulation events, news shocks) and catastrophically wrong for others (clean trending breakouts, post-consolidation moves, momentum continuation). The classifier had no way to tell the two apart because it was looking at one axis when the answer required two.
The 2D regime space
Volatility and directionality are independent variables. ATR measures how much the price is moving. ADX measures how organized that movement is — high ADX means the moves are aligned in one direction, low ADX means they cancel out.
That gives you a 2x2 quadrant:
- Low ATR, low ADX → NORMAL. Chop. Consolidation. Don't trade aggressively.
- Low ATR, high ADX → LOW_VOL_TRENDING. Slow drift. Clean rotation. Safe to trade with size.
- High ATR, low ADX → CRISIS. Big moves but not aligned. Flash crash. Capitulation. This is when you stand down.
- High ATR, high ADX → TREND_UP / TREND_DOWN. Big moves, all in one direction. The breakout. Trade it.
April 7 BTC sat firmly in the upper-right quadrant: ATR percentile 92, ADX 43, ROC +0.13%. The original classifier collapsed the X axis and saw only the Y. From its perspective, April 7 looked identical to a flash crash: same ATR percentile, no other information considered.
When a classifier returns surprising results, the first question is not "is the threshold wrong?" It's "does it have all the dimensions it needs?" A surprising answer from a classifier is often a missing feature, not a mistuned weight.
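The quadrant list above can be sketched as a plain lookup table. This is an illustration only, not the code that shipped: the single cutoffs (`atr_cutoff=90`, `adx_cutoff=25`) and the `quadrant` helper are assumptions chosen to keep the 2x2 shape visible.

```python
from typing import Dict, Tuple

# (high_atr, high_adx) -> label, mirroring the four quadrants in the list.
QUADRANTS: Dict[Tuple[bool, bool], str] = {
    (False, False): "NORMAL",
    (False, True): "LOW_VOL_TRENDING",
    (True, False): "CRISIS",
    (True, True): "TREND",
}


def quadrant(
    atr_percentile: float,
    adx: float,
    roc: float,
    atr_cutoff: float = 90.0,  # assumed cutoff for "high ATR"
    adx_cutoff: float = 25.0,  # assumed cutoff for "high ADX"
) -> str:
    """Place one observation in the 2x2 volatility/direction grid."""
    label = QUADRANTS[(atr_percentile >= atr_cutoff, adx >= adx_cutoff)]
    if label == "TREND":
        # The sign of ROC picks the direction of the trend.
        return "TREND_UP" if roc > 0 else "TREND_DOWN"
    return label


# April 7 BTC values from the post, then the same volatility with a collapsed ADX.
print(quadrant(atr_percentile=92, adx=43, roc=0.0013))
print(quadrant(atr_percentile=92, adx=15, roc=-0.04))
```

Both calls share the same ATR percentile; only the second axis separates them.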
The fix: PR #111
The fix is structurally simple and conceptually large. Add a directional check, expand the enum, wire ADX and ROC into the classifier inputs.
# regime_classifier.py
from enum import Enum


class RegimeState(Enum):
    LOW_VOL_TRENDING = "low_vol_trending"
    NORMAL = "normal"
    TREND_UP = "trend_up"      # NEW
    TREND_DOWN = "trend_down"  # NEW
    CRISIS = "crisis"


CRISIS_ADX_THRESHOLD = 20
TREND_ADX_THRESHOLD = 25
TREND_ROC_THRESHOLD = 0.001


class RegimeClassifier:
    def update(self, atr_percentile, adx=0.0, roc=0.0):
        # high volatility branch
        if atr_percentile >= 90:
            if adx >= TREND_ADX_THRESHOLD and abs(roc) >= TREND_ROC_THRESHOLD:
                return RegimeState.TREND_UP if roc > 0 else RegimeState.TREND_DOWN
            if adx < CRISIS_ADX_THRESHOLD:
                return RegimeState.CRISIS
            # 20 <= ADX < 25, or a strong ADX with |ROC| below the floor: stay conservative
            return RegimeState.CRISIS
        if atr_percentile < 30:
            return RegimeState.LOW_VOL_TRENDING
        return RegimeState.NORMAL
Three thresholds, each with a defensible meaning:
- ADX >= 25 is the textbook "trend is established" threshold from Wilder's original 1978 book. Below 25 you can't reliably distinguish a trend from noise. This isn't a tuning value; it's the conventional cutoff.
- |ROC| >= 0.001 rules out micro-fluctuations. ROC here is (price - candle_open) / candle_open over the current 15m candle. The 0.1% floor means "the move is real, not just rounding."
- ADX < 20 is the symmetric "definitely no trend" threshold. The 20-25 gap is the conservative middle: not enough trend to call it directional, treat it as crisis until proven otherwise.
The classifier now needs ADX and ROC as inputs. Both are already available in the technical analysis pipeline; I just hadn't been passing them in:
# trader.py:597, wiring at the call site
ta = compute_ta_signal(asset, timeframe)
roc = (ta.price - ta.candle_open) / ta.candle_open if ta.candle_open else 0.0
regime_state = self.regime_classifier.update(
    atr_percentile=ta.atr_percentile,
    adx=ta.adx,
    roc=roc,
)
The default=0.0 on adx and roc keeps the call backward-compatible — every existing test that didn't pass them still works, just with the same conservative-CRISIS behavior as before. New code paths get the directional check.
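A quick way to see the backward-compatibility claim is to call the classifier both ways. The sketch below restates the classifier from the PR so it runs on its own; the `classifier` variable and the specific calls are illustrative, using the April 7 BTC values quoted earlier.

```python
from enum import Enum


class RegimeState(Enum):
    LOW_VOL_TRENDING = "low_vol_trending"
    NORMAL = "normal"
    TREND_UP = "trend_up"
    TREND_DOWN = "trend_down"
    CRISIS = "crisis"


CRISIS_ADX_THRESHOLD = 20
TREND_ADX_THRESHOLD = 25
TREND_ROC_THRESHOLD = 0.001


class RegimeClassifier:
    """Restated from the PR so this snippet is self-contained."""

    def update(self, atr_percentile, adx=0.0, roc=0.0):
        if atr_percentile >= 90:
            if adx >= TREND_ADX_THRESHOLD and abs(roc) >= TREND_ROC_THRESHOLD:
                return RegimeState.TREND_UP if roc > 0 else RegimeState.TREND_DOWN
            # Every other high-volatility case is treated as crisis.
            return RegimeState.CRISIS
        if atr_percentile < 30:
            return RegimeState.LOW_VOL_TRENDING
        return RegimeState.NORMAL


classifier = RegimeClassifier()

# Legacy call site: no adx/roc, so the defaults (0.0) keep the old CRISIS result.
legacy = classifier.update(atr_percentile=92)

# New call site: same volatility, directional inputs supplied.
wired = classifier.update(atr_percentile=92, adx=43, roc=0.0013)

print(legacy.value, wired.value)  # crisis trend_up
```

The defaults matter because an ADX of 0.0 can never clear the trend threshold, so an unwired caller cannot accidentally unlock trend-following.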
Before vs after
The before/after is a useful way to see what was missing.
| Before | After |
|---|---|
| LOW_VOL_TRENDING | LOW_VOL_TRENDING |
| NORMAL | NORMAL |
| HIGH_VOL | TREND_UP (new) |
| CRISIS (catch-all) | TREND_DOWN (new) |
| | CRISIS (now: high ATR + low ADX only) |
Notice that CRISIS didn't go away. It got more specific. Before, it was a catch-all for "high volatility, no further questions." After, it's a precise diagnosis: high volatility with low directional strength. That's what an actual flash crash looks like — big candles in both directions, ADX collapsing because the moves cancel out.
The classifier is now strictly more useful and strictly less likely to misfire.
Replay validation
I wouldn't merge this without proving it on the actual April 7 data. The replay script runs three assertions:
def test_april7_btc_classified_as_trend_up():
    state = classifier.update(atr_percentile=92, adx=43, roc=0.0013)
    assert state == RegimeState.TREND_UP
    cfg = REGIME_CONFIG[state]
    assert cfg.allow_trend_following is True
    assert cfg.size_multiplier == 1.0


def test_april7_sol_classified_as_trend_up():
    state = classifier.update(atr_percentile=88, adx=50, roc=0.0026)
    assert state == RegimeState.TREND_UP


def test_genuine_crisis_still_classified_as_crisis():
    # March 2020 style flash crash: high ATR but ADX collapsed
    state = classifier.update(atr_percentile=99, adx=15, roc=-0.04)
    assert state == RegimeState.CRISIS
All three pass. The third is the one I cared about most — I needed to be sure the new logic wasn't just "always trade everything." Genuine crises (high vol, ADX collapsed because the price is whipsawing in both directions) still get classified as crisis and the bot still stands down.
The full V2 skip-gate compatibility test was the fourth assertion: TREND_UP and TREND_DOWN must not trigger the existing regime_no_trend or regime_crisis skip checks anywhere in signal_engine_v2.py. They didn't. PR shipped.
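For readers who want to picture the `REGIME_CONFIG` lookup the tests rely on, here is a minimal sketch. Only two rows are grounded in this post (CRISIS: no trend-following, size 0.0; TREND_UP: trend-following allowed, size 1.0). The `RegimeConfig` dataclass, the other three rows, and the `skip_reason` helper are assumptions for illustration, not the contents of signal_engine_v2.py.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class RegimeState(Enum):
    LOW_VOL_TRENDING = "low_vol_trending"
    NORMAL = "normal"
    TREND_UP = "trend_up"
    TREND_DOWN = "trend_down"
    CRISIS = "crisis"


@dataclass(frozen=True)
class RegimeConfig:  # hypothetical shape, inferred from the test assertions
    allow_trend_following: bool
    size_multiplier: float


REGIME_CONFIG: Dict[RegimeState, RegimeConfig] = {
    RegimeState.CRISIS: RegimeConfig(False, 0.0),           # from the April 7 output
    RegimeState.TREND_UP: RegimeConfig(True, 1.0),          # from the replay test
    RegimeState.TREND_DOWN: RegimeConfig(True, 1.0),        # assumed to mirror TREND_UP
    RegimeState.LOW_VOL_TRENDING: RegimeConfig(True, 1.0),  # assumed
    RegimeState.NORMAL: RegimeConfig(True, 0.5),            # assumed
}


def skip_reason(state: RegimeState) -> Optional[str]:
    """Return a skip tag for a regime, or None if entries are allowed (assumed gate logic)."""
    cfg = REGIME_CONFIG[state]
    if cfg.size_multiplier == 0.0:
        return "regime_crisis"
    if not cfg.allow_trend_following:
        return "regime_no_trend"
    return None


for state in RegimeState:
    print(state.value, "->", skip_reason(state))
```

Keeping the gate keyed on the config rather than on the enum name means a new state only needs a config row to pass through the existing checks, which is one way the compatibility test above could stay green.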
The principle
Volatility and direction are orthogonal. Any classifier that gates on one when the answer requires both will mistake the moments you most want to act for the moments you most want to hide.
The trap is collapsing dimensions. Once you've decided that "high X = bad," every system downstream of that classifier inherits the assumption, and the assumption is invisible: no line on the screen marks the missing if direction != aligned branch. The classifier looks complete because it returns a value for every input.
The way to catch this is to ask, every time you build a classifier:
1. What dimensions am I using?
2. Of the dimensions I'm not using, are any of them independent of the ones I am?
3. For each independent dimension I'm ignoring, can I construct an input where the right answer flips when that dimension flips?
If the answer to question 3 is yes, you have a missing feature. Not a tuning issue. A missing feature.
For the original classifier the answer was an obvious yes the moment I asked. ATR percentile 92 with ADX 43 (right answer: trade) and ATR percentile 92 with ADX 15 (right answer: don't trade) both produce the same regime under the old logic because ADX wasn't an input. That's a smoking gun. I just hadn't asked the question.
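The third question can be turned into a mechanical check. The sketch below is one possible way to do it, not something from the original codebase: `find_collisions` and the `Case` layout are hypothetical. It groups labelled examples by the features the classifier actually sees and reports any group where a human would give different answers.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

# (all known features, the answer a human reviewer would give)
Case = Tuple[Dict[str, float], str]


def find_collisions(
    cases: Sequence[Case], used_features: List[str]
) -> Dict[Tuple[float, ...], Set[str]]:
    """Return inputs that look identical to the classifier but need different answers."""
    groups: Dict[Tuple[float, ...], Set[str]] = defaultdict(set)
    for features, expected in cases:
        # Project each case onto only the dimensions the classifier receives.
        key = tuple(features[name] for name in used_features)
        groups[key].add(expected)
    # A group with more than one expected answer means a dimension is missing.
    return {key: answers for key, answers in groups.items() if len(answers) > 1}


cases: List[Case] = [
    ({"atr_percentile": 92, "adx": 43}, "trade"),
    ({"atr_percentile": 92, "adx": 15}, "stand down"),
]

one_axis = find_collisions(cases, ["atr_percentile"])
two_axes = find_collisions(cases, ["atr_percentile", "adx"])

print(len(one_axis), len(two_axes))  # 1 0
```

A non-empty result is the smoking gun in code form: no threshold on the visible features can separate those cases, so the fix has to be a new input.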
Beyond trading
This shape of bug shows up everywhere:
- Credit risk: A model that scores risk by default rate alone, ignoring whether the underlying borrower is high-utilization-low-income or low-utilization-high-income. Same default rate, very different actual risk.
- Content moderation: A classifier that flags posts by toxicity score alone, not toxicity × intent. A clinical discussion of self-harm and a graphic glorification of self-harm have similar surface-level toxicity. They are very different problems.
- Anomaly detection: A monitor that alerts on traffic spikes alone, not spikes × geographic distribution. A viral product launch and a coordinated bot attack both look like spikes from one angle. They are very different events.
- InDecision Framework signals: A scoring function that ranks signals by raw strength alone, ignoring whether the underlying inputs are confirming each other or contradicting each other. Same strength, very different conviction. (Knox ships an entire engine that gets this right at indecision.io.)
The audit is identical in every case. Walk the inputs. Ask which dimensions you're collapsing. Ask which ones are independent. Ask whether any independent dimension would flip the right answer. Where the answer is yes, add the dimension before the next live event finds it for you.
Some classes of bug have a single shape and you can recognize them once and then catch them everywhere for the rest of your career. This is one of them.