Volatility Is Not Crisis: A Regime Classifier Case Study

My V2 regime classifier called this a CRISIS.

ADX was 43. MACD was in a bullish cross. The chart was a textbook breakout — BTC up 3.8% over two hours, ETH and SOL going with it, volume expansion, the cleanest setup I'd seen all week. The classifier looked at it and returned regime=crisis. Every V2 long was blocked with SKIP=regime_crisis for the entire move.

The fix took 30 lines of code and one insight that is obvious in hindsight: volatility and direction are orthogonal dimensions, and any classifier that gates on volatility alone will eventually mistake the best trades of the year for distress.

This post is the case study. The same shape of mistake shows up in any classifier that collapses two independent variables into one decision: credit risk models that blend default rate with utilization, content moderators that blend toxicity with virality, competitive intelligence signals that blend novelty with confidence. The math is the same. The fix is the same.

What V2 saw

BTC 15m chart April 7 with the V2 classifier output overlaid — regime=CRISIS during the breakout window

Here is what the V2 engine evaluated at 19:00:09 ET on April 7:

asset:        BTC/15m
ATR(14):       0.0184          ← high
ATR percentile: 92              ← top decile of recent volatility
ADX(14):       43               ← strong trend
MACD:          bullish cross
volume:        2.3x 20-day mean
score (V1):    85/100 STRONG LONG

And here is what the V2 regime classifier returned:

state = RegimeState.CRISIS
allow_trend_following = False
size_multiplier = 0.0

The V2 engine then refused every single long entry on every asset with the message SKIP=regime_crisis. Total V2 trades during the +3.8% breakout: zero.

Why the classifier was wrong

The original classifier was a four-state enum that looked only at the ATR percentile:

from enum import Enum

class RegimeState(Enum):
    LOW_VOL_TRENDING = "low_vol_trending"
    NORMAL = "normal"
    HIGH_VOL = "high_vol"
    CRISIS = "crisis"

def classify(atr_percentile):
    if atr_percentile < 30:
        return RegimeState.LOW_VOL_TRENDING
    if atr_percentile < 70:
        return RegimeState.NORMAL
    if atr_percentile < 90:
        return RegimeState.HIGH_VOL
    return RegimeState.CRISIS

That classifier is internally consistent. It was even backtested. It "works" — for some definition of works that doesn't include the days you most want it to work.

The trap is in the unstated assumption: high volatility implies distress. That assumption is true for some kinds of volatility (flash crashes, capitulation events, news shocks) and catastrophically wrong for others (clean trending breakouts, post-consolidation moves, momentum continuation). The classifier had no way to tell the two apart because it was looking at one axis when the answer required two.

The 2D regime space

2x2 quadrant of ATR by ADX showing CRISIS, NORMAL, TREND_UP, and LOW_VOL_TRENDING regions

Volatility and directionality are independent variables. ATR measures how much the price is moving. ADX measures how organized that movement is — high ADX means the moves are aligned in one direction, low ADX means they cancel out.
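
To make the independence concrete, here is a toy sketch. It does not compute ATR or ADX: `mean_abs_move` and `efficiency` are hypothetical stand-ins (the second is similar in spirit to Kaufman's efficiency ratio), and both price series are invented for illustration.

```python
def mean_abs_move(prices):
    """Average absolute bar-to-bar change: a crude volatility proxy, not ATR."""
    moves = [abs(b - a) for a, b in zip(prices, prices[1:])]
    return sum(moves) / len(moves)


def efficiency(prices):
    """Net displacement over total path length: a crude direction proxy, not ADX.

    1.0 means every bar moved the same way; 0.0 means the moves cancelled out.
    """
    path = sum(abs(b - a) for a, b in zip(prices, prices[1:]))
    return abs(prices[-1] - prices[0]) / path if path else 0.0


# Two invented series with identical bar sizes but opposite structure.
trending = [100, 102, 104, 106, 108, 110, 112]
whipsaw = [100, 102, 100, 102, 100, 102, 100]

for name, series in (("trending", trending), ("whipsaw", whipsaw)):
    print(name, mean_abs_move(series), efficiency(series))
```

Both series score 2.0 on the volatility proxy, but the direction proxy is 1.0 for the trending series and 0.0 for the whipsaw. A gate that reads only the first number cannot tell them apart.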

That gives you a 2x2 quadrant:

Low ATR, low ADX → NORMAL. Chop. Consolidation. Don't trade aggressively.
Low ATR, high ADX → LOW_VOL_TRENDING. Slow drift. Clean rotation. Safe to trade with size.
High ATR, low ADX → CRISIS. Big moves but not aligned. Flash crash. Capitulation. This is when you stand down.
High ATR, high ADX → TREND_UP / TREND_DOWN. Big moves, all in one direction. The breakout. Trade it.

April 7 BTC sat firmly in the high-ATR, high-ADX quadrant: ADX 43, ROC +0.13%. The original classifier collapsed the ADX axis and saw only ATR. From its perspective, April 7 looked identical to a flash crash: same ATR percentile, no other information considered.

◈INSIGHT

When a classifier returns surprising results, the first question is not "is the threshold wrong?" It's "does it have all the dimensions it needs?" A surprising answer from a classifier is often a missing feature, not a mistuned weight.

The fix: PR #111

The fix is structurally simple and conceptually large. Add a directional check, expand the enum, wire ADX and ROC into the classifier inputs.

# regime_classifier.py
from enum import Enum

class RegimeState(Enum):
    LOW_VOL_TRENDING = "low_vol_trending"
    NORMAL = "normal"
    TREND_UP = "trend_up"          # NEW
    TREND_DOWN = "trend_down"      # NEW
    CRISIS = "crisis"

CRISIS_ADX_THRESHOLD = 20
TREND_ADX_THRESHOLD = 25
TREND_ROC_THRESHOLD = 0.001

class RegimeClassifier:
    def update(self, atr_percentile, adx=0.0, roc=0.0):
        # high volatility branch
        if atr_percentile >= 90:
            if adx >= TREND_ADX_THRESHOLD and abs(roc) >= TREND_ROC_THRESHOLD:
                return RegimeState.TREND_UP if roc > 0 else RegimeState.TREND_DOWN
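            if adx < CRISIS_ADX_THRESHOLD:
                return RegimeState.CRISIS  # high vol, no trend: genuine crisis
            # Either 20 <= ADX < 25, or ADX >= 25 with |ROC| below the floor:
            # not clearly directional, so stay conservative.
            return RegimeState.CRISIS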

        if atr_percentile < 30:
            return RegimeState.LOW_VOL_TRENDING

        return RegimeState.NORMAL

Three thresholds, each with a defensible meaning:

ADX >= 25 is the conventional "trend is established" threshold, usually traced to Wilder's 1978 book New Concepts in Technical Trading Systems, which introduced ADX. Below 25 it is hard to distinguish a trend from noise. I didn't tune this value; it's the standard convention.
|ROC| >= 0.001 rules out micro-fluctuations. ROC here is (price - candle_open) / candle_open over the current 15m candle. The 0.1% floor means "the move is real, not just rounding."
ADX < 20 is the symmetric "definitely no trend" threshold. The 20-25 gap is the conservative middle: not enough trend to call it directional, treat it as crisis until proven otherwise.
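
As a rough sketch of how those thresholds partition the high-volatility branch, the snippet below isolates that branch in a hypothetical helper, `high_vol_regime` (not a function from the PR). The first and fourth readings come from this post; the others are invented to probe the edges.

```python
from enum import Enum


class RegimeState(Enum):
    TREND_UP = "trend_up"
    TREND_DOWN = "trend_down"
    CRISIS = "crisis"


TREND_ADX_THRESHOLD = 25
TREND_ROC_THRESHOLD = 0.001


def high_vol_regime(adx, roc):
    """Hypothetical helper: only the atr_percentile >= 90 branch of update()."""
    if adx >= TREND_ADX_THRESHOLD and abs(roc) >= TREND_ROC_THRESHOLD:
        return RegimeState.TREND_UP if roc > 0 else RegimeState.TREND_DOWN
    # Everything else stays CRISIS: ADX < 20, the 20-25 gap, and also
    # ADX >= 25 when the candle has barely moved.
    return RegimeState.CRISIS


cases = [
    (43, 0.0013),   # April 7 BTC reading from the post
    (43, -0.0013),  # same strength, opposite sign (invented)
    (22, 0.0100),   # inside the 20-25 gap (invented)
    (15, -0.0400),  # flash-crash style reading from the replay test
    (30, 0.0005),   # trending ADX but sub-threshold ROC (invented)
]
for adx, roc in cases:
    print(f"ADX={adx:>2} ROC={roc:+.4f} -> {high_vol_regime(adx, roc).value}")
```

The last case is worth noticing: a strong ADX reading with a near-flat candle also resolves to CRISIS, so the conservative fallback covers more than the 20-25 band.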

The classifier now needs ADX and ROC as inputs. Both are already available in the technical analysis pipeline; I just hadn't been passing them in:

# trader.py:597 — wiring at the call site
ta = compute_ta_signal(asset, timeframe)
roc = (ta.price - ta.candle_open) / ta.candle_open if ta.candle_open else 0.0

regime_state = self.regime_classifier.update(
    atr_percentile=ta.atr_percentile,
    adx=ta.adx,
    roc=roc,
)

The 0.0 defaults on adx and roc keep the call backward-compatible: every existing test that doesn't pass them still works, because a high-ATR reading with adx=0.0 still falls through to the conservative CRISIS result. New code paths get the directional check.
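
A minimal sketch of the wiring, runnable on its own. `TASignal`, `candle_roc`, and `classifier_kwargs` are hypothetical names standing in for whatever `compute_ta_signal` actually returns, and the prices are invented so that the candle is up roughly 0.13%.

```python
from dataclasses import dataclass


@dataclass
class TASignal:
    """Hypothetical stand-in for the object compute_ta_signal() returns."""
    price: float
    candle_open: float
    atr_percentile: float
    adx: float


def candle_roc(ta):
    """Rate of change over the current candle; 0.0 if the open is missing or zero."""
    return (ta.price - ta.candle_open) / ta.candle_open if ta.candle_open else 0.0


def classifier_kwargs(ta):
    """Build the keyword arguments passed to RegimeClassifier.update()."""
    return {"atr_percentile": ta.atr_percentile, "adx": ta.adx, "roc": candle_roc(ta)}


# Invented prices chosen so the candle is up roughly 0.13%.
good = TASignal(price=100.13, candle_open=100.0, atr_percentile=92, adx=43)
# A malformed bar with no open: the guard avoids ZeroDivisionError.
bad = TASignal(price=100.13, candle_open=0.0, atr_percentile=92, adx=43)

print(classifier_kwargs(good))
print(classifier_kwargs(bad))
```

One consequence of the guard: a bar with a missing open yields roc=0.0, which fails the ROC floor, so a high-volatility reading with bad data resolves to CRISIS instead of a trend state. That fails closed, which seems like the right direction for a gate.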

Before vs after

Side-by-side enum comparison: 4 states before, 5 states after

The before/after is a useful way to see what was missing.

Before	After
LOW_VOL_TRENDING	LOW_VOL_TRENDING
NORMAL	NORMAL
HIGH_VOL	TREND_UP (new)
CRISIS (catch-all)	TREND_DOWN (new)
	CRISIS (now: high ATR + low ADX only)

Notice that CRISIS didn't go away. It got more specific. Before, it was a catch-all for "high volatility, no further questions." After, it's a precise diagnosis: high volatility with low directional strength. That's what an actual flash crash looks like — big candles in both directions, ADX collapsing because the moves cancel out.

The classifier now separates two cases it used to merge, which makes it more useful on breakout days and less likely to misfire.

Replay validation

I wouldn't merge this without proving it on the actual April 7 data. The replay script runs three assertions:

def test_april7_btc_classified_as_trend_up():
    state = classifier.update(atr_percentile=92, adx=43, roc=0.0013)
    assert state == RegimeState.TREND_UP
    cfg = REGIME_CONFIG[state]
    assert cfg.allow_trend_following is True
    assert cfg.size_multiplier == 1.0

def test_april7_sol_classified_as_trend_up():
    state = classifier.update(atr_percentile=88, adx=50, roc=0.0026)
    assert state == RegimeState.TREND_UP

def test_genuine_crisis_still_classified_as_crisis():
    # March 2020 style flash crash — high ATR but ADX collapsed
    state = classifier.update(atr_percentile=99, adx=15, roc=-0.04)
    assert state == RegimeState.CRISIS

All three pass. The third is the one I cared about most — I needed to be sure the new logic wasn't just "always trade everything." Genuine crises (high vol, ADX collapsed because the price is whipsawing in both directions) still get classified as crisis and the bot still stands down.

The full V2 skip-gate compatibility test was the fourth assertion: TREND_UP and TREND_DOWN must not trigger the existing regime_no_trend or regime_crisis skip checks anywhere in signal_engine_v2.py. They didn't. PR shipped.
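
The shape of that compatibility check can be sketched as follows. This is not the code in signal_engine_v2.py: `RegimeConfig`, `skip_reason`, and the table layout are hypothetical, and only the CRISIS and TREND_UP config values are taken from this post. The other rows are assumptions made so the example runs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RegimeState(Enum):
    LOW_VOL_TRENDING = "low_vol_trending"
    NORMAL = "normal"
    TREND_UP = "trend_up"
    TREND_DOWN = "trend_down"
    CRISIS = "crisis"


@dataclass(frozen=True)
class RegimeConfig:
    allow_trend_following: bool
    size_multiplier: float


# Only the CRISIS and TREND_UP rows come from the post; the rest are assumed.
REGIME_CONFIG = {
    RegimeState.LOW_VOL_TRENDING: RegimeConfig(True, 1.0),  # assumed
    RegimeState.NORMAL: RegimeConfig(False, 0.5),           # assumed
    RegimeState.TREND_UP: RegimeConfig(True, 1.0),
    RegimeState.TREND_DOWN: RegimeConfig(True, 1.0),        # assumed
    RegimeState.CRISIS: RegimeConfig(False, 0.0),
}


def skip_reason(state: RegimeState) -> Optional[str]:
    """Hypothetical gate: return a SKIP label, or None if entries are allowed."""
    if state is RegimeState.CRISIS:
        return "regime_crisis"
    if not REGIME_CONFIG[state].allow_trend_following:
        return "regime_no_trend"
    return None


for state in RegimeState:
    print(f"{state.value:<17} -> {skip_reason(state)}")
```

Iterating over every enum member is the useful part: a newly added state that is missing from the config table raises a KeyError here instead of silently falling into an old skip path.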

The principle

Volatility and direction are orthogonal. Any classifier that gates on one when the answer requires both will mistake the moments you most want to act for the moments you most want to hide.

The trap is collapsing dimensions. Once you've decided that "high X = bad," every system downstream of that classifier inherits the assumption. And the assumption is invisible, because a missing "if direction != aligned" branch leaves no trace on the screen. The classifier looks complete because it returns a value for every input.

The way to catch this is to ask, every time you build a classifier:

What dimensions am I using?
Of the dimensions I'm not using, are any of them independent of the ones I am?
For each independent dimension I'm ignoring, can I construct an input where the right answer flips when that dimension flips?

If the answer to question 3 is yes, you have a missing feature. Not a tuning issue. A missing feature.

For the original classifier the answer was an obvious yes the moment I asked. ATR percentile 92 with ADX 43 (right answer: trade) and ATR percentile 92 with ADX 15 (right answer: don't trade) both produce the same regime under the old logic because ADX wasn't an input. That's a smoking gun. I just hadn't asked the question.
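
Question 3 can be turned into a mechanical check. The sketch below is one possible way to do it, not something from the PR: `find_collapsed_cases` is a hypothetical helper, `classify_1d` restates the original single-axis logic with plain strings, and the "trade" / "stand_down" labels are the hand-assigned right answers from the paragraph above.

```python
def classify_1d(atr_percentile):
    """The original single-axis logic from the post, returning plain strings."""
    if atr_percentile < 30:
        return "low_vol_trending"
    if atr_percentile < 70:
        return "normal"
    if atr_percentile < 90:
        return "high_vol"
    return "crisis"


def find_collapsed_cases(classifier, used_keys, cases):
    """Return pairs of cases the classifier cannot separate but should.

    Each case is (features, desired_action). Two cases collide when they
    agree on every feature the classifier uses yet want different actions.
    """
    collisions = []
    for i, (feat_a, want_a) in enumerate(cases):
        for feat_b, want_b in cases[i + 1:]:
            same_inputs = all(feat_a[k] == feat_b[k] for k in used_keys)
            if same_inputs and want_a != want_b:
                out = classifier(**{k: feat_a[k] for k in used_keys})
                collisions.append((feat_a, feat_b, out))
    return collisions


cases = [
    ({"atr_percentile": 92, "adx": 43}, "trade"),
    ({"atr_percentile": 92, "adx": 15}, "stand_down"),
]

for a, b, out in find_collapsed_cases(classify_1d, ["atr_percentile"], cases):
    print(f"{a} and {b} both map to {out!r}")
```

Any collision means the classifier is being asked to give two different answers for inputs it sees as identical. No threshold change can fix that; only a new input can.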

Beyond trading

This shape of bug shows up everywhere:

Credit risk: A model that scores risk by default rate alone, ignoring whether the underlying borrower is high-utilization-low-income or low-utilization-high-income. Same default rate, very different actual risk.
Content moderation: A classifier that flags posts by toxicity score alone, not toxicity × intent. A clinical discussion of self-harm and a graphic glorification of self-harm have similar surface-level toxicity. They are very different problems.
Anomaly detection: A monitor that alerts on traffic spikes alone, not spikes × geographic distribution. A viral product launch and a coordinated bot attack both look like spikes from one angle. They are very different events.
InDecision Framework signals: A scoring function that ranks signals by raw strength alone, ignoring whether the underlying inputs are confirming each other or contradicting each other. Same strength, very different conviction. (Knox ships an entire engine that gets this right at indecision.io.)

The audit is identical in every case. Walk the inputs. Ask which dimensions you're collapsing. Ask which ones are independent. Ask whether any independent dimension would flip the right answer. Where the answer is yes, add the dimension before the next live event finds it for you.

Some classes of bug have a single shape and you can recognize them once and then catch them everywhere for the rest of your career. This is one of them.
