I Built a Sports Prediction Bot That Bets Against the Market — Here's the Architecture
Sports prediction is a solved problem for the books. It's wide open on Polymarket. Here's how I built Shiva — a 6-factor probability engine that finds edge in NBA and MLB markets using free public APIs and adaptive weights.

The sportsbooks have solved the prediction problem. Their models have forty years of data, proprietary injury feeds, real-time line movement, and entire teams of quantitative analysts. Beating them is a different problem category than the one most people think they're solving.
Polymarket is different. It's a prediction market — decentralized, slow-moving, and priced by retail participants who follow news cycles instead of probability curves. The lines move on sentiment, not on actuarial math. That gap is the edge.
I built Shiva — a sports prediction engine that ingests team statistics, injury reports, rest data, and pre-game media sentiment, then compares its estimated probability against Polymarket's market price. When the gap is large enough, it places a bet. Every decision is logged. Every outcome is tracked. The model learns.
The Architecture Problem Nobody Talks About
Most sports prediction projects fail at the data layer. They either depend on expensive paid APIs (ESPN Synergy, StatHead subscriptions, Basketball Reference bulk exports) or they scrape brittle HTML that breaks with every site redesign.
Shiva uses nothing that costs money. The ESPN Stats API is unofficial but stable — it's what ESPN's own apps use, which makes it unlikely to disappear. The MLB Stats API is official and fully documented. Perplexity's Sonar API handles pre-game media context for lineup confirmations and injury updates that don't appear in the structured data feeds until it's too late.
The Polymarket GAMMA API provides the market universe: 55+ live NBA game events, each with 30-49 sub-markets covering moneylines, point spreads, totals, and individual player props. Every scan cycle takes 30 seconds. The bot evaluates every active market against a price filter that keeps only markets priced above 10% and below 90%, stripping out near-resolved positions and focusing on genuinely uncertain outcomes.
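A minimal sketch of that price filter, assuming each market arrives as a dict with a `price` field between 0 and 1 (the actual GAMMA response schema differs; this only illustrates the filtering step):

```python
def is_actionable(price: float, lo: float = 0.10, hi: float = 0.90) -> bool:
    """Keep only genuinely uncertain markets; drop near-resolved ones."""
    return lo < price < hi

# Hypothetical market shapes, not the real GAMMA payload.
markets = [
    {"question": "Lakers to win", "price": 0.62},
    {"question": "Celtics -6.5",  "price": 0.93},  # near-resolved, filtered out
    {"question": "Over 224.5",    "price": 0.08},  # near-resolved, filtered out
]
actionable = [m for m in markets if is_actionable(m["price"])]
```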
Four layers: data, modeling, execution, and logging. None of them coupled tightly enough to cascade-fail. Data feeds fail gracefully — if the NBA injury report times out, the model runs without it. The execution layer stays sandboxed in paper trade mode until explicitly disabled. The logging layer records everything independently of whether a bet was placed.
The Six-Factor Probability Model
Every market type gets a different estimation function, but they all draw from the same six factors.
Home court advantage — NBA home teams win approximately 57% of games historically. That's the prior. Every estimate starts with home court at +5% before any other data is applied.
Rest — Back-to-back games carry a -8% penalty. A well-rested team facing a back-to-back opponent gets a +6% bonus. These numbers come from the literature on NBA performance degradation under compressed schedules.
Recent form — Last five games, weighted [0.30, 0.25, 0.20, 0.15, 0.10], most recent first. This captures momentum without overweighting single-game variance. A team that went 4-1 over the last five has a different trajectory than one that went 1-4.
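The weighted form calculation is a dot product of those weights against the last five results, most recent first:

```python
FORM_WEIGHTS = [0.30, 0.25, 0.20, 0.15, 0.10]  # most recent game first

def recent_form(results: list[int]) -> float:
    """results: last five games, most recent first, 1 = win, 0 = loss.
    Returns a weighted win rate in [0, 1]."""
    return sum(w * r for w, r in zip(FORM_WEIGHTS, results))

# A 4-1 team whose only loss was the most recent game:
form_4_1 = recent_form([0, 1, 1, 1, 1])  # 0.70
# A 1-4 team whose only win was the most recent game:
form_1_4 = recent_form([1, 0, 0, 0, 0])  # 0.30
```

Note how the weighting separates the two 4-1 trajectories: losing the most recent game costs more than losing the oldest one.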
Injuries — Structured by severity: Out (-15%), Doubtful (-10%), Questionable (-7%), Day-to-Day (-5%), Probable (-2%). Total impact is capped at -30% to prevent a cascading injury list from pushing a team to near-zero probability. The cap is real — game situations exist where four players are questionable simultaneously.
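The injury adjustment reduces to a severity lookup plus the -30% floor. A sketch, using the exact numbers above:

```python
SEVERITY_IMPACT = {
    "Out": -0.15,
    "Doubtful": -0.10,
    "Questionable": -0.07,
    "Day-to-Day": -0.05,
    "Probable": -0.02,
}
INJURY_CAP = -0.30  # floor: a long injury list can't push a team to near-zero

def injury_impact(statuses: list[str]) -> float:
    """Sum per-player severity penalties, capped at INJURY_CAP."""
    total = sum(SEVERITY_IMPACT.get(s, 0.0) for s in statuses)
    return max(total, INJURY_CAP)

# Four questionable players plus one out: raw -0.43, capped at -0.30
impact = injury_impact(["Questionable"] * 4 + ["Out"])
```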
Head-to-head — Reserved for future integration. The data is available; the implementation is pending enough resolved history to validate the weight before committing.
Media sentiment — Perplexity scans pre-game coverage for each matchup. The output is a sentiment score from -1.0 to 1.0, applied as a small modifier. It's the only soft factor in the model, deliberately weighted low until calibration data justifies increasing it.
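Combining the factors into a single moneyline estimate might look like the sketch below. The home court, rest, injury, and sentiment-range numbers come from the article; the form tilt (0.20) and sentiment multiplier (0.02) are illustrative assumptions, since the article doesn't publish those scales:

```python
def estimate_win_prob(home: bool, back_to_back: bool, opp_back_to_back: bool,
                      form: float, injury_adj: float, sentiment: float) -> float:
    """Sketch of a factor-sum probability estimate.
    form: weighted recent-form win rate in [0, 1]
    injury_adj: capped injury impact, <= 0
    sentiment: media sentiment score in [-1.0, 1.0]"""
    p = 0.50
    if home:
        p += 0.05                 # home court prior (~57% historical)
    if back_to_back:
        p -= 0.08                 # compressed-schedule penalty
    if opp_back_to_back:
        p += 0.06                 # rested vs. tired bonus
    p += (form - 0.50) * 0.20     # recent-form tilt (assumed scale)
    p += injury_adj               # already capped at -0.30
    p += sentiment * 0.02         # soft factor, deliberately weighted low (assumed scale)
    return min(max(p, 0.01), 0.99)

# Home team, rested vs. a back-to-back opponent, 4-1 form, one questionable player:
p_home = estimate_win_prob(True, False, True, 0.70, -0.07, 0.5)
```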
The model's prior is deliberately conservative. Edge in prediction markets comes from knowing when the market is wrong, not from having a better model than the books. A 4% edge at even money has positive expected value. That's the bar.
For spread markets, the model translates expected point differential into a cover probability using a logistic approximation of the normal CDF with a 12-point standard deviation — the empirical NBA game variance. For totals, the same method applies against the over/under line with a 15-point SD. Player props use stat-type-specific standard deviations (8 for points, 3.5 for rebounds, 2.5 for assists) against a 60/40 blend of season average and recent five-game average.
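A sketch of the spread translation and the prop blend. The 1.702 constant is the classic logistic-to-normal scaling factor; the article doesn't say which approximation Shiva uses, so treat that as an assumption:

```python
import math

def cover_prob(expected_margin: float, line: float, sd: float = 12.0) -> float:
    """P(actual margin beats the line), via a logistic approximation of the
    normal CDF. sd = 12.0 for NBA spreads; totals use the same shape with
    sd = 15.0 against the over/under line."""
    z = (expected_margin - line) / sd
    return 1.0 / (1.0 + math.exp(-1.702 * z))

def prop_mean(season_avg: float, last5_avg: float) -> float:
    """Player-prop expectation: 60/40 blend of season and recent averages."""
    return 0.60 * season_avg + 0.40 * last5_avg

# Favored by 6 expected points against a 4.5-point line: slightly
# better than a coin flip to cover.
p_cover = cover_prob(6.0, 4.5)
```

Player props run `prop_mean` through the same CDF shape with stat-specific SDs (8 for points, 3.5 for rebounds, 2.5 for assists).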
Paper Trading as a Data Collection Strategy
The model ships with DRY_RUN=true. That's not a safety hedge — it's a data strategy.
The first job of the paper trading loop is not to make money. It's to generate a labeled dataset: thousands of signals with known outcomes, factor values recorded at the time of prediction, and final market resolution stored in SQLite. That dataset is the raw material for calibration.
The training loop runs daily at 5 AM, after NBA game results have propagated through Polymarket's resolution pipeline. It reads resolved trades, computes Brier score (the mean squared difference between predicted probability and actual outcome), log loss, and accuracy by market type. Then it updates factor weights.
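The Brier score itself is a one-liner over resolved trades; the sample trades below are illustrative, not real log entries:

```python
def brier_score(trades: list[tuple[float, int]]) -> float:
    """trades: (predicted_prob, outcome) pairs, outcome 1 if YES resolved.
    Lower is better; always predicting 0.5 scores 0.25."""
    return sum((p - o) ** 2 for p, o in trades) / len(trades)

# Illustrative resolved trades, not actual Shiva data:
resolved = [(0.62, 1), (0.55, 0), (0.71, 1), (0.40, 0)]
score = brier_score(resolved)
```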
The weight system mirrors what I built for Foresight, the crypto prediction bot. Every factor carries a multiplier, initialized at 1.0 and bounded between 0.5 and 1.5. A factor that correlates with winning signals gets a 10% bump. A factor that correlates with losses gets a 10% reduction. The minimum sample size before any adjustment is 20 resolved trades — not enough data to calibrate is explicitly represented as a system state, not a silent assumption.
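A sketch of that bounded multiplicative update, with the minimum-sample gate represented explicitly:

```python
MIN_SAMPLES = 20          # resolved trades required before any adjustment
W_MIN, W_MAX = 0.5, 1.5   # weight bounds

def update_weight(weight: float, helped: bool, n_resolved: int) -> float:
    """±10% multiplicative nudge per training run, bounded to [0.5, 1.5].
    Below MIN_SAMPLES, the weight is untouched: "not enough data to
    calibrate" is an explicit state, not a silent assumption."""
    if n_resolved < MIN_SAMPLES:
        return weight
    step = 1.10 if helped else 0.90
    return min(max(weight * step, W_MIN), W_MAX)

w = update_weight(1.0, helped=True, n_resolved=25)   # bumped 10%
```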
The adaptive weights don't make the model smarter than its inputs. They make it honest about which inputs are actually informative for sports outcomes on Polymarket, as opposed to which inputs are merely available.
The entire training state surfaces in Mission Control — the dashboard that monitors all active bots in the ecosystem. The Training tab shows calibration curves, factor weight drift over time, accuracy by market type, and training run history. If home court advantage is systematically miscalibrated on Polymarket's NBA lines, that'll appear as a weight pushing toward 1.5× before any human analysis confirms it.
What This Is Actually Measuring
Sports prediction on Polymarket is not the same problem as sports prediction against a sportsbook line.
A sportsbook line is set by professional oddsmakers, sharpened by professional bettors, and adjusted in real time as sharp money moves. Beating it sustainably requires a genuine information advantage: proprietary data, faster reaction to injury news, or model structures the public doesn't have.
A Polymarket sports line is set by a much thinner market. The volumes on individual NBA game markets are substantial ($500K to $3M daily on high-profile matchups), but price discovery is slower and less efficient. Lines move on news cycles and Twitter sentiment more than on model recalibration. The vig is also lower than at a sportsbook — prediction markets take a smaller cut.
That's the structural edge. Not a better model than the quants. A better model than retail participants pricing NBA lines on a prediction market.
The paper trading loop generates the ground truth on whether that structural edge is real. Six months of logged signals with known outcomes will show whether the 6-factor model has genuine calibration or whether it's running on plausible priors that don't survive contact with actual market data. Both outcomes are useful. One tells you where to deploy capital. The other tells you which factors to rebuild.
Either way, the data exists. And the model learns.
The full architecture and trading logs surface in Mission Control, the command dashboard I built for the Invictus Labs ecosystem.