Bet Hero's docs page is an EV explainer for beginners. OddsJam doesn't publish methodology at all. This page is the actual math we run, the data we capture, and the cap rules we apply — so when our scanner says "+2.8% executable EV" you can verify exactly what that means.
When the same outcome trades on multiple venues, no single quote is the true probability — each venue carries its own vig, its own flow bias, and its own structural fee. We combine them with weights tied to how vig-free each venue is by design.
Per-venue no-vig probability: for Kalshi and Polymarket (PM) it is p_raw / 1.01. For sportsbooks (SB) we apply the multiplicative method to the two-sided decimal odds when the opposite side is known; otherwise we assume a 5% overround. The final fair probability is the weighted average of all per-venue no-vig probabilities. Confidence is 1 − std(noVigProbs) / 10. A divergence flag fires when the absolute gap between the PM and Kalshi no-vig probabilities exceeds 200bps.
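The rules above can be sketched in a few lines of Python. This is a minimal illustration, not our production code: the function names, the `weights` mapping, and the choice of population standard deviation are assumptions made for the example. Probabilities are taken in percentage points (0-100), which is also an assumption, chosen so that a 200bps gap equals 2.0.

```python
from statistics import pstdev

PREDICTION_MARKETS = {"polymarket", "kalshi"}

def no_vig_prob(venue, p_raw, opposite_decimal_odds=None):
    """Per-venue no-vig probability in percentage points (0-100)."""
    if venue in PREDICTION_MARKETS:
        return p_raw / 1.01
    if opposite_decimal_odds is not None:
        # Multiplicative method: rescale both sides so they sum to 100.
        q_raw = 100.0 / opposite_decimal_odds
        return 100.0 * p_raw / (p_raw + q_raw)
    # Opposite side unknown: assume a 5% overround.
    return p_raw / 1.05

def combine(quotes, weights):
    """quotes: list of (venue, p_raw, opposite_decimal_odds or None).
    weights: hypothetical venue -> weight mapping (values are illustrative)."""
    probs = {venue: no_vig_prob(venue, p, opp) for venue, p, opp in quotes}
    total_weight = sum(weights[v] for v in probs)
    fair = sum(weights[v] * p for v, p in probs.items()) / total_weight
    # Assumption: population standard deviation over the no-vig probs.
    confidence = 1.0 - pstdev(list(probs.values())) / 10.0
    divergence = (
        "polymarket" in probs
        and "kalshi" in probs
        and abs(probs["polymarket"] - probs["kalshi"]) > 2.0  # 200bps
    )
    return fair, confidence, divergence
```

For example, PM and Kalshi no-vig probabilities of 50.0 and 52.5 with equal weights give a fair probability of 51.25, a confidence of 0.875, and a divergence flag, since the 250bps gap exceeds the threshold.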
For Polymarket and Kalshi, we walk the live order book and compute the fill price you'd actually pay at three stake tiers. This is the foundation of our "real fillable EV vs paper math" claim. OddsJam and Bet Hero quote the headline best price; we quote the fill at YOUR stake size.
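A rough sketch of an ask-side book walk follows. The function name, the `(price, size)` representation of the book, and the $1k / $5k / $25k tiers are assumptions for illustration (the tiers are borrowed from the depth fields in the archive schema). It returns the average price paid per contract, or `None` when the book is too thin for the stake.

```python
def fill_price(asks, stake_usd):
    """Average fill price for spending stake_usd against the ask book.

    asks: list of (price, size) tuples sorted by ascending price, where
    price is dollars per contract and size is the contracts available.
    Returns None if the visible book cannot absorb the whole stake.
    """
    if stake_usd <= 0:
        raise ValueError("stake_usd must be positive")
    remaining = stake_usd
    contracts = 0.0
    for price, size in asks:
        level_cost = price * size
        if level_cost >= remaining:
            # This level finishes the order: take only what is needed.
            contracts += remaining / price
            remaining = 0.0
            break
        contracts += size
        remaining -= level_cost
    if remaining > 0:
        return None
    return stake_usd / contracts

# Hypothetical tiers, matching the depth_1k / depth_5k / depth_25k fields.
TIERS_USD = (1_000, 5_000, 25_000)

def fills_by_tier(asks):
    return {tier: fill_price(asks, tier) for tier in TIERS_USD}
```

The gap between the top-of-book ask and the tier fill price is the slippage that separates the headline quote from what a given stake would actually pay.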
For sportsbooks (which don't expose order books) we use a crowdsourced stake-limits database (159 bootstrap entries × 14 books, growing via user reports). A book is considered fillable at the quoted price up to its known limit; above the limit, the leg is flagged.
Every arbitrage row carries a verdict that combines slippage and stake limits across all legs. This is the line between an arb you can actually trade and one that exists only on paper because no order book has the depth.
The default leaderboard view hides PAPER rows because in a raw-edge sort they dominate (worst liquidity ≈ widest spreads). Toggle ?paper=1 to inspect them.
For every outcome quoted by 3+ platforms, we form a peer set of the OTHER platforms (excluding the venue we'd take) and compute the median no-vig fair probability across them. If the venue we'd take offers a payout above that peer-set fair value, the pick is +EV.
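The leave-one-out peer median can be sketched as follows. The function name and input shape are assumptions for the example, probabilities are in percentage points (0-100), and the EV formula used (fair probability × decimal odds − 1, per $1 staked) is the standard definition rather than something this page spells out.

```python
from statistics import median

def peer_set_ev(no_vig_probs, take_venue, decimal_odds):
    """EV per $1 staked at take_venue against the peer-set median.

    no_vig_probs: venue -> no-vig probability in percentage points.
    decimal_odds: the payout offered at take_venue.
    Returns None when fewer than 3 platforms quote the outcome.
    """
    if len(no_vig_probs) < 3 or take_venue not in no_vig_probs:
        return None
    # Peer set: every platform except the venue we would take.
    peers = [p for venue, p in no_vig_probs.items() if venue != take_venue]
    fair = median(peers) / 100.0
    return fair * decimal_odds - 1.0
```

With peers at 50, 51, and 52 the fair value is 51%, so a venue paying decimal odds of 2.10 comes out at roughly +7.1% per dollar staked, while anything below about 1.96 is negative EV.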
The cross-venue peer set is what makes this scanner sharper than OddsJam's or Bet Hero's: their peer median is computed across sportsbooks only. We add Polymarket and Kalshi to the peer set, which on average shaves 50-150bps off the fair-prob drift because PM/Kalshi carry less vig and less book-cartel bias than the SB consensus.
A market is low-hold when the cross-platform combination of best prices (one per outcome from whichever venue is cheapest) leaves almost no margin. Negative hold = active arbitrage.
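As an illustration, the sketch below computes cross-venue hold as the sum of the best implied probabilities minus 1. That definition, the function name, and the input shape are assumptions for the example; "cheapest" is read as the highest decimal odds on offer for each outcome.

```python
def cross_venue_hold(quotes):
    """quotes: outcome -> {venue: decimal_odds}.

    Returns (hold, picks), where picks maps each outcome to the venue
    offering the best (highest) decimal odds for it.
    """
    best = {
        outcome: max(by_venue.items(), key=lambda item: item[1])
        for outcome, by_venue in quotes.items()
    }
    # Sum of implied probabilities at the best prices, minus 1.
    hold = sum(1.0 / odds for _, odds in best.values()) - 1.0
    picks = {outcome: venue for outcome, (venue, _) in best.items()}
    return hold, picks

quotes = {
    "YES": {"polymarket": 2.10, "book_a": 1.95},
    "NO": {"kalshi": 2.00, "book_a": 1.90},
}
hold, picks = cross_venue_hold(quotes)
```

Here the best prices are 2.10 and 2.00, whose implied probabilities sum to less than 1, so the hold is negative and the two legs form an arbitrage before slippage and stake limits are considered.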
Polymarket markets settle via UMA optimistic oracle proposals. Disputes are rare but exist (the $150M BTC-by-Dec-2025 case being the most public). Our tool estimates dispute probability and expected resolution latency by category and adjusts EV accordingly.
The model also factors capital-cost (locked while pending resolution) and flip-risk (overturned outcome on dispute). The settlement-adjusted EV can be markedly lower than the raw EV on high-risk categories.
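To make the shape of the adjustment concrete, here is a deliberately simple first-order sketch. It is not our model: the function, its parameters, the linear capital charge, and the assumption that a flipped outcome costs the full stake are all hypothetical choices made for illustration.

```python
def settlement_adjusted_ev(
    raw_ev,
    dispute_prob,
    flip_prob,
    latency_days,
    annual_capital_rate,
    loss_if_flipped=1.0,
):
    """First-order settlement adjustment, per $1 staked.

    dispute_prob: estimated probability the market is disputed.
    flip_prob: probability a dispute overturns the outcome.
    latency_days: expected days capital stays locked until resolution.
    annual_capital_rate: hypothetical yearly cost of locked capital.
    loss_if_flipped: assumed loss per $1 if the outcome flips.
    """
    flip_cost = dispute_prob * flip_prob * loss_if_flipped
    capital_cost = annual_capital_rate * latency_days / 365.0
    return raw_ev - flip_cost - capital_cost
```

Under these illustrative inputs, a raw EV of 2.8% with a 2% dispute probability, a 25% flip probability, 30 days of expected latency, and a 5% annual capital rate drops to roughly 1.9%.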
Since 2026-04-29 we capture every quote from 2000 Polymarket tokens continuously into Cloudflare R2. Schema below — this is the actual parquet shape, not a marketing approximation.
v1/dt=YYYY-MM-DD/venue=polymarket/hour=HH/<uuid>.parquet

Fields (19):
  ts_capture_utc      timestamp[us, UTC]   capture wall-clock
  venue               string               polymarket / kalshi / book key
  event_id            string               canonical OB event id
  market_id           string               venue-side market id
  outcome_name        string               canonical OB outcome name
  outcome_side        string               YES / NO (PM/Kalshi only)
  raw_implied_prob    float64              0-100, vig included
  decimal_odds        float64              1.01 - 1000
  best_ask            float64              top of ask book
  best_bid            float64              top of bid book
  depth_1k            float64              ask-walk to fill $1k notional
  depth_5k            float64              ask-walk to fill $5k notional
  depth_25k           float64              ask-walk to fill $25k notional
  liquidity_flag      string               normal / low (heuristic)
  spread_bps          int32                (ask - bid) × 10000 / mid
  source_lag_ms       int32                latency from venue clock to our capture
  ws_or_rest          string               polling source
  collector_version   string               schema version pin
  segment_id          string               UUID grouping a single connection window
Total throughput in steady state: ~89 rows/min × 2000 tokens. Storage is partitioned by date + venue + hour for cheap range queries against R2's S3-compatible API.
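Because the layout is hour-partitioned, a range query reduces to listing a handful of key prefixes. The helper below is a sketch under stated assumptions: the function name is hypothetical, partitions are assumed to be keyed by UTC date and a zero-padded two-digit hour, and both arguments must be timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone

def hour_prefixes(start, end, venue="polymarket"):
    """Partition prefixes covering every UTC hour in [start, end)."""
    t = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    end = end.astimezone(timezone.utc)
    prefixes = []
    while t < end:
        prefixes.append(f"v1/dt={t:%Y-%m-%d}/venue={venue}/hour={t:%H}/")
        t += timedelta(hours=1)
    return prefixes

window = hour_prefixes(
    datetime(2026, 4, 29, 22, 30, tzinfo=timezone.utc),
    datetime(2026, 4, 30, 1, 0, tzinfo=timezone.utc),
)
```

Each prefix can then be passed to any S3-compatible list call against the bucket, so a query touches only the hours it needs.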
A few things we are NOT yet:
DATA_INSUFFICIENT until 30 contiguous days of archive land (target: mid-June 2026).
Spot a methodology bug or want a tier we don't yet publish? Tell us.