2026-03
Goal: First profitable live month. Exit Phase 1. | Phase: 1 (Validate)
Strategy: Net Positive Live P&L — 🔶
Focus on what's working. Cut what isn't. Validate before deploying.
- Iterate SMATrendFollowing params — renamed to TrendFollowing (dropped SMA prefix), iterated v3→v10
- Create Market making strategy — SingleVenueMarketMaking v1, v2 deployed
- Disable losing strategies — aggressive pruning, 30+ strategies disabled across versions
| Metric | Value |
|---|---|
| Active strategies P&L | +$2.87 |
| All-time cumulative | +$42.43 (but +$113 is incident luck) |
| Incident P&L (Mar 22-23) | +$113 — accidental, see incident report |
| Excluding incident | ~-$71 all-time |
| Best active (Sharpe) | MeanReversion_v10_BTC: +$1.24, 89% WR, 0.61 Sharpe |
What worked: MeanReversion flipped from worst (Feb) to best performer. Massive iteration velocity (v3→v10 in one month). Active v10 strategies are stable and modestly profitable.
What didn't: MarketMaking net negative (-$3.78). TrendFollowing regressed — profitable in paper but flat/negative
in live (possible market regime sensitivity). Paper MeanReversion losing while live is winning — divergence needs
investigation.
Detailed strategy metrics: 2026-03-strategy-metrics.md
ML: Improve ML Signal — 🔶
Direction classification on kline-only features has ~50% accuracy — not actionable.
- Volatility prediction: CatBoost best (R² > 0.17, direction accuracy > 0.63). Tested XGBoost, alt loss functions, log targets, cross-asset features. Near performance ceiling under current feature space
- New data sources: funding rate, futures metrics, open interest, long/short ratios, order book (Binance futures + OKX spot/futures). External features didn't improve volatility. Order book improved direction MCC 0.107 → 0.157 — early signal worth exploring
- Architecture comparison: CatBoost vs TimesNet on same data split — not completed
- Strategy integration: 4 VolScaledDirectional variants running in paper (mean_rev, ml_vol, mean_rev_ml_vol, BTC). Not yet profitable (-$4.25 combined); not deployed to live
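For readers comparing the MCC figures above (0.107 → 0.157), here is a minimal sketch of how the Matthews correlation coefficient is computed from binary confusion-matrix counts. The counts in the example are hypothetical and are not taken from the report's evaluation set.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional value when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for an up/down classifier, for illustration only.
print(round(mcc(tp=60, tn=55, fp=45, fn=40), 3))
```

MCC ranges from -1 to +1, with 0 meaning no better than chance, so values in the 0.1 to 0.2 range indicate a weak but non-zero signal.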
Key finding: Direction prediction precision capped at ~48-50% (precision-recall analysis). Below ~70% threshold needed for profitability after trading costs. Return prediction (direction + bps) near-random, R² ≈ 0. Market regime classification ~50% accuracy with significant distribution drift. Current feature space may be structurally insufficient for profitable direction trading at 1h horizon.
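The gap between ~50% precision and a profitability threshold can be made concrete with a simple expected-value model. The sketch below assumes each trade either wins `win_bps` or loses `loss_bps` and always pays `cost_bps` in fees and slippage. All three values are hypothetical, chosen only so the arithmetic lands near the ~70% figure above; they are not the team's actual cost assumptions.

```python
def expected_pnl_bps(precision: float, win_bps: float, loss_bps: float,
                     cost_bps: float) -> float:
    """Expected P&L per trade in bps under a simple win/lose model."""
    return precision * win_bps - (1.0 - precision) * loss_bps - cost_bps


def break_even_precision(win_bps: float, loss_bps: float,
                         cost_bps: float) -> float:
    """Precision at which expected P&L per trade is exactly zero."""
    if win_bps + loss_bps <= 0:
        raise ValueError("win_bps + loss_bps must be positive")
    return (loss_bps + cost_bps) / (win_bps + loss_bps)


# Hypothetical inputs: symmetric 25 bps moves, 10 bps round-trip cost.
WIN, LOSS, COST = 25.0, 25.0, 10.0
print(break_even_precision(WIN, LOSS, COST))
print(expected_pnl_bps(0.49, WIN, LOSS, COST))
```

Under these assumed inputs the break-even precision is well above 50%, and a model at 49% precision loses money on every trade in expectation. Lower costs or asymmetric payoffs would move the threshold.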
AI Agent: Domain Integration — 🔶
Prototype deployed in February (Streamlit chat + role selector + OpenRouter LLM). Currently surface-level. March focus: deep domain integration so each role provides value beyond the dashboard.
- Developer: run backtests from chat, analyze results with metric comparison, suggest parameter variations
- Operator: monitor agent deployed — queries backend API, sends Slack alerts on anomalies. Role-specific deep integration (parameter adjustment, incident guidance, kill switch) not started
- Advisor: performance attribution, cross-strategy risk analysis, event explanation — not started
- Infra: domain context injection, write operation guardrails, multi-turn conversation memory — not started
Team bandwidth absorbed by strategy iteration and incident response.
Ops: Risk Hardening — ✅
- Audit fail-closed vs fail-open decisions in risk/verification code
- Ensure killed strategies wind down positions (not abandon)
- Add alerting for state drift
- Document risk management runbook — not done
8 remediations shipped from Mar 22-23 incident: separate WS fill channel, fill health monitoring, circuit breakers (rejection auto-pause, position-flip detection, freshness check), kill switch crash fix, duplicate close guard, stale-state guards at 3 layers.
Incident: OKX Fill Pipeline Failure (Mar 22-23)
Impact: P1 Critical — 20 hours, all 7 strategies, +$113 (luck).
What happened: OKX WS delivered order status but not fill records. Engine marked orders as handled, removed them
from polling fallback. Strategies traded on stale position data. Manual reconcile fixed data but not the bug — restart
immediately recurred. $3,170 in unintended positions accumulated. 228 rejected orders with no auto-pause. Kill switch
crashed in infinite retry loop.
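The kill switch failure mode described above (an unbounded retry loop) is typically avoided by capping attempts and escalating on failure. The following is a minimal sketch of that pattern, not the shipped fix: `close_with_bounded_retry`, its parameters, and the `close_position` callable are all hypothetical names.

```python
import time
from typing import Callable


def close_with_bounded_retry(
    close_position: Callable[[], bool],
    max_attempts: int = 3,          # hypothetical cap
    base_delay_s: float = 0.5,      # hypothetical backoff base
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to close a position a bounded number of times.

    Returns True on success and False once attempts are exhausted, so the
    caller can escalate (for example, alert a human) instead of looping.
    """
    for attempt in range(max_attempts):
        try:
            if close_position():
                return True
        except Exception:
            # An exception counts as a failed attempt; it must not
            # crash the kill switch itself.
            pass
        if attempt < max_attempts - 1:
            sleep(base_delay_s * 2 ** attempt)  # exponential backoff
    return False
```

Returning `False` rather than raising keeps the decision about escalation with the caller, which can then page an operator or mark the strategy as needing manual wind-down.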
Key lesson: The +$113 profit masks a systemic failure. A price move the other way = -$3,000+. Reconciliation
fixes state, not the mechanism that broke it. Before restarting after any incident, verify the pipeline is healthy —
not just that verification passes.
Remediations (8 shipped): separate WS fill channel, fill health monitoring (60s), duplicate close guard, position-flip detection, pre-execution freshness check, consecutive rejection auto-pause, kill switch crash fix, strategy version reset.
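As an illustration of the consecutive-rejection auto-pause listed above, the sketch below pauses a strategy after a run of rejected orders. The class name, the threshold, and the manual `reset` step are assumptions for illustration; the report does not describe the engine's actual interface.

```python
from dataclasses import dataclass


@dataclass
class RejectionBreaker:
    """Pauses trading after too many consecutive order rejections."""

    max_consecutive: int = 5   # hypothetical threshold
    consecutive: int = 0
    paused: bool = False

    def on_order_result(self, accepted: bool) -> bool:
        """Record one order outcome; return True if trading may continue."""
        if self.paused:
            return False
        if accepted:
            self.consecutive = 0
        else:
            self.consecutive += 1
            if self.consecutive >= self.max_consecutive:
                self.paused = True
        return not self.paused

    def reset(self) -> None:
        """Manual resume, intended to follow a human review."""
        self.consecutive = 0
        self.paused = False
```

With a breaker like this in place, a run such as the 228 rejected orders seen in the incident would have stopped after the first few rejections. Requiring an explicit `reset` means the strategy cannot silently resume.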
Team
Gaddafi (4 commits — frontend)
- Alert monitor agent: LLM-powered monitoring service with Slack integration, interval-based alerts (collab w/ MJ)
- Multi-chart venue support, URL-based tab navigation (`?tab=` deep-linking)
- Pre-commit tooling (linting, import sorting, Black formatting)
- Ground-truth API implementation + UI improvements
- Bug fixes: strategy detail view, codebase cleanup
- Learning: investigated OKX gateway incident — reinforced importance of defensive programming in live systems
Vicky (7 commits — ML)
- Volatility modeling: CatBoost regression for BTC, ETH, portfolio volatility (R² > 0.17, direction acc > 0.63). Tested XGBoost, alt loss functions, log targets. Cross-asset features hit ceiling
- Direction prediction upper bound: precision-recall analysis shows ~48-50% precision cap — below ~70% profitability threshold. Return prediction near-random (R² ≈ 0). Regime classification ~50% with distribution drift
- Feature expansion: funding rate, futures metrics, OI, long/short ratios, order book (Binance + OKX). Order book improved direction MCC 0.107 → 0.157
- Infrastructure: fixed R² instability (expanded eval to 2,278 samples), refactored tuning pipeline to config-driven multi-model/multi-target framework
MJ (~201 commits — engine 128, ML 19, frontend 54)
- Backtest: rebuilt as service — per-run isolation, SQLite, VWAP fills, ~69x speedup. Results still inaccurate vs live — not used for March iteration. Intentionally deferred to prioritize e2e live validation
- Strategy: v3→v10 iteration via live + paper (backtest not accurate enough). New: SingleVenueMarketMaking
- Risk: SETTLING status, OKX WS fills, alert system (Slack), verification hardening
- Engine: order reconciliation, strategy lifecycle, past strategy management, CI/deploy automation
- Frontend: strategy config UI, backtest UI, accounting improvements, monitor agent
Phase Assessment
Still in Phase 1 (Validate). Exit criteria not met.
All-time P&L is nominally +$42, but +$113 of that came from the Mar 22-23 incident: accidental profit from unmanaged positions during a favorable price move. Excluding incident luck, all-time is ~-$71. Active v10 strategies are stable at +$2.87, but the sample is too short and too small to declare an edge.
What's needed to exit: Sustain positive P&L from intentional strategy execution over a full month with no incident windfalls. Current active strategies (MeanReversion, StatsArb) are the best candidates.
Learning
Iteration velocity matters but is expensive. v3→v10 in one month via live + paper. MeanReversion went from worst (Feb) to best (Mar). But each iteration costs real money (live) or days (paper). Backtest accuracy would convert this to minutes + zero cost. Backtest infrastructure improved this month (per-run SQLite isolation, VWAP fills, ~69x speedup) but results don't match live behavior closely enough to use for iteration. Iteration was done entirely via live and paper — backtest accuracy is the prerequisite for cost-efficient strategy development.
Incident profit is not strategy profit. +$113 from unmanaged positions during a pipeline failure must be
separated from strategy P&L in attribution. Luck-dependent outcomes mask systemic risk. The Mar 22-23 incident showed
that reconciliation restores state but doesn't fix the bug — restarting without confirming pipeline health caused
immediate recurrence and $3,170 in unintended positions. Fix systems, not symptoms.
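One way to keep incident P&L out of strategy attribution is to tag every P&L record that falls inside the incident window. The sketch below assumes the window covers the whole of Mar 22 and Mar 23, since the report gives dates and a 20-hour duration but not exact timestamps. The records are hypothetical and are not the report's trade log.

```python
from datetime import datetime

# Assumed incident window: whole days of Mar 22-23, 2026.
INCIDENT_START = datetime(2026, 3, 22)
INCIDENT_END = datetime(2026, 3, 24)  # exclusive upper bound


def split_pnl(records):
    """Split (timestamp, pnl) records into (strategy_pnl, incident_pnl)."""
    strategy_pnl = 0.0
    incident_pnl = 0.0
    for ts, pnl in records:
        if INCIDENT_START <= ts < INCIDENT_END:
            incident_pnl += pnl
        else:
            strategy_pnl += pnl
    return strategy_pnl, incident_pnl


# Hypothetical records for illustration only.
records = [
    (datetime(2026, 3, 10, 9, 0), -1.50),
    (datetime(2026, 3, 22, 14, 0), 80.00),
    (datetime(2026, 3, 23, 3, 0), 33.00),
    (datetime(2026, 3, 28, 16, 0), 2.00),
]
strategy, incident = split_pnl(records)
print(f"strategy={strategy:+.2f} incident={incident:+.2f}")
```

Applied to the headline figures in this report, the same subtraction gives +$42.43 minus $113, or roughly -$71 all-time once the incident is excluded.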
Defense-in-depth requires independence. The polling fallback was disabled by the WS handler it was supposed to back up. Each defense layer must trigger on its own conditions, not be preemptible by the path it backs up.
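The independence principle above can be sketched as a tracker in which an order leaves the polling set only when its fill records account for the full quantity, never on a status message alone. `FillTracker` and its method names are hypothetical, and the sketch omits cancel and partial-cancel handling for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class FillTracker:
    """Tracks orders that the polling fallback must still check."""

    # order_id -> quantity not yet covered by fill records
    pending: dict = field(default_factory=dict)

    def on_order_placed(self, order_id: str, qty: float) -> None:
        self.pending[order_id] = qty

    def on_ws_status(self, order_id: str, status: str) -> None:
        # A status update is informational only. It never removes the
        # order from polling, so a missing fill record is still caught.
        pass

    def on_fill(self, order_id: str, filled_qty: float) -> None:
        remaining = self.pending.get(order_id)
        if remaining is None:
            return  # unknown or already completed order
        remaining -= filled_qty
        if remaining <= 1e-12:
            del self.pending[order_id]
        else:
            self.pending[order_id] = remaining

    def orders_to_poll(self) -> list:
        return sorted(self.pending)


tracker = FillTracker()
tracker.on_order_placed("ord-1", 1.0)
tracker.on_ws_status("ord-1", "filled")  # status arrives, fill record does not
print(tracker.orders_to_poll())          # the order is still polled
```

The design choice is that only the evidence the fallback exists to verify (the fill record itself) can retire an order from the fallback.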
Paper vs live divergence is real. MeanReversion profitable live, losing paper. TrendFollowing opposite. Same parameters, different results. Don't trust either in isolation. MarketMaking remains hard — two versions net negative, high volume but thin margins eaten by fees and slippage.
Direction prediction has a ceiling. Precision capped at ~48-50% with current features at 1h horizon. Below ~70% needed for profitability after fees. Next avenue: order book microstructure at shorter horizons (5-15min).