2026-02
Goal: Validate paper trading results in live and achieve the first profitable month. | Phase: 1 (Validate)
Strategy: Live Calibration Experiment - ✅
Launch ALL strategies to live as a time-bounded experiment to verify paper vs live consistency.
- Deploy all strategies to live with the same parameters as paper
- Run paper and live simultaneously for comparison
- Compare metrics per strategy: P&L, fill rate, slippage, latency
- Document paper vs live discrepancies for each strategy type
Constraints: 2-week calibration phase, small scale per strategy, halt strategy if live loss exceeds threshold.
Paper vs live consistency checked: fidelity varies significantly by strategy type. Trend following performed better in live than in paper; cross-venue arb was wildly optimistic in paper. Don't trust paper results uniformly.
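The per-strategy comparison described above can be sketched as a simple live-minus-paper delta over the four metrics in the list. This is a minimal illustration, not the platform's code: `RunMetrics`, `live_minus_paper`, the field names, and the units are all assumptions, and the numbers are placeholders chosen for easy arithmetic, not actual results.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RunMetrics:
    """Metrics compared per strategy (hypothetical field names and units)."""
    pnl_usd: float        # realized P&L in USD
    fill_rate: float      # fraction of orders filled, 0..1
    slippage_bps: float   # average slippage in basis points
    latency_ms: float     # average order round-trip latency in ms


def live_minus_paper(paper: RunMetrics, live: RunMetrics) -> dict[str, float]:
    """Return live - paper for every metric field."""
    return {
        f.name: getattr(live, f.name) - getattr(paper, f.name)
        for f in fields(RunMetrics)
    }


# Placeholder values only; NOT real results from any strategy.
paper = RunMetrics(pnl_usd=10.0, fill_rate=1.0, slippage_bps=0.0, latency_ms=0.0)
live = RunMetrics(pnl_usd=4.0, fill_rate=0.5, slippage_bps=2.0, latency_ms=150.0)
delta = live_minus_paper(paper, live)

for name, value in delta.items():
    print(f"{name}: {value:+.2f}")
```

A negative `pnl_usd` or `fill_rate` delta means paper was optimistic for that strategy; a positive one means live did better than paper.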
Strategy: First Profitable Live Month - ❌
Achieve positive P&L in live trading for February.
- At least one strategy profitable in live
- Net positive P&L across all live strategies
- Understand why profitable (not luck)
Net P&L: about -$47 across all versions, most of it from the Feb 15 incident (-$39).
What worked: SMATrendFollowing +$3.31 (ETH + BTC). Live execution confirmed feasible. Platform operationally proven.
What didn't: MeanReversion lost money across all versions. CrossVenueArb placed near-zero trades at our latency. StatsArb was killed by the incident, then placed zero trades in v4. ClassificationStrategy had a 50% win rate, a signal too weak to cover fees.
Detailed strategy metrics: 2026-02-strategy-metrics.md
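Why a 50% win rate cannot cover fees can be shown with a short expectancy calculation. The sketch below assumes a simplified model that is not taken from the strategy itself: every winning trade earns a fixed `avg_win`, every losing trade costs a fixed `avg_loss`, and a fixed `fee_per_trade` is paid per round trip. The function names and the example values are illustrative.

```python
def expectancy_per_trade(win_rate: float, avg_win: float,
                         avg_loss: float, fee_per_trade: float) -> float:
    """Expected P&L per trade: p*W - (1-p)*L - fee (simplified model)."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss - fee_per_trade


def breakeven_win_rate(avg_win: float, avg_loss: float,
                       fee_per_trade: float) -> float:
    """Win rate p at which expectancy is zero: p = (L + fee) / (W + L)."""
    return (avg_loss + fee_per_trade) / (avg_win + avg_loss)


# Illustrative units: wins and losses of equal size 1.0, fee of 0.25 per trade.
edge_at_50 = expectancy_per_trade(0.5, avg_win=1.0, avg_loss=1.0, fee_per_trade=0.25)
needed = breakeven_win_rate(avg_win=1.0, avg_loss=1.0, fee_per_trade=0.25)
```

With equal-sized wins and losses, a 50% win rate has an expectancy of exactly minus the fee, so under this model any positive fee makes the strategy lose on average unless the win rate or the win/loss ratio improves.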
Strategy: Iterate Strategies - ✅
Based on actual live results, iterate each strategy to improve profitability.
| Strategy Type | Iteration Focus |
|---|---|
| CrossVenueArb | Execution quality, fill rate, latency optimization |
| SMATrendFollowing | Parameter tuning, signal timing |
| SMAMeanReversion | Entry/exit thresholds, holding period |
| ClassificationStrategy | Investigate 0 trades, signal threshold |
- Analyze live results per strategy
- Identify improvement opportunities for each
- Implement quick iterations based on live data
- Scale strategies that show live profitability
All types iterated (v1→v2/v4). Only SMATrendFollowing profitable.
What's NOT in February
- ❌ New strategy types (focus on iterating existing) — Held. No new types added.
- ❌ New exchange integrations — Held.
- ❌ Major ML architecture changes — Partially broken: significant ML pipeline refactoring and volatility modeling work happened. Justified by deploying first ML strategy to live and discovering signal weakness.
- ❌ Infrastructure improvements (unless blocking live trading) — Partially broken: significant engine refactoring (strategy lifecycle, backtest service, past strategy management). Justified by incident response and operational needs.
- ❌ Premature deprecation based on paper results alone — Held. All deprecation was based on live results.
Incident: Cascading Failure (Feb 15-16)
Impact: $39 loss across SMAMeanReversion_v2 and StatsArb_v2.
What happened: OKX returned fills with slight quantity rounding (0.024098 vs 0.0241). Our Order model rejected these as invalid → fills not processed → state drifted from exchange → risk management over-reacted to incorrect state and killed strategies → positions abandoned unmanaged for 19h → ETH dropped 5% → force-close at worst price. The system was not recoverable until a code fix was deployed.
Key lesson: The risk management system caused more damage than the original bug. Killing strategies based on stale state abandons live positions without exit management. Must distinguish data anomalies (flag + alert) from genuine risk events (hard kill).
Fixes shipped: Order model tolerance (≤1%), accounting reservation rebuild on restart, verification no longer auto-closes positions, hard/soft risk block distinction.
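The order-tolerance fix can be illustrated with a relative quantity check. This is a minimal sketch rather than the actual Order model: the function name, its signature, and the use of `Decimal` are assumptions. Only the 1% bound and the two quantities come from the incident description.

```python
from decimal import Decimal


def quantity_within_tolerance(expected: Decimal, reported: Decimal,
                              rel_tol: Decimal = Decimal("0.01")) -> bool:
    """Accept a reported quantity if it deviates from the expected one
    by at most rel_tol (relative to the expected quantity)."""
    if expected <= 0:
        raise ValueError("expected quantity must be positive")
    return abs(reported - expected) / expected <= rel_tol


# The two quantities from the incident differ by far less than 1%.
rounding_case = quantity_within_tolerance(Decimal("0.0241"), Decimal("0.024098"))

# A hypothetical large mismatch should still be rejected.
large_mismatch = quantity_within_tolerance(Decimal("0.0241"), Decimal("0.02"))
```

Using `Decimal` built from strings avoids adding binary floating-point error on top of the exchange's own rounding.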
Team
Gaddafi (6 commits, frontend)
- AI agent prototype deployed (Streamlit chat + LlamaIndex RAG + role selector)
- Position entry graph (plotly), fill history traceability, UI fixes
- Company website refurbishment
Vicky (17 commits, ML)
- Fixed evaluation pipeline (dedup predictions, dynamic lookback, classification metrics)
- Deployed first ML strategy to production (ClassificationStrategy)
- Volatility modeling: CatBoost regression R² > 0.20, direction accuracy > 0.69
- Two-stage model, baseline pipelines, dataset drift analyzer
- Key finding: kline-only features have weak signal for direction. Volatility as target more promising.
MJ (253 commits: engine, ML, frontend)
- Incident response: order tolerance, reservation rebuild, verification fix, hard/soft risk blocks
- Engine: strategy lifecycle simplification, past strategy auto-close, backtest as service, order reconciliation, Python 3.10 upgrade, CI/deploy automation
- Strategy: SMA v2-v4, StatsArb v1-v4, classification tuning
- ML: pipeline refactoring, LSTM/GRU/TCN models, benchmark tooling, endpoint recovery, legacy cleanup
- Frontend: strategy config UI, verification UI, accounting improvements
Phase Assessment
Still in Phase 1 (Validate). First profitable month not achieved.
Path forward: cut non-performing strategies, improve winners, avoid incidents.
Learning
Risk system over-reaction is dangerous. Killing strategies on bad state abandons positions without exit management. The Feb 15 incident showed that the safety system caused more damage than the original bug. Flag + alert is safer than hard kill when positions are on. Must distinguish data anomalies from genuine risk events.
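One way to express the distinction between data anomalies and genuine risk events is a small decision function. The sketch below is hypothetical throughout: the event names, the `state_verified` flag, and the meaning given to each action are illustrative assumptions, not the shipped hard/soft risk block implementation.

```python
from enum import Enum


class RiskAction(Enum):
    SOFT_BLOCK = "soft_block"  # flag + alert, stop new entries, keep managing exits
    HARD_KILL = "hard_kill"    # stop the strategy entirely


class RiskEvent(Enum):
    REJECTED_FILL = "rejected_fill"          # e.g. a fill failed validation
    STATE_MISMATCH = "state_mismatch"        # local state disagrees with exchange
    LOSS_LIMIT_BREACH = "loss_limit_breach"  # loss exceeded the configured threshold


# Events that indicate bad data rather than real market risk (assumed grouping).
DATA_ANOMALIES = frozenset({RiskEvent.REJECTED_FILL, RiskEvent.STATE_MISMATCH})


def decide_action(event: RiskEvent, state_verified: bool) -> RiskAction:
    """Hard-kill only for genuine risk events observed on verified state."""
    if event in DATA_ANOMALIES or not state_verified:
        return RiskAction.SOFT_BLOCK
    return RiskAction.HARD_KILL
```

Under this rule a loss-limit breach computed from state that has not been reconciled with the exchange is downgraded to a soft block, so open positions keep their exit management while someone investigates.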
Strategy iteration has real cost. Net P&L was about -$47 this month (including -$39 from the Feb 15 incident) against $3.31 of profit from the one winning strategy. Validate in paper before live where possible.
ML infrastructure ready, signal is not. Pipeline works end-to-end but ~50% accuracy on direction prediction is not actionable. Needs new data sources or different target (volatility looks more promising).
Paper vs live fidelity is strategy-dependent. Trend following was better in live than paper. Cross-venue arb was wildly optimistic in paper. Don't trust paper results uniformly. Live trading drives platform iteration much faster with clearer signal.