2026 - 02

Goal: Validate paper trading results in live trading and achieve the first profitable month. | Phase: 1 (Validate)

Strategy: Live Calibration Experiment — ✅

Launch all strategies live as a time-bounded experiment to verify paper vs live consistency.

  • Deploy all strategies to live with identical parameters as paper
  • Run paper and live simultaneously for comparison
  • Compare metrics per strategy: P&L, fill rate, slippage, latency
  • Document paper vs live discrepancies for each strategy type
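The per-strategy comparison listed above can be sketched as a live-minus-paper delta for each metric. This is a minimal illustration only: the `RunMetrics` type, its field names, and the units (USD, basis points, milliseconds) are assumptions, not the platform's actual schema.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Summary of one strategy run (hypothetical field names and units)."""
    pnl_usd: float       # realized P&L in USD
    fill_rate: float     # fraction of orders filled, between 0 and 1
    slippage_bps: float  # average slippage in basis points
    latency_ms: float    # average order round-trip latency in milliseconds


def compare_runs(paper: RunMetrics, live: RunMetrics) -> dict:
    """Return the live-minus-paper delta for each metric."""
    return {
        "pnl_usd": live.pnl_usd - paper.pnl_usd,
        "fill_rate": live.fill_rate - paper.fill_rate,
        "slippage_bps": live.slippage_bps - paper.slippage_bps,
        "latency_ms": live.latency_ms - paper.latency_ms,
    }
```

A positive `pnl_usd` delta means live outperformed paper for that strategy; a negative `fill_rate` delta means paper was filling orders that live did not.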

Constraints: 2-week calibration phase, small scale per strategy, halt strategy if live loss exceeds threshold.

Paper vs live consistency was checked and varies significantly by strategy type. Fidelity is strategy-dependent: trend following performed better in live than in paper, while cross-venue arb was wildly optimistic in paper. Don't trust paper results uniformly.

Strategy: First Profitable Live Month — ❌

Achieve positive P&L in live trading for February.

  • At least one strategy profitable in live
  • Net positive P&L across all live strategies
  • Understand why profitable (not luck)

Net P&L: approximately -$47 across all strategy versions, most of it (-$39) from the Feb 15 incident.

What worked: SMATrendFollowing +$3.31 (ETH + BTC). Live execution confirmed feasible. Platform operationally proven.

What didn't: MeanReversion losing across all versions. CrossVenueArb near-zero trades at our latency. StatsArb killed by incident then zero trades in v4. ClassificationStrategy 50% win rate — signal too weak for fees.

Detailed strategy metrics: 2026-02-strategy-metrics.md

Strategy: Iterate Strategies — ✅

Based on actual live results, iterate each strategy to improve profitability.

Strategy Type          | Iteration Focus
CrossVenueArb          | Execution quality, fill rate, latency optimization
SMATrendFollowing      | Parameter tuning, signal timing
SMAMeanReversion       | Entry/exit thresholds, holding period
ClassificationStrategy | Investigate 0 trades, signal threshold

  • Analyze live results per strategy
  • Identify improvement opportunities for each
  • Implement quick iterations based on live data
  • Scale strategies that show live profitability

All types iterated (v1→v2/v4). Only SMATrendFollowing profitable.

What's NOT in February

  • ❌ New strategy types (focus on iterating existing) — Held. No new types added.
  • ❌ New exchange integrations — Held.
  • ❌ Major ML architecture changes — Partially broken: significant ML pipeline refactoring and volatility modeling work happened. Justified by deploying first ML strategy to live and discovering signal weakness.
  • ❌ Infrastructure improvements (unless blocking live trading) — Partially broken: significant engine refactoring (strategy lifecycle, backtest service, past strategy management). Justified by incident response and operational needs.
  • ❌ Premature deprecation based on paper results alone — Held. All deprecation was based on live results.

Incident: Cascading Failure (Feb 15-16)

Impact: -$39 loss across SMAMeanReversion_v2 and StatsArb_v2.

What happened: OKX returned fills with slight quantity rounding (0.024098 vs 0.0241). Our Order model rejected these as invalid → fills not processed → state drifted from exchange → risk management over-reacted to incorrect state and killed strategies → positions abandoned unmanaged for 19h → ETH dropped 5% → force-close at worst price. System was not recoverable until code fix deployed.

Key lesson: The risk management system caused more damage than the original bug. Killing strategies based on stale state abandons live positions without exit management. Must distinguish data anomalies (flag + alert) from genuine risk events (hard kill).
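One way to express the distinction between data anomalies and genuine risk events is a small decision function. This is a sketch under stated assumptions, not the shipped fix: `RiskAction`, `decide_risk_action`, and the `state_matches_exchange` flag are hypothetical names, and the policy (never hard-kill on unverified state) is one reading of the lesson above.

```python
from enum import Enum
from typing import Optional


class RiskAction(Enum):
    FLAG_AND_ALERT = "flag_and_alert"  # soft block: flag and alert, positions stay managed (assumed meaning)
    HARD_KILL = "hard_kill"            # hard block: stop the strategy


def decide_risk_action(
    state_matches_exchange: bool,
    loss_usd: float,
    max_loss_usd: float,
) -> Optional[RiskAction]:
    """Pick a response to a risk check (hypothetical policy)."""
    if not state_matches_exchange:
        # Local state is suspect, so any loss figure derived from it is too.
        # Treat this as a data anomaly rather than a risk event.
        return RiskAction.FLAG_AND_ALERT
    if loss_usd > max_loss_usd:
        # State is verified and the loss limit is breached: genuine risk event.
        return RiskAction.HARD_KILL
    return None  # nothing to do
```

The key design choice is ordering: the state check runs before the loss check, so a loss computed from drifted state can never trigger a hard kill on its own.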

Fixes shipped: Order model tolerance (≤1%), accounting reservation rebuild on restart, verification no longer auto-closes positions, hard/soft risk block distinction.
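The order tolerance fix can be illustrated with a relative-tolerance comparison on quantities. This is a minimal sketch, assuming the check compares an expected quantity with the quantity the exchange reports; `fill_quantity_ok` and its parameters are hypothetical names, and only the 1% bound and the example quantities come from the incident description.

```python
from decimal import Decimal


def fill_quantity_ok(
    expected: Decimal,
    reported: Decimal,
    rel_tol: Decimal = Decimal("0.01"),  # 1% tolerance, per the shipped fix
) -> bool:
    """Accept a reported quantity if it is within rel_tol of the expected one."""
    if expected <= 0:
        return False
    return abs(reported - expected) <= rel_tol * expected


# The rounding seen in the incident is far inside the tolerance.
print(fill_quantity_ok(Decimal("0.0241"), Decimal("0.024098")))
```

`Decimal` is used instead of `float` so that quantities such as 0.0241 are represented exactly and the comparison does not depend on binary rounding.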


Team

Gaddafi (6 commits — frontend)

  • AI agent prototype deployed (Streamlit chat + LlamaIndex RAG + role selector)
  • Position entry graph (plotly), fill history traceability, UI fixes
  • Company website refurbishment

Vicky (17 commits — ML)

  • Fixed evaluation pipeline (dedup predictions, dynamic lookback, classification metrics)
  • Deployed first ML strategy to production (ClassificationStrategy)
  • Volatility modeling: CatBoost regression R² > 0.20, direction accuracy > 0.69
  • Two-stage model, baseline pipelines, dataset drift analyzer
  • Key finding: kline-only features have weak signal for direction. Volatility as target more promising.

MJ (253 commits — engine, ML, frontend)

  • Incident response: order tolerance, reservation rebuild, verification fix, hard/soft risk blocks
  • Engine: strategy lifecycle simplification, past strategy auto-close, backtest as service, order reconciliation, Python 3.10 upgrade, CI/deploy automation
  • Strategy: SMA v2-v4, StatsArb v1-v4, classification tuning
  • ML: pipeline refactoring, LSTM/GRU/TCN models, benchmark tooling, endpoint recovery, legacy cleanup
  • Frontend: strategy config UI, verification UI, accounting improvements

Phase Assessment

Still in Phase 1 (Validate). First profitable month not achieved.

Path forward: cut non-performing strategies, improve winners, avoid incidents.

Learning

Risk system over-reaction is dangerous. Killing strategies on bad state abandons positions without exit management. The Feb 15 incident showed that the safety system caused more damage than the original bug. Flag + alert is safer than hard kill when positions are on. Must distinguish data anomalies from genuine risk events.

Strategy iteration has real cost. Net P&L this month was about -$47 (-$39 of it from the Feb 15 incident) against +$3.31 from the one profitable strategy. Validate in paper before live where possible.

ML infrastructure ready, signal is not. Pipeline works end-to-end but ~50% accuracy on direction prediction is not actionable. Needs new data sources or different target (volatility looks more promising).
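The reason a roughly 50% hit rate is not actionable can be shown with simple per-trade arithmetic. The sketch below assumes a fixed average win, a fixed average loss, and a flat fee per trade; the function names and the numbers in the usage lines are illustrative, not measured values from the live strategies.

```python
def expected_value_per_trade(
    win_rate: float, avg_win: float, avg_loss: float, fee_per_trade: float
) -> float:
    """Expected P&L of one trade: p * win - (1 - p) * loss - fee."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss - fee_per_trade


def break_even_win_rate(avg_win: float, avg_loss: float, fee_per_trade: float) -> float:
    """Win rate at which the expected value per trade is exactly zero."""
    if avg_win + avg_loss <= 0:
        raise ValueError("avg_win + avg_loss must be positive")
    return (avg_loss + fee_per_trade) / (avg_win + avg_loss)


# Illustrative numbers only: equal-sized wins and losses, fee of half a win.
print(expected_value_per_trade(0.5, 1.0, 1.0, 0.5))  # -0.5
print(break_even_win_rate(1.0, 1.0, 0.5))            # 0.75
```

With symmetric wins and losses, a 50% win rate loses exactly the fee on every trade in expectation, so any fee at all pushes the required win rate above 50%.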

Paper vs live fidelity is strategy-dependent. Trend following was better in live than paper. Cross-venue arb was wildly optimistic in paper. Don't trust paper results uniformly. Live trading drives platform iteration much faster with clearer signal.