2026-02

Goal: Validate paper trading results in live and achieve first profitable month. | Phase: 1 (Validate)

Strategy: Live Calibration Experiment — ✅

Launch all strategies live as a time-bounded experiment to verify paper vs live consistency.

Deploy all strategies to live with the same parameters as paper
Run paper and live simultaneously for comparison
Compare metrics per strategy: P&L, fill rate, slippage, latency
Document paper vs live discrepancies for each strategy type
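
A minimal sketch of the per-strategy comparison, assuming each run can be summarized by the four metrics listed above. The `RunMetrics` type, its field names and units, and the sample numbers are hypothetical and not taken from the platform.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RunMetrics:
    """Summary of one strategy in one environment (hypothetical schema)."""
    pnl_usd: float        # realized P&L
    fill_rate: float      # filled orders / submitted orders, in [0, 1]
    slippage_bps: float   # average slippage in basis points
    latency_ms: float     # average order round-trip latency


def paper_live_gap(paper: RunMetrics, live: RunMetrics) -> dict:
    """Return live minus paper for every metric.

    A negative pnl_usd or fill_rate gap means paper was optimistic.
    """
    return {
        f.name: getattr(live, f.name) - getattr(paper, f.name)
        for f in fields(RunMetrics)
    }


# Made-up numbers, for illustration only.
paper = RunMetrics(pnl_usd=10.0, fill_rate=1.0, slippage_bps=0.0, latency_ms=0.0)
live = RunMetrics(pnl_usd=4.0, fill_rate=0.75, slippage_bps=2.5, latency_ms=120.0)
gap = paper_live_gap(paper, live)
```

Reporting the gap per strategy, rather than one aggregate number, matches the aim of documenting discrepancies for each strategy type.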

Constraints: 2-week calibration phase, small scale per strategy, halt strategy if live loss exceeds threshold.

Result: paper vs live consistency was tested and varies significantly by strategy type. Trend following did better live than in paper, while cross-venue arb was wildly optimistic in paper. Don't trust paper results uniformly.

Strategy: First Profitable Live Month — ❌

Achieve positive P&L in live trading for February.

At least one strategy profitable in live
Net positive P&L across all live strategies
Understand why profitable (not luck)

Net P&L: approximately -$47 across all strategy versions, most of it (-$39) from the Feb 15 incident.

What worked: SMATrendFollowing +$3.31 (ETH + BTC). Live execution confirmed feasible. Platform operationally proven.

What didn't: MeanReversion losing across all versions. CrossVenueArb near-zero trades at our latency. StatsArb killed by incident then zero trades in v4. ClassificationStrategy 50% win rate — signal too weak for fees.

Detailed strategy metrics: 2026-02-strategy-metrics.md

Strategy: Iterate Strategies — ✅

Based on actual live results, iterate each strategy to improve profitability.

Strategy Type	Iteration Focus
CrossVenueArb	Execution quality, fill rate, latency optimization
SMATrendFollowing	Parameter tuning, signal timing
SMAMeanReversion	Entry/exit thresholds, holding period
ClassificationStrategy	Investigate 0 trades, signal threshold

Analyze live results per strategy
Identify improvement opportunities for each
Implement quick iterations based on live data
Scale strategies that show live profitability

All types iterated (v1→v2/v4). Only SMATrendFollowing profitable.

What's NOT in February

❌ New strategy types (focus on iterating existing) — Held. No new types added.
❌ New exchange integrations — Held.
❌ Major ML architecture changes — Partially broken: significant ML pipeline refactoring and volatility modeling work happened. Justified by deploying first ML strategy to live and discovering signal weakness.
❌ Infrastructure improvements (unless blocking live trading) — Partially broken: significant engine refactoring (strategy lifecycle, backtest service, past strategy management). Justified by incident response and operational needs.
❌ Premature deprecation based on paper results alone — Held. All deprecation was based on live results.

Incident: Cascading Failure (Feb 15-16)

Impact: -$39 loss across SMAMeanReversion_v2 and StatsArb_v2.

What happened: OKX returned fills with slightly rounded quantities (0.024098 vs 0.0241). Our Order model rejected these as invalid → fills were not processed → local state drifted from the exchange → risk management over-reacted to the incorrect state and killed the strategies → positions sat unmanaged for 19h → ETH dropped 5% → positions were force-closed at the worst price. The system could not recover until a code fix was deployed.

Key lesson: The risk management system caused more damage than the original bug. Killing strategies based on stale state abandons live positions without exit management. Must distinguish data anomalies (flag + alert) from genuine risk events (hard kill).
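
One way to express that distinction, as a sketch only: treat a limit breach as a genuine risk event only when local state has been reconciled against the exchange, and otherwise downgrade it to a flag and alert. The names `RiskAction`, `RiskCheck`, and `decide`, and the exact meaning given to each action, are assumptions for illustration and not the platform's actual risk code.

```python
from dataclasses import dataclass
from enum import Enum


class RiskAction(Enum):
    NONE = "none"
    SOFT_BLOCK = "soft_block"  # assumed meaning: flag + alert, exits stay managed
    HARD_BLOCK = "hard_block"  # assumed meaning: stop the strategy


@dataclass(frozen=True)
class RiskCheck:
    limit_breached: bool   # a risk limit looks breached according to local state
    state_verified: bool   # local state was reconciled against the exchange


def decide(check: RiskCheck) -> RiskAction:
    """Map a risk check to an action, never hard-killing on unverified state."""
    if not check.limit_breached:
        return RiskAction.NONE
    if not check.state_verified:
        # Possible data anomaly: the breach may be an artifact of drifted state.
        return RiskAction.SOFT_BLOCK
    # Breach confirmed against the exchange: treat as a genuine risk event.
    return RiskAction.HARD_BLOCK
```

Under this rule, a breach computed from unreconciled state, as on Feb 15, would yield `SOFT_BLOCK` instead of killing the strategy.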

Fixes shipped: Order model tolerance (≤1%), accounting reservation rebuild on restart, verification no longer auto-closes positions, hard/soft risk block distinction.
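
The tolerance fix can be illustrated with a small check. This is a sketch under assumptions: the 1% bound comes from the fix list above, but measuring the deviation relative to the ordered quantity, treating 0.0241 as the ordered quantity, and the function name are all guesses, not the real Order model.

```python
from decimal import Decimal

# 1% bound from the shipped fix; applying it as a relative deviation is an assumption.
MAX_RELATIVE_DEVIATION = Decimal("0.01")


def fill_quantity_acceptable(
    ordered: Decimal,
    reported: Decimal,
    tolerance: Decimal = MAX_RELATIVE_DEVIATION,
) -> bool:
    """Accept a fill whose quantity deviates from the order by at most `tolerance`."""
    if ordered <= 0:
        raise ValueError("ordered quantity must be positive")
    return abs(reported - ordered) / ordered <= tolerance


# Quantities from the incident: the rounding difference is far below 1%.
incident_ok = fill_quantity_acceptable(Decimal("0.0241"), Decimal("0.024098"))
# A hypothetical fill that is clearly wrong is still rejected.
bad_fill_ok = fill_quantity_acceptable(Decimal("0.0241"), Decimal("0.02"))
```

`Decimal` is used here so that the comparison is not affected by binary floating-point rounding.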

Team

Gaddafi (6 commits — frontend)

AI agent prototype deployed (Streamlit chat + LlamaIndex RAG + role selector)
Position entry graph (plotly), fill history traceability, UI fixes
Company website refurbishment

Vicky (17 commits — ML)

Fixed evaluation pipeline (dedup predictions, dynamic lookback, classification metrics)
Deployed first ML strategy to production (ClassificationStrategy)
Volatility modeling: CatBoost regression R² > 0.20, direction accuracy > 0.69
Two-stage model, baseline pipelines, dataset drift analyzer
Key finding: kline-only features have weak signal for direction. Volatility as target more promising.

MJ (253 commits — engine, ML, frontend)

Incident response: order tolerance, reservation rebuild, verification fix, hard/soft risk blocks
Engine: strategy lifecycle simplification, past strategy auto-close, backtest as service, order reconciliation, Python 3.10 upgrade, CI/deploy automation
Strategy: SMA v2-v4, StatsArb v1-v4, classification tuning
ML: pipeline refactoring, LSTM/GRU/TCN models, benchmark tooling, endpoint recovery, legacy cleanup
Frontend: strategy config UI, verification UI, accounting improvements

Phase Assessment

Still in Phase 1 (Validate). First profitable month not achieved.

Path forward: cut non-performing strategies, improve winners, avoid incidents.

Learning

Risk system over-reaction is dangerous. Killing strategies on bad state abandons positions without exit management. The Feb 15 incident showed that the safety system caused more damage than the original bug. Flag + alert is safer than a hard kill while positions are open. Must distinguish data anomalies from genuine risk events.

Strategy iteration has real cost. Net P&L was ~-$47 this month (about -$39 of it from the Feb 15 incident) against +$3.31 from the only profitable strategy. Validate in paper before live where possible.

ML infrastructure ready, signal is not. Pipeline works end-to-end but ~50% accuracy on direction prediction is not actionable. Needs new data sources or different target (volatility looks more promising).
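
Why a coin-flip signal loses money once fees are included can be shown with simple expected-value arithmetic. The sketch below assumes fixed average win and loss sizes and a flat round-trip fee per trade; the numbers are invented for illustration and are not the strategy's actual figures.

```python
def expected_pnl_per_trade(
    win_rate: float, avg_win: float, avg_loss: float, round_trip_fee: float
) -> float:
    """Expected P&L of one trade: p * win - (1 - p) * loss - fee."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss - round_trip_fee


def break_even_win_rate(avg_win: float, avg_loss: float, round_trip_fee: float) -> float:
    """Win rate at which the expected P&L per trade is zero.

    Solving p * win - (1 - p) * loss - fee = 0 gives p = (loss + fee) / (win + loss).
    """
    return (avg_loss + round_trip_fee) / (avg_win + avg_loss)


# Illustrative values: symmetric $1.00 wins and losses, $0.20 fee per round trip.
at_coin_flip = expected_pnl_per_trade(0.5, avg_win=1.0, avg_loss=1.0, round_trip_fee=0.2)
needed = break_even_win_rate(avg_win=1.0, avg_loss=1.0, round_trip_fee=0.2)
```

With symmetric wins and losses, a 50% win rate loses exactly the fee on every trade on average, and in this example the signal would need roughly a 60% win rate just to break even.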

Paper vs live fidelity is strategy-dependent. Trend following was better in live than paper. Cross-venue arb was wildly optimistic in paper. Don't trust paper results uniformly. Live trading drives platform iteration much faster with clearer signal.