2025 - 11
Plan 1: Launch Arbitrage Live Trading
- Final Safety Validation: Verify risk limits, position sizing, capital allocation
- Live Environment Setup: Configure production API keys, wallet setup, exchange connections
- Deploy to Production: Launch arbitrage strategy with real capital
- Monitor Initial Performance: Track first 48-72 hours of live trading
- Establish Baseline Metrics: Document live P&L, execution costs, latency, error rates
Success Criteria: Arbitrage strategy running live with real capital by Week 1
Progress 1
Status: ✅ Launched with Issues
Metrics:
| Metric | Value | Note |
|---|---|---|
| Total P&L | $-8.36 | Small loss, capital preserved |
| NAV | $8,807.75 | 99.9% capital retained |
| Avg Win Rate | 23.0% | Below target |
| Avg Sharpe | 0.19 | Low but positive |
| Max Drawdown | 0.0% | No drawdowns |
| Total Exposure | $0.62 | Conservative exposure |
Strategy Performance:
| Strategy | Status | P&L | Win Rate | Sharpe | Trades |
|---|---|---|---|---|---|
| v0 | KILLED | $-7.45 | 0.0% | N/A | 1 |
| v1 | CLOSE_ONLY | $-2.41 | 35.7% | -0.00 | 14 |
| v2 | CLOSE_ONLY | $-0.46 | 33.3% | 0.38 | 3 |
Details:
- 18 total trades executed across all strategies
- Dashboard summary metrics were misleading (showed 0% fill rate)
- v2 strategy achieved positive Sharpe (0.38) over 3 trades, suggesting the core logic can work
- Both Binance and Hyperliquid gateways operational with 0 errors
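The summary metrics can be cross-checked against the per-strategy table. The sketch below assumes the averages are simple unweighted means across strategies (the source does not say how they were computed) and that N/A Sharpe values are skipped. Under those assumptions it reproduces the 23.0% average win rate, 0.19 average Sharpe, and 18 trades. The per-strategy P&L sums to $-10.32, which differs from the $-8.36 in the summary table; the source does not explain the difference.

```python
from statistics import mean

# Rows copied from the Strategy Performance table.
# Sharpe is None where the table reports N/A.
strategies = {
    "v0": {"pnl": -7.45, "win_rate": 0.0, "sharpe": None, "trades": 1},
    "v1": {"pnl": -2.41, "win_rate": 35.7, "sharpe": -0.00, "trades": 14},
    "v2": {"pnl": -0.46, "win_rate": 33.3, "sharpe": 0.38, "trades": 3},
}
NAV = 8807.75  # from the Metrics table


def summarize(rows, nav):
    """Recompute summary metrics as unweighted aggregates (an assumption)."""
    sharpes = [r["sharpe"] for r in rows.values() if r["sharpe"] is not None]
    sum_pnl = sum(r["pnl"] for r in rows.values())
    return {
        "total_trades": sum(r["trades"] for r in rows.values()),
        "avg_win_rate": round(mean(r["win_rate"] for r in rows.values()), 1),
        "avg_sharpe": round(mean(sharpes), 2),
        "sum_pnl": round(sum_pnl, 2),
        # Loss expressed as a percentage of NAV.
        "loss_pct_of_nav": round(-sum_pnl / nav * 100, 2),
    }


summary = summarize(strategies, NAV)
print(summary)
```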
Key Finding: High latency (~1M ms) is hurting arbitrage profitability. Cross-venue arbitrage requires faster execution, and the current Binance + Hyperliquid setup is too slow for millisecond-scale opportunities.
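One way to reason about this finding is a latency budget check: estimate how much of a quoted edge survives the round trip and skip the trade if it no longer covers costs. The sketch below is purely illustrative and not the project's execution logic. `Opportunity`, `half_life_ms`, the exponential decay model, and all numbers are hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    edge_bps: float      # quoted cross-venue spread, in basis points
    half_life_ms: float  # assumed time for the edge to decay by half


def expected_edge_bps(opp: Opportunity, latency_ms: float) -> float:
    """Edge left after `latency_ms`, assuming exponential decay (hypothetical model)."""
    return opp.edge_bps * 0.5 ** (latency_ms / opp.half_life_ms)


def should_trade(opp: Opportunity, latency_ms: float, cost_bps: float) -> bool:
    """Trade only if the decayed edge still exceeds fees and slippage."""
    return expected_edge_bps(opp, latency_ms) > cost_bps


# Illustrative numbers only: a 10 bps edge with a 50 ms half-life.
opp = Opportunity(edge_bps=10.0, half_life_ms=50.0)
fast = should_trade(opp, latency_ms=50.0, cost_bps=2.0)   # 5 bps remain
slow = should_trade(opp, latency_ms=500.0, cost_bps=2.0)  # ~0.01 bps remain
print(fast, slow)
```

Under this toy model, a strategy whose measured latency is many multiples of the opportunity's half-life has essentially no edge left, which is consistent with the suggestion to consider single-venue strategies.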
Plan 2: ML Development
- Deploy Dual-Model: Launch ternary classification with regime awareness
- [i] Improve Regression Performance: Achieve MAE ≤10 bps, MAPE ≤20%, R² >0, 65%+ directional accuracy
- [i] Feature Engineering: Optimize feature set for model generalization
- Endpoint Deployment: Re-deploy endpoints without duplicate pipelines
- Metrics Calibration: Finalize confidence thresholds and quality assessment
Success Criteria: ML model meets target performance metrics
Progress 2
Status: ✅ Architecture Complete, ❌ Performance Gap
Metrics:
| Metric | Target | Test Result | Status |
|---|---|---|---|
| Direction Accuracy | 65%+ | 43.2% | ❌ Gap: -21.8 pp |
| R² | >0 | -1.63 | ❌ Negative |
| MAPE | ≤20% | 4.85% | ✅ Met |
| Correlation | >0.3 | 0.07 | ❌ Low |
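For reference, the four metrics in the table can be computed as below. These are the standard textbook definitions; the source does not state the exact definitions or target variable the project uses, so treat this as a minimal sketch. A negative R² means the predictions fit worse than always predicting the mean of the targets.

```python
import math


def direction_accuracy(y_true, y_pred):
    """Share of samples where the predicted and actual signs agree."""
    def sign(x):
        return (x > 0) - (x < 0)
    hits = sum(sign(t) == sign(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot; negative when worse than predicting the mean."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; y_true must be nonzero."""
    errors = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(errors) / len(errors)


def pearson(y_true, y_pred):
    """Pearson correlation between targets and predictions."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Toy data: predictions that are exactly reversed relative to the targets.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [4.0, 3.0, 2.0, 1.0]
print(r_squared(y_true, y_pred), pearson(y_true, y_pred))
```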
Details - Architecture Completed:
- Full ternary classification (−1/0/+1) with flat class now predicted
- Class-weighted CE loss + Focal loss + label smoothing for imbalance
- Weighted Expectile Loss for scale prediction
- Time-series CV (Blocked K-Fold, Growing Window)
- Regime-aware dual model with quantile thresholds
- GradNorm for multitask training stability
- Model-agnostic feature engineering pipeline
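The two time-series CV schemes named above can be sketched as index generators. This is a minimal illustration under common definitions, not the project's implementation. The function names, the `gap` parameter, and the choice to drop leftover samples are assumptions.

```python
def growing_window_splits(n_samples, n_folds, min_train):
    """Yield (train_idx, test_idx) where the training window grows each fold
    and the test block always comes strictly after it in time.
    Samples left over after the last full test block are unused."""
    test_size = (n_samples - min_train) // n_folds
    for k in range(n_folds):
        train_end = min_train + k * test_size
        train = list(range(0, train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test


def blocked_kfold_splits(n_samples, n_folds, gap=0):
    """Yield (train_idx, test_idx) with contiguous, unshuffled test blocks.
    `gap` samples on each side of the test block are dropped from training
    to limit leakage between adjacent observations.
    Trailing samples beyond n_folds full blocks are never tested."""
    block = n_samples // n_folds
    for k in range(n_folds):
        start, stop = k * block, (k + 1) * block
        train = [i for i in range(n_samples) if i < start - gap or i >= stop + gap]
        yield train, list(range(start, stop))


growing = list(growing_window_splits(n_samples=10, n_folds=3, min_train=4))
blocked = list(blocked_kfold_splits(n_samples=9, n_folds=3, gap=1))
for train, test in growing:
    print("growing", train, test)
```

Growing-window splits never train on data from after the test block, which makes them the stricter check for the train/test gap discussed in this section. Blocked K-Fold uses more of the data for testing but lets later observations into training.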
Key Finding: Severe overfitting (train: 49.5% → test: 43.2%). The architecture is solid, but the model is not generalizing. It needs regularization and simpler models, not more features.
Plan 3: Post-Launch Infrastructure
- Analyze live arbitrage performance vs paper trading
- Identify and fix production-specific issues
- Optimize execution based on real market data
- Scale capital allocation if performance validates
- [i] Frontend Observability: Improve dashboard for strategy observability
- [i] Backtest System: Build framework for faster strategy iteration
Success Criteria: Live arbitrage profitable, backtest WIP, foundations for new strategies
Progress 3
Status: 🔄 In Progress
Completed:
- Live performance analysis completed (identified latency as bottleneck)
- Production-specific issues identified and documented
- Execution optimization attempted (limited by current infrastructure)
In Progress:
- Frontend observability improvements ongoing
- Backtest system development started
Not Started:
- Capital scaling (blocked by profitability requirement)
- TA indicators strategy (deferred to December)
- ML-based strategy (blocked by ML performance)
Key Finding: Dashboard summary metrics were misleading, so always drill into the detail views. Frontend observability needs improvement for accurate real-time monitoring.
Learning
Live Trading Launched - Major Milestone Achieved
After four months of development (August-November), we launched live trading with real capital. This validates October's learning about shifting from "making it better" to "making it live". The system is running on both Binance and Hyperliquid with real money and executing trades.
Live Trading Is Different From Paper Trading
Going live immediately revealed issues that were invisible in simulation: misleading dashboard metrics, latency problems, and execution challenges that only appear with real money. Every month spent in paper trading delayed these critical production lessons.
Arbitrage Strategy Needs Lower Latency
Despite reasonable win rates (v1: 35.7%, v2: 33.3%), high latency is hurting profitability. Cross-venue arbitrage requires fast execution because opportunities disappear in milliseconds. The current Binance + Hyperliquid setup may be too slow. Consider simpler exchange pairs or single-venue strategies that don't require split-second timing.
Capital Preservation Worked
Despite suboptimal conditions, we only lost $10.32 against $8,807.75 of capital (about 0.1%), with a reported max drawdown of 0.0%. The $10.32 is the sum of the per-strategy P&L; the summary table lists $-8.36. Risk management systems protected capital. The v2 strategy even achieved a positive Sharpe (0.38), suggesting the core logic can work with better execution.
ML Architecture ≠ ML Performance
Extensive architecture work (ternary classification, regime awareness, GradNorm, sentiment features) didn't solve the core problem: overfitting. Adding complexity without improving generalization wastes effort. December must focus on making the existing architecture generalize, not on building more features.
For December: Evaluate lower-latency exchange options. Develop single-venue strategies that match current latency capabilities. Fix ML overfitting through regularization. Finish the year with profitable trading.