Spaces:
Running
Running
docs: update CLAUDE.md with auto-routing optimizations
Browse files- Zen unlimited rate limit (9999 req/min)
- Silent blocked-skip for NIM providers
- Load-based candidate ordering
- Session tracking via X-Session-ID header
- Fix AUTO_MODEL_PRIORITY -> AUTO_MODEL_ORDER
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CLAUDE.md
CHANGED
|
@@ -53,10 +53,12 @@ Claude Code CLI β api/routes.py (FastAPI) β api/model_router.py β provider
|
|
| 53 |
|
| 54 |
### Auto-Routing with Health Tracking
|
| 55 |
The proxy includes intelligent model selection:
|
| 56 |
-
1. Pre-flight health check (recent failures in 30s window)
|
| 57 |
2. Skip unhealthy models (3+ failures = unhealthy for 30s)
|
| 58 |
3. Automatic failover on timeout/rate-limit
|
| 59 |
-
4.
|
|
|
|
|
|
|
| 60 |
|
| 61 |
### Key Modules
|
| 62 |
|
|
@@ -83,7 +85,10 @@ The `except X, Y:` syntax is valid in Python 3.14 (reintroduced). Do not moderni
|
|
| 83 |
|
| 84 |
Key variables in `.env`:
|
| 85 |
- `MODEL` β Primary model (e.g., `zen/minimax-m2.5-free`)
|
| 86 |
-
- `
|
| 87 |
- `NVIDIA_NIM_API_KEY` β NVIDIA API key
|
| 88 |
- `ANTHROPIC_AUTH_TOKEN` β Auth token (any secret)
|
| 89 |
-
- `ENABLE_MODEL_THINKING` β Enable reasoning blocks
|
|
|
|
|
|
|
|
|
|
|
|
| 53 |
|
| 54 |
### Auto-Routing with Health Tracking
|
| 55 |
The proxy includes intelligent model selection:
|
| 56 |
+
1. Pre-flight health check (recent failures in 30s window per model)
|
| 57 |
2. Skip unhealthy models (3+ failures = unhealthy for 30s)
|
| 58 |
3. Automatic failover on timeout/rate-limit
|
| 59 |
+
4. Zen provider is unlimited (9999 req/min scoped limiter) β never blocked by rate limits
|
| 60 |
+
5. Blocked NIM providers skipped silently (no failure penalty)
|
| 61 |
+
6. Load-based ordering β least-loaded providers tried first
|
| 62 |
|
| 63 |
### Key Modules
|
| 64 |
|
|
|
|
| 85 |
|
| 86 |
Key variables in `.env`:
|
| 87 |
- `MODEL` β Primary model (e.g., `zen/minimax-m2.5-free`)
|
| 88 |
+
- `AUTO_MODEL_ORDER` β Comma-separated fallback order for auto routing
|
| 89 |
- `NVIDIA_NIM_API_KEY` β NVIDIA API key
|
| 90 |
- `ANTHROPIC_AUTH_TOKEN` β Auth token (any secret)
|
| 91 |
+
- `ENABLE_MODEL_THINKING` β Enable reasoning blocks
|
| 92 |
+
|
| 93 |
+
### Session Tracking
|
| 94 |
+
Start Claude Code with `--session-id <uuid>` so the admin dashboard shows accurate per-session metrics. The proxy reads the `X-Session-ID` header for session identification.
|