Yash030 Claude Opus 4.7 committed on
Commit 84a115b · 1 Parent(s): 9358a6f

docs: update CLAUDE.md with auto-routing optimizations

- Zen unlimited rate limit (9999 req/min)
- Silent blocked-skip for NIM providers
- Load-based candidate ordering
- Session tracking via X-Session-ID header
- Fix AUTO_MODEL_PRIORITY -> AUTO_MODEL_ORDER

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (1)
  1. CLAUDE.md +9 -4
CLAUDE.md CHANGED
@@ -53,10 +53,12 @@ Claude Code CLI → api/routes.py (FastAPI) → api/model_router.py → provider

  ### Auto-Routing with Health Tracking
  The proxy includes intelligent model selection:
- 1. Pre-flight health check (recent failures in 30s window)
+ 1. Pre-flight health check (recent failures in 30s window per model)
  2. Skip unhealthy models (3+ failures = unhealthy for 30s)
  3. Automatic failover on timeout/rate-limit
- 4. 40 req/min rate limit respected
+ 4. Zen provider is unlimited (9999 req/min scoped limiter) — never blocked by rate limits
+ 5. Blocked NIM providers skipped silently (no failure penalty)
+ 6. Load-based ordering — least-loaded providers tried first

  ### Key Modules

@@ -83,7 +85,10 @@ The `except X, Y:` syntax is valid in Python 3.14 (reintroduced). Do not moderni

  Key variables in `.env`:
  - `MODEL` — Primary model (e.g., `zen/minimax-m2.5-free`)
- - `AUTO_MODEL_PRIORITY` — Comma-separated fallback order
+ - `AUTO_MODEL_ORDER` — Comma-separated fallback order for auto routing
  - `NVIDIA_NIM_API_KEY` — NVIDIA API key
  - `ANTHROPIC_AUTH_TOKEN` — Auth token (any secret)
- - `ENABLE_MODEL_THINKING` — Enable reasoning blocks
+ - `ENABLE_MODEL_THINKING` — Enable reasoning blocks
+
+ ### Session Tracking
+ Start Claude Code with `--session-id <uuid>` so the admin dashboard shows accurate per-session metrics. The proxy reads the `X-Session-ID` header for session identification.
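
The routing rules and the session header described in this commit can be sketched in Python roughly as follows. This is a minimal illustration, not the code in `api/model_router.py`: the names `HealthTracker`, `order_candidates`, and `session_id_from_headers` are hypothetical, and the sliding-window check is only one plausible reading of "3+ failures = unhealthy for 30s".

```python
import time
from collections import defaultdict, deque

FAILURE_WINDOW_S = 30.0  # from the doc: failures are counted over a 30s window
FAILURE_THRESHOLD = 3    # from the doc: 3+ failures marks a model unhealthy


class HealthTracker:
    """Hypothetical per-model failure tracker (not the project's class)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._failures = defaultdict(deque)  # model -> failure timestamps

    def record_failure(self, model):
        self._failures[model].append(self._clock())

    def is_healthy(self, model):
        # Drop failures older than the window, then compare to the threshold.
        cutoff = self._clock() - FAILURE_WINDOW_S
        recent = self._failures[model]
        while recent and recent[0] < cutoff:
            recent.popleft()
        return len(recent) < FAILURE_THRESHOLD


def order_candidates(models, tracker, in_flight, blocked=frozenset()):
    """Return the models to try, least-loaded first.

    Blocked models are skipped without recording a failure; unhealthy
    models are skipped by the pre-flight check. `in_flight` maps a model
    to its current request count (an assumed load metric).
    """
    usable = [m for m in models if m not in blocked and tracker.is_healthy(m)]
    # sorted() is stable, so ties keep the configured fallback order.
    return sorted(usable, key=lambda m: in_flight.get(m, 0))


def session_id_from_headers(headers, default="unknown"):
    """Read X-Session-ID case-insensitively from a plain header mapping."""
    for name, value in headers.items():
        if name.lower() == "x-session-id":
            return value
    return default
```

In this sketch the caller would try the returned candidates in order and call `record_failure` on a timeout or rate-limit error before moving to the next one. A provider treated as unlimited simply never appears in `blocked`.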