CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

Free Claude Code is a FastAPI proxy that routes Claude Code's Anthropic Messages API calls to backend providers (NVIDIA NIM, Zen). It translates between the client-side Anthropic protocol and provider-specific transports (OpenAI chat format, native APIs), and handles SSE streaming, thinking blocks, tool calls, and normalization of token usage metadata.

Free Models

Zen/OpenCode (Free Tier)

  • zen/minimax-m2.5-free - Default, Claude Code capable
  • zen/big-pickle - Free tier
  • zen/ring-2.6-1t-free - Free tier
  • zen/nemotron-3-super-free - Free tier

NVIDIA NIM (7 Models)

  • nvidia_nim/qwen/qwen3-coder-480b-a35b-instruct - Code generation
  • nvidia_nim/z-ai/glm4.7 - General purpose
  • nvidia_nim/stepfun-ai/step-3.5-flash - Fast responses
  • nvidia_nim/mistralai/mistral-large-3-675b-instruct-2512 - Reasoning
  • nvidia_nim/abacusai/dracarys-llama-3.1-70b-instruct - Complex tasks
  • nvidia_nim/bytedance/seed-oss-36b-instruct - Balanced
  • nvidia_nim/mistralai/mistral-nemotron - Thinking tasks

Commands

uv run ruff format          # Format code
uv run ruff check           # Lint code
uv run ty check             # Type check
uv run pytest               # Run tests (use -n auto for parallel)
uv run pytest tests/path/test.py::test_name  # Run single test

# Run the proxy
uv run uvicorn server:app --host 0.0.0.0 --port 8082

# Or via installed scripts (after uv tool install)
free-claude-code            # Start proxy with configured host/port
fcc-init                    # Create user config template at ~/.config/free-claude-code/.env

Run format → lint → type check in that order before pushing. CI enforces the same sequence.

Architecture

Request Flow

Claude Code CLI → api/routes.py (FastAPI) → api/model_router.py → providers/* → upstream
                                                    ↓
                                            core/chain_engine.py (fallback)
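
The fallback step in the flow above can be pictured as a loop over candidate models. This is a minimal sketch, not the code in core/chain_engine.py: the function name `call_with_fallback`, the `send` callable, and the choice of `TimeoutError`/`ConnectionError` as retryable failures are all assumptions made for illustration.

```python
from collections.abc import Callable, Sequence


class AllCandidatesFailed(RuntimeError):
    """Raised when every candidate in the chain has failed."""


def call_with_fallback(
    candidates: Sequence[str],
    send: Callable[[str], str],
) -> tuple[str, str]:
    """Try each model in order; return (model, response) for the first success."""
    errors: dict[str, Exception] = {}
    for model in candidates:
        try:
            return model, send(model)
        except (TimeoutError, ConnectionError) as exc:
            # Hypothetical failure classes; the real proxy may raise its own types.
            errors[model] = exc
    raise AllCandidatesFailed(f"all candidates failed: {sorted(errors)}")
```

Any callable that takes a model ID and either returns a response or raises can stand in for `send`, which keeps the chain logic independent of a particular provider client.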

Auto-Routing with Health Tracking

The proxy selects a model for each request based on recent health and current load:

  1. Pre-flight health check (recent failures in 30s window per model)
  2. Skip unhealthy models (3+ failures = unhealthy for 30s)
  3. Automatic failover on timeout/rate-limit
  4. Zen provider is unlimited (9999 req/min scoped limiter) - never blocked by rate limits
  5. Blocked NIM providers skipped silently (no failure penalty)
  6. Load-based ordering - least-loaded providers tried first
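
Steps 1, 2, and 6 can be sketched as a sliding-window failure counter plus a load-based sort. This is an illustration under stated assumptions, not the ModelHealthTracker in providers/rate_limit.py: the class name `HealthTracker`, the injectable `clock`, the `in_flight` load map, and the reading of "unhealthy for 30s" as "3 or more failures inside the last 30 seconds" are all hypothetical.

```python
import time
from collections import defaultdict, deque


class HealthTracker:
    """Counts failures per model inside a sliding time window."""

    def __init__(self, window_s=30.0, max_failures=3, clock=time.monotonic):
        self.window_s = window_s
        self.max_failures = max_failures
        self._clock = clock  # injectable so tests can control time
        self._failures = defaultdict(deque)

    def record_failure(self, model):
        self._failures[model].append(self._clock())

    def is_healthy(self, model):
        cutoff = self._clock() - self.window_s
        recent = self._failures[model]
        # Drop failures that have aged out of the window.
        while recent and recent[0] <= cutoff:
            recent.popleft()
        return len(recent) < self.max_failures


def order_candidates(models, tracker, in_flight):
    """Skip unhealthy models, then try the least-loaded ones first."""
    healthy = [m for m in models if tracker.is_healthy(m)]
    # sorted() is stable, so equally loaded models keep their configured order.
    return sorted(healthy, key=lambda m: in_flight.get(m, 0))
```

Using a monotonic clock avoids surprises when the wall clock is adjusted while the proxy is running.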

Key Modules

  • api/routes.py - FastAPI routes + REQUESTED_PROVIDER_MODELS list
  • api/services.py - Request handling, fallback logic, failure recording
  • api/model_router.py - Model resolution with health-aware candidate selection
  • api/optimization_handlers.py - Fast-path for trivial requests
  • providers/rate_limit.py - GlobalRateLimiter + ModelHealthTracker
  • providers/nvidia_nim/client.py - NIM provider with fast timeouts
  • providers/zen/client.py - Zen/OpenCode provider
  • providers/openai_compat.py - OpenAI chat → Anthropic SSE translation
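
To make the OpenAI chat → Anthropic SSE translation concrete, the sketch below converts a single OpenAI-style streaming chunk into an Anthropic `content_block_delta` event. It covers only plain text deltas, and the function name `text_delta_event` is hypothetical; the real providers/openai_compat.py also has to deal with thinking blocks, tool calls, and usage metadata.

```python
import json
from typing import Optional


def text_delta_event(openai_chunk: dict, index: int = 0) -> Optional[str]:
    """Return an Anthropic SSE text delta for an OpenAI chunk, or None if it has no text."""
    choices = openai_chunk.get("choices") or []
    if not choices:
        return None
    text = (choices[0].get("delta") or {}).get("content")
    if not text:
        return None
    payload = {
        "type": "content_block_delta",
        "index": index,
        "delta": {"type": "text_delta", "text": text},
    }
    # SSE frames are "event:" and "data:" lines terminated by a blank line.
    return f"event: content_block_delta\ndata: {json.dumps(payload)}\n\n"
```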

Provider Model Format

Model values use provider_id/model/name format (e.g., nvidia_nim/z-ai/glm4.7 or zen/minimax-m2.5-free).
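
Because the model name itself may contain slashes, a parser should split only on the first one. A minimal sketch, assuming hypothetical names `ModelRef` and `parse_model_ref` that are not taken from the codebase:

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    provider_id: str
    model_name: str


def parse_model_ref(value: str) -> ModelRef:
    """Split 'provider_id/model/name' on the first slash only."""
    provider_id, sep, model_name = value.strip().partition("/")
    if not sep or not provider_id or not model_name:
        raise ValueError(f"expected provider_id/model/name, got {value!r}")
    return ModelRef(provider_id, model_name)
```

With this split, `nvidia_nim/z-ai/glm4.7` yields the provider `nvidia_nim` and the model name `z-ai/glm4.7`.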

Multi-Model Advertisement

The MODEL env var accepts a comma-separated list, which makes the Claude CLI display every listed model. Each registered model appears in the /model picker. Picker-safe IDs include "(no thinking)" variants that route to the same upstream model with thinking blocks disabled.
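
One way to read such a list is shown below. The helper name `parse_model_list` is hypothetical, and dropping blanks and duplicates is an assumption rather than documented proxy behavior.

```python
def parse_model_list(raw: str) -> list[str]:
    """Split a comma-separated MODEL value, keeping first occurrences in order."""
    seen: set[str] = set()
    models: list[str] = []
    for item in raw.split(","):
        name = item.strip()
        if name and name not in seen:  # assumption: blanks and repeats are ignored
            seen.add(name)
            models.append(name)
    return models
```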

Python 3.14 Notes

The unparenthesized except X, Y: syntax is valid in Python 3.14 (PEP 758) when the clause has no as target; it catches both exception types. Do not "modernize" it into the parenthesized form.
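
The snippet below shows the syntax in action while staying runnable on older interpreters: the 3.14-only code is kept in a string and compiled at run time. The wrapper `run_snippet` is a hypothetical helper for this demonstration only.

```python
import sys

# Unparenthesized multi-exception clause; a SyntaxError before Python 3.14.
SOURCE = """
try:
    int("not a number")
except ValueError, TypeError:
    result = "caught"
"""


def run_snippet() -> str:
    """Return 'caught' where the syntax is accepted, 'unsupported' elsewhere."""
    namespace: dict = {}
    try:
        exec(compile(SOURCE, "<pep758>", "exec"), namespace)
    except SyntaxError:
        return "unsupported"
    return namespace["result"]


print(sys.version_info[:2], run_snippet())
```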

Environment Configuration

Key variables in .env:

  • MODEL - Primary model (e.g., zen/minimax-m2.5-free)
  • AUTO_MODEL_ORDER - Comma-separated fallback order for auto routing
  • NVIDIA_NIM_API_KEY - NVIDIA API key
  • ANTHROPIC_AUTH_TOKEN - Auth token (any secret)
  • ENABLE_MODEL_THINKING - Enable reasoning blocks
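
As an illustration of how these variables might be consumed, the sketch below reads them from any string mapping. The `ProxySettings` dataclass, the `load_settings` function, the empty-string defaults, and the set of truthy flag values are assumptions; the proxy's actual settings loader may differ.

```python
from dataclasses import dataclass
from typing import Mapping

_TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings for an enabled flag


@dataclass(frozen=True)
class ProxySettings:
    model: str
    auto_model_order: tuple[str, ...]
    nvidia_nim_api_key: str
    anthropic_auth_token: str
    enable_model_thinking: bool


def load_settings(env: Mapping[str, str]) -> ProxySettings:
    """Build settings from an environment-like mapping (e.g. os.environ)."""
    order = tuple(
        m.strip() for m in env.get("AUTO_MODEL_ORDER", "").split(",") if m.strip()
    )
    return ProxySettings(
        model=env.get("MODEL", "").strip(),
        auto_model_order=order,
        nvidia_nim_api_key=env.get("NVIDIA_NIM_API_KEY", ""),
        anthropic_auth_token=env.get("ANTHROPIC_AUTH_TOKEN", ""),
        enable_model_thinking=env.get("ENABLE_MODEL_THINKING", "").strip().lower()
        in _TRUTHY,
    )
```

Passing `os.environ` reads the live process environment, while a plain dict makes the loader easy to test.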

Session Tracking

Start Claude Code with --session-id <uuid> so the admin dashboard shows accurate per-session metrics. The proxy reads the X-Session-ID header for session identification.
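
Header names are case-insensitive, so a lookup for X-Session-ID should not depend on the exact casing the client sends. A minimal sketch, assuming a hypothetical helper `session_id_from_headers` and assuming that values which are not valid UUIDs are discarded:

```python
import uuid
from typing import Mapping, Optional


def session_id_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the normalized X-Session-ID value, or None if missing or malformed."""
    for name, value in headers.items():
        if name.lower() == "x-session-id":
            try:
                # Assumption: only well-formed UUIDs count as session IDs.
                return str(uuid.UUID(value.strip()))
            except ValueError:
                return None
    return None
```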