# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Overview
Free Claude Code is a FastAPI proxy that routes Claude Code's Anthropic Messages API calls to backend providers (NVIDIA NIM, Zen). It translates between client-side Anthropic protocol and provider-specific transports (OpenAI chat format, native APIs), handling SSE streaming, thinking blocks, tool calls, and token usage metadata normalization.
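The shape of that translation can be sketched as follows. This is an illustrative helper, not the proxy's actual code: on the Anthropic side the system prompt is a top-level field and message content may be a list of typed blocks, while OpenAI chat expects a flat message list.

```python
def anthropic_to_openai(body: dict) -> dict:
    """Map an Anthropic Messages request to an OpenAI chat request (sketch)."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first chat message.
    if body.get("system"):
        messages.append({"role": "system", "content": body["system"]})
    for msg in body.get("messages", []):
        content = msg["content"]
        # Anthropic content may be a list of typed blocks; flatten text blocks.
        if isinstance(content, list):
            content = "".join(b["text"] for b in content if b.get("type") == "text")
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": body["model"],
        "messages": messages,
        "max_tokens": body.get("max_tokens", 1024),
    }
```

The real translation layer additionally handles thinking blocks, tool calls, and usage metadata, which this sketch omits.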
## Free Models

### Zen/OpenCode (Free Tier)

- `zen/minimax-m2.5-free` – Default, Claude Code capable
- `zen/big-pickle` – Free tier
- `zen/ring-2.6-1t-free` – Free tier
- `zen/nemotron-3-super-free` – Free tier
### NVIDIA NIM (7 Models)

- `nvidia_nim/qwen/qwen3-coder-480b-a35b-instruct` – Code generation
- `nvidia_nim/z-ai/glm4.7` – General purpose
- `nvidia_nim/stepfun-ai/step-3.5-flash` – Fast responses
- `nvidia_nim/mistralai/mistral-large-3-675b-instruct-2512` – Reasoning
- `nvidia_nim/abacusai/dracarys-llama-3.1-70b-instruct` – Complex tasks
- `nvidia_nim/bytedance/seed-oss-36b-instruct` – Balanced
- `nvidia_nim/mistralai/mistral-nemotron` – Thinking tasks
## Commands

```bash
uv run ruff format                           # Format code
uv run ruff check                            # Lint code
uv run ty check                              # Type check
uv run pytest                                # Run tests (use -n auto for parallel)
uv run pytest tests/path/test.py::test_name  # Run single test

# Run the proxy
uv run uvicorn server:app --host 0.0.0.0 --port 8082

# Or via installed scripts (after `uv tool install`)
free-claude-code  # Start proxy with configured host/port
fcc-init          # Create user config template at ~/.config/free-claude-code/.env
```
Run format → lint → type check in that order before pushing. CI enforces the same sequence.
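To exercise the proxy from Claude Code, point the CLI at it via environment variables. The values below are illustrative: the base URL must match the uvicorn host/port above, and the token must match `ANTHROPIC_AUTH_TOKEN` in the proxy's `.env`.

```shell
# Route Claude Code's API traffic through the local proxy.
export ANTHROPIC_BASE_URL=http://localhost:8082
export ANTHROPIC_AUTH_TOKEN=your-secret-token
# Then launch the CLI as usual, e.g.: claude --session-id "$(uuidgen)"
```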
## Architecture

### Request Flow

```
Claude Code CLI → api/routes.py (FastAPI) → api/model_router.py → providers/* → upstream
                                                     ↓
                                        core/chain_engine.py (fallback)
```
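The fallback step can be sketched as a simple chain runner. This illustrates the role `core/chain_engine.py` plays rather than its actual code; names are assumptions, and the real engine narrows the caught exceptions to timeout and rate-limit errors.

```python
async def run_chain(candidates, call):
    """Try providers in order; fall through to the next on failure (sketch)."""
    last_error = None
    for model in candidates:
        try:
            return await call(model)
        except Exception as exc:  # real code narrows to timeout/rate-limit errors
            last_error = exc
    # All candidates failed (or none were given): surface the last error.
    raise last_error or RuntimeError("no candidates available")
```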
### Auto-Routing with Health Tracking
The proxy includes intelligent model selection:
- Pre-flight health check (recent failures in 30s window per model)
- Skip unhealthy models (3+ failures = unhealthy for 30s)
- Automatic failover on timeout/rate-limit
- Zen provider is unlimited (9999 req/min scoped limiter), so it is never blocked by rate limits
- Blocked NIM providers skipped silently (no failure penalty)
- Load-based ordering: least-loaded providers are tried first
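The health policy above (3+ failures within a 30s window marks a model unhealthy) can be sketched like this. This is illustrative, not the actual `ModelHealthTracker` implementation.

```python
import time
from collections import defaultdict, deque

class HealthTracker:
    """Sliding-window failure tracking (sketch of the documented policy)."""
    WINDOW = 30.0   # seconds a failure counts against a model
    THRESHOLD = 3   # failures in the window that mark it unhealthy

    def __init__(self):
        self._failures = defaultdict(deque)

    def record_failure(self, model, now=None):
        self._failures[model].append(now if now is not None else time.monotonic())

    def is_healthy(self, model, now=None):
        now = now if now is not None else time.monotonic()
        window = self._failures[model]
        # Drop failures older than the 30s window before counting.
        while window and now - window[0] > self.WINDOW:
            window.popleft()
        return len(window) < self.THRESHOLD
```

Because old failures age out of the window, an unhealthy model automatically becomes eligible again ~30s after its last failure, matching the pre-flight check described above.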
### Key Modules

- `api/routes.py` – FastAPI routes + `REQUESTED_PROVIDER_MODELS` list
- `api/services.py` – Request handling, fallback logic, failure recording
- `api/model_router.py` – Model resolution with health-aware candidate selection
- `api/optimization_handlers.py` – Fast-path for trivial requests
- `providers/rate_limit.py` – `GlobalRateLimiter` + `ModelHealthTracker`
- `providers/nvidia_nim/client.py` – NIM provider with fast timeouts
- `providers/zen/client.py` – Zen/OpenCode provider
- `providers/openai_compat.py` – OpenAI chat → Anthropic SSE translation
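As an illustration of the translation `providers/openai_compat.py` performs, one OpenAI streaming delta maps onto an Anthropic-style `content_block_delta` SSE event roughly like this. The event field names follow the public Anthropic streaming format; the function itself is a sketch, not the module's code.

```python
import json

def openai_delta_to_anthropic_event(chunk: dict, index: int = 0) -> str:
    """Render one OpenAI streaming delta as an Anthropic-style SSE event (sketch)."""
    text = chunk["choices"][0]["delta"].get("content", "")
    event = {
        "type": "content_block_delta",
        "index": index,
        "delta": {"type": "text_delta", "text": text},
    }
    # SSE framing: event name line, data line, blank-line terminator.
    return f"event: content_block_delta\ndata: {json.dumps(event)}\n\n"
```

The real translator also emits `message_start`, `content_block_start`/`stop`, and `message_delta` events, and handles thinking and tool-use blocks, which this sketch omits.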
### Provider Model Format

Model values use the `provider_id/model_name` format, where `model_name` may itself contain slashes (e.g., `nvidia_nim/z-ai/glm4.7` or `zen/minimax-m2.5-free`).
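Parsing that format amounts to splitting on the first slash only, since the model name may contain further slashes. A minimal sketch (the provider registry and error handling here are illustrative, not the proxy's actual parser):

```python
KNOWN_PROVIDERS = {"nvidia_nim", "zen"}  # illustrative registry

def split_model_id(value: str):
    """Split 'provider_id/model_name' at the first slash only (sketch)."""
    provider, _, model = value.partition("/")
    if provider not in KNOWN_PROVIDERS or not model:
        raise ValueError(f"unknown model id: {value!r}")
    return provider, model
```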
### Multi-Model Advertisement

The `MODEL` env var accepts a comma-separated list to make the Claude CLI display all models. Each registered model appears in the `/model` picker. Picker-safe IDs include "(no thinking)" variants that route to the same upstream model while disabling thinking blocks.
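The expansion from `MODEL` into picker entries can be sketched as below. This is an assumption about the behavior described above, not the proxy's code, and the exact variant naming may differ.

```python
def advertised_models(model_env: str) -> list:
    """Expand a comma-separated MODEL value into picker entries,
    adding a '(no thinking)' variant per model (sketch)."""
    models = [m.strip() for m in model_env.split(",") if m.strip()]
    out = []
    for m in models:
        out.append(m)
        out.append(f"{m} (no thinking)")  # routes to the same upstream model
    return out
```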
## Python 3.14 Notes

The `except X, Y:` syntax is valid in Python 3.14 (PEP 758 reintroduced unparenthesized multiple exception types). Do not modernize this syntax away.
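For reference, both forms below catch the same two exception types. The demo uses the parenthesized form only so it also runs on interpreters older than 3.14; in this repository, leave the unparenthesized form as written.

```python
# Python 3.14 (PEP 758) accepts the unparenthesized form:
#
#     except ValueError, TypeError:
#
# which is equivalent to the parenthesized tuple form used below.
def classify(x):
    try:
        return int(x)
    except (ValueError, TypeError):  # 3.14 code may drop the parentheses
        return None
```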
## Environment Configuration

Key variables in `.env`:

- `MODEL` – Primary model (e.g., `zen/minimax-m2.5-free`)
- `AUTO_MODEL_ORDER` – Comma-separated fallback order for auto routing
- `NVIDIA_NIM_API_KEY` – NVIDIA API key
- `ANTHROPIC_AUTH_TOKEN` – Auth token (any secret)
- `ENABLE_MODEL_THINKING` – Enable reasoning blocks
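A minimal `.env` using these variables might look like the following (all values illustrative):

```
MODEL=zen/minimax-m2.5-free
AUTO_MODEL_ORDER=zen/minimax-m2.5-free,nvidia_nim/z-ai/glm4.7
NVIDIA_NIM_API_KEY=your-nvidia-api-key
ANTHROPIC_AUTH_TOKEN=any-shared-secret
ENABLE_MODEL_THINKING=true
```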
## Session Tracking

Start Claude Code with `--session-id <uuid>` so the admin dashboard shows accurate per-session metrics. The proxy reads the `X-Session-ID` header for session identification.
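Resolving the session id from that header can be sketched as below. The header name matches the documented `X-Session-ID`; the function and its "anonymous" fallback are illustrative assumptions, not the proxy's code.

```python
def session_id_from_headers(headers: dict) -> str:
    """Resolve the session id used to group per-session metrics (sketch)."""
    # HTTP header names are case-insensitive; normalize before lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    return normalized.get("x-session-id", "anonymous")
```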