Yash030

docs: update CLAUDE.md with auto-routing optimizations

84a115b about 22 hours ago

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

Free Claude Code is a FastAPI proxy that routes Claude Code's Anthropic Messages API calls to backend providers (NVIDIA NIM, Zen). It translates between the client-side Anthropic protocol and provider-specific transports (OpenAI chat format, native APIs), and it handles SSE streaming, thinking blocks, tool calls, and normalization of token usage metadata.

Free Models

Zen/OpenCode (Free Tier)

zen/minimax-m2.5-free - Default, Claude Code capable
zen/big-pickle - Free tier
zen/ring-2.6-1t-free - Free tier
zen/nemotron-3-super-free - Free tier

NVIDIA NIM (7 Models)

nvidia_nim/qwen/qwen3-coder-480b-a35b-instruct - Code generation
nvidia_nim/z-ai/glm4.7 - General purpose
nvidia_nim/stepfun-ai/step-3.5-flash - Fast responses
nvidia_nim/mistralai/mistral-large-3-675b-instruct-2512 - Reasoning
nvidia_nim/abacusai/dracarys-llama-3.1-70b-instruct - Complex tasks
nvidia_nim/bytedance/seed-oss-36b-instruct - Balanced
nvidia_nim/mistralai/mistral-nemotron - Thinking tasks

Commands

uv run ruff format          # Format code
uv run ruff check           # Lint code
uv run ty check             # Type check
uv run pytest               # Run tests (use -n auto for parallel)
uv run pytest tests/path/test.py::test_name  # Run single test

# Run the proxy
uv run uvicorn server:app --host 0.0.0.0 --port 8082

# Or via installed scripts (after uv tool install)
free-claude-code            # Start proxy with configured host/port
fcc-init                    # Create user config template at ~/.config/free-claude-code/.env

Run format → lint → type check in that order before pushing. CI enforces the same sequence.

Architecture

Request Flow

Claude Code CLI → api/routes.py (FastAPI) → api/model_router.py → providers/* → upstream
                                                    ↓
                                            core/chain_engine.py (fallback)
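
The fallback step in the flow above can be pictured as a loop over candidate models. The following is a minimal sketch, not the code in core/chain_engine.py: `RetryableProviderError`, `call_with_fallback`, and the `send` callable are hypothetical names introduced only for illustration.

```python
from collections.abc import Callable, Iterable


class RetryableProviderError(Exception):
    """Hypothetical marker for failures worth retrying elsewhere (timeout, rate limit)."""


def call_with_fallback(
    candidates: Iterable[str],
    send: Callable[[str], str],
) -> tuple[str, str]:
    """Try each candidate model in order and return (model, response)."""
    errors: list[str] = []
    for model in candidates:
        try:
            return model, send(model)
        except RetryableProviderError as exc:
            # Retryable failure: remember it and move on to the next candidate.
            errors.append(f"{model}: {exc}")
    # Every candidate failed (or the candidate list was empty).
    raise RuntimeError("all candidates failed: " + "; ".join(errors))
```

Errors other than `RetryableProviderError` propagate immediately in this sketch, so a genuine bug is not hidden behind a retry.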

Auto-Routing with Health Tracking

The proxy chooses a model for each request using these rules:

Pre-flight health check (recent failures in 30s window per model)
Skip unhealthy models (3+ failures = unhealthy for 30s)
Automatic failover on timeout/rate-limit
Zen provider is unlimited (9999 req/min scoped limiter) — never blocked by rate limits
Blocked NIM providers skipped silently (no failure penalty)
Load-based ordering — least-loaded providers tried first
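
The health and ordering rules above can be sketched as follows. This is an illustration only, not the ModelHealthTracker in providers/rate_limit.py: the class and function names are hypothetical, and it assumes "3+ failures = unhealthy for 30s" means three or more failures inside a sliding 30-second window.

```python
import time
from collections import defaultdict, deque
from collections.abc import Callable, Mapping, Sequence


class HealthTracker:
    """Sliding-window failure counter, one window per model (hypothetical)."""

    def __init__(
        self,
        window_s: float = 30.0,
        threshold: int = 3,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.window_s = window_s
        self.threshold = threshold
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._failures: dict[str, deque[float]] = defaultdict(deque)

    def record_failure(self, model: str) -> None:
        self._failures[model].append(self.clock())

    def is_healthy(self, model: str) -> bool:
        failures = self._failures[model]
        cutoff = self.clock() - self.window_s
        # Drop failures that have aged out of the window.
        while failures and failures[0] <= cutoff:
            failures.popleft()
        return len(failures) < self.threshold


def order_candidates(
    models: Sequence[str],
    tracker: HealthTracker,
    load: Mapping[str, int],
) -> list[str]:
    """Skip unhealthy models, then try the least-loaded ones first."""
    healthy = [m for m in models if tracker.is_healthy(m)]
    # sorted() is stable, so models with equal load keep their configured order.
    return sorted(healthy, key=lambda m: load.get(m, 0))
```

Passing a fake clock (for example `HealthTracker(clock=lambda: now[0])`) makes the 30-second recovery easy to exercise in a unit test.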

Key Modules

api/routes.py — FastAPI routes + REQUESTED_PROVIDER_MODELS list
api/services.py — Request handling, fallback logic, failure recording
api/model_router.py — Model resolution with health-aware candidate selection
api/optimization_handlers.py — Fast-path for trivial requests
providers/rate_limit.py — GlobalRateLimiter + ModelHealthTracker
providers/nvidia_nim/client.py — NIM provider with fast timeouts
providers/zen/client.py — Zen/OpenCode provider
providers/openai_compat.py — OpenAI chat → Anthropic SSE translation

Provider Model Format

Model values use provider_id/model/name format (e.g., nvidia_nim/z-ai/glm4.7 or zen/minimax-m2.5-free).
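
Because the model name can itself contain slashes, only the first slash separates the provider from the model. A minimal sketch of that split, using a hypothetical helper name (`split_model_id`) that is not taken from the repository:

```python
def split_model_id(value: str) -> tuple[str, str]:
    """Split 'provider_id/model/name' into (provider_id, 'model/name')."""
    provider, sep, model = value.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected provider_id/model/name, got {value!r}")
    return provider, model


print(split_model_id("nvidia_nim/z-ai/glm4.7"))  # ('nvidia_nim', 'z-ai/glm4.7')
print(split_model_id("zen/minimax-m2.5-free"))   # ('zen', 'minimax-m2.5-free')
```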

Multi-Model Advertisement

The MODEL env var accepts a comma-separated list, which makes the Claude CLI display every listed model. Each registered model appears in the /model picker. Picker-safe IDs include "(no thinking)" variants, which route to the same upstream model but disable thinking blocks.
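
One way to picture the "(no thinking)" variants is as a suffix that is stripped before routing. The sketch below is an assumption throughout: the exact spelling of picker IDs is not documented here, so `NO_THINKING_SUFFIX` and `resolve_picker_id` are hypothetical.

```python
# Assumed spelling of the variant marker; the real picker ID format may differ.
NO_THINKING_SUFFIX = " (no thinking)"


def resolve_picker_id(picker_id: str) -> tuple[str, bool]:
    """Map a picker ID to (upstream model, thinking enabled)."""
    if picker_id.endswith(NO_THINKING_SUFFIX):
        # Same upstream model, with thinking blocks disabled.
        return picker_id[: -len(NO_THINKING_SUFFIX)], False
    return picker_id, True
```

Both the plain ID and its variant resolve to the same upstream model; only the boolean differs.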

Python 3.14 Notes

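The unparenthesized except X, Y: syntax is valid in Python 3.14 (PEP 758) and catches both exception types. Parentheses are still required when an as clause is used. Do not rewrite existing except X, Y: clauses as except (X, Y):.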

Environment Configuration

Key variables in .env:

MODEL — Primary model (e.g., zen/minimax-m2.5-free)
AUTO_MODEL_ORDER — Comma-separated fallback order for auto routing
NVIDIA_NIM_API_KEY — NVIDIA API key
ANTHROPIC_AUTH_TOKEN — Auth token (any secret)
ENABLE_MODEL_THINKING — Enable reasoning blocks
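
The variables above could be read into a typed settings object roughly as follows. This is a sketch under stated assumptions, not the project's actual loader: `ProxySettings`, `load_settings`, the default values, and the set of strings treated as true are all hypothetical.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxySettings:
    models: list[str]            # MODEL, split on commas
    auto_model_order: list[str]  # AUTO_MODEL_ORDER, split on commas
    nvidia_nim_api_key: str
    auth_token: str
    enable_model_thinking: bool


def _csv(value: str) -> list[str]:
    """Split a comma-separated value, dropping blanks and surrounding spaces."""
    return [part.strip() for part in value.split(",") if part.strip()]


def load_settings(env: Mapping[str, str]) -> ProxySettings:
    """Build settings from a mapping such as os.environ."""
    return ProxySettings(
        models=_csv(env.get("MODEL", "")),
        auto_model_order=_csv(env.get("AUTO_MODEL_ORDER", "")),
        nvidia_nim_api_key=env.get("NVIDIA_NIM_API_KEY", ""),
        auth_token=env.get("ANTHROPIC_AUTH_TOKEN", ""),
        # Assumed truthy spellings; the real parser may accept others.
        enable_model_thinking=env.get("ENABLE_MODEL_THINKING", "").strip().lower()
        in {"1", "true", "yes", "on"},
    )
```

Calling `load_settings(os.environ)` after the .env file has been loaded would give one object to pass around instead of scattered environment lookups.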

Session Tracking

Start Claude Code with --session-id <uuid> so the admin dashboard shows accurate per-session metrics. The proxy reads the X-Session-ID header for session identification.