Yash030 Claude Opus 4.7 committed on
Commit
fcc5278
·
1 Parent(s): ebba9d6

docs: complete README refactor with cloud deploy guide


- Restructured to lead with the problem/solution approach
- Added HuggingFace Spaces as primary deployment method
- Documented all deployment options (HF, Railway, Render, Fly.io, Docker)
- Added architecture overview diagram
- Simplified troubleshooting section
- Updated model list with speed ratings
- Added visual ASCII diagram for auto-routing flow

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (1)
  1. README.md +205 -393
README.md CHANGED
@@ -14,508 +14,320 @@ pinned: false
14
 
15
  # 🤖 Free Claude Code
16
 
17
- Use Claude Code CLI, VS Code, JetBrains ACP, or chat bots through your own Anthropic-compatible proxy.
18
 
19
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)
20
  [![Python 3.14](https://img.shields.io/badge/python-3.14-3776ab.svg?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/downloads/)
21
- [![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json&style=for-the-badge)](https://github.com/astral-sh/uv)
22
- [![Tested with Pytest](https://img.shields.io/badge/testing-Pytest-00c0ff.svg?style=for-the-badge)](https://github.com/Alishahryar1/free-claude-code/actions/workflows/tests.yml)
23
- [![Type checking: Ty](https://img.shields.io/badge/type%20checking-ty-ffcc00.svg?style=for-the-badge)](https://pypi.org/project/ty/)
24
  [![Code style: Ruff](https://img.shields.io/badge/code%20formatting-ruff-f5a623.svg?style=for-the-badge)](https://github.com/astral-sh/ruff)
25
- [![Logging: Loguru](https://img.shields.io/badge/logging-loguru-4ecdc4.svg?style=for-the-badge)](https://github.com/Delgan/loguru)
26
 
27
- Free Claude Code routes Anthropic Messages API traffic from Claude Code to NVIDIA NIM. It keeps Claude Code's client-side protocol stable while letting you use NVIDIA's free models.
28
 
29
- ## Git Origins
30
 
31
- This project is synchronized between two repositories:
32
 
33
- | Platform | URL |
34
- |----------|-----|
35
- | **Hugging Face Spaces** | [huggingface.co/spaces/Yash030/claude-code-proxy](https://huggingface.co/spaces/Yash030/claude-code-proxy) |
36
- | **GitHub** | [github.com/Yashwant00CR7/claude-code-nvidia](https://github.com/Yashwant00CR7/claude-code-nvidia) |
37
 
38
- [Quick Start](#quick-start) · [Providers](#choose-a-provider) · [Clients](#connect-claude-code) · [Troubleshooting](#troubleshooting) · [Development](#development)
39
 
40
- </div>
41
 
42
- <div align="center">
43
- <img src="pic.png" alt="Free Claude Code in action" width="700">
44
- </div>
45
 
46
- ## What You Get
47
 
48
- - Drop-in proxy for Claude Code's Anthropic API calls.
49
- - NVIDIA NIM provider backend with free models.
50
- - Per-model routing: send Opus, Sonnet, Haiku, and fallback traffic to different NVIDIA NIM models.
51
- - Native Claude Code `/model` picker support through the proxy's `/v1/models` endpoint.
52
- - Streaming, tool use, reasoning/thinking block handling, and local request optimizations.
53
- - Optional Discord or Telegram bot wrapper for remote coding sessions.
54
- - Optional voice-note transcription through local Whisper or NVIDIA NIM.
55
 
56
- ## Quick Start
57
 
58
- ### 1. Install Requirements
59
 
60
- Install [Claude Code](https://github.com/anthropics/claude-code), then install `uv` and Python 3.14.
61
 
62
- macOS/Linux:
63
 
64
  ```bash
65
- curl -LsSf https://astral.sh/uv/install.sh | sh
66
- uv self update
67
- uv python install 3.14
 
68
  ```
69
 
70
- Windows PowerShell:
71
 
72
- ```powershell
73
- powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
74
- uv self update
75
- uv python install 3.14
76
- ```
77
 
78
- ### 2. Clone And Configure
79
 
80
  ```bash
81
- git clone https://github.com/Alishahryar1/free-claude-code.git
82
- cd free-claude-code
83
- cp .env.example .env
 
 
 
84
  ```
85
 
86
- PowerShell uses:
87
 
88
- ```powershell
89
- Copy-Item .env.example .env
 
 
90
  ```
91
 
92
- Edit `.env` and choose one provider. For the default NVIDIA NIM path:
93
-
94
  ```dotenv
95
  NVIDIA_NIM_API_KEY="nvapi-your-key"
96
- MODEL="nvidia_nim/z-ai/glm4.7"
97
  ANTHROPIC_AUTH_TOKEN="freecc"
 
98
  ```
99
 
100
- Use any local secret for `ANTHROPIC_AUTH_TOKEN`; Claude Code will send the same value back to this proxy. Leave it empty only for local/private testing.
101
-
102
- ### 3. Start The Proxy
103
 
104
  ```bash
 
105
  uv run uvicorn server:app --host 0.0.0.0 --port 8082
106
  ```
107
 
108
- Package install alternative:
109
 
110
  ```bash
111
- uv tool install git+https://github.com/Alishahryar1/free-claude-code.git
112
- fcc-init
113
- free-claude-code
114
  ```
115
 
116
- `fcc-init` creates `~/.config/free-claude-code/.env` from the bundled template.
117
 
118
- ### 4. Run Claude Code
119
 
120
- Point `ANTHROPIC_BASE_URL` at the proxy root. Do not append `/v1`.
 
 
 
 
 
 
 
 
121
 
122
- PowerShell:
123
 
124
- ```powershell
125
- $env:ANTHROPIC_AUTH_TOKEN="freecc"; $env:ANTHROPIC_BASE_URL="http://localhost:8082"; claude
126
- ```
127
 
128
- Bash:
 
 
 
129
 
130
- ```bash
131
- ANTHROPIC_AUTH_TOKEN="freecc" ANTHROPIC_BASE_URL="http://localhost:8082" claude
132
  ```
133
-
134
- ## Choose A Provider
135
-
136
- Model values use this format:
137
-
138
- ```text
139
- provider_id/model/name
140
  ```
141
 
142
- `MODEL` is the fallback. `MODEL_OPUS`, `MODEL_SONNET`, and `MODEL_HAIKU` override routing for requests that Claude Code sends for those tiers.
143
-
144
- | Provider | Prefix | Transport | Key | Default base URL |
145
- | --- | --- | --- | --- | --- |
146
- | <img src="https://cdn.simpleicons.org/nvidia/76B900" alt="" width="18" height="18"> NVIDIA NIM | `nvidia_nim/...` | OpenAI chat translation | `NVIDIA_NIM_API_KEY` | `https://integrate.api.nvidia.com/v1` |
147
- | <img src="https://cdn.simpleicons.org/groq/F55036" alt="" width="18" height="18"> Groq | `groq/...` | OpenAI chat translation | `GROQ_API_KEY` | `https://api.groq.com/openai/v1` |
148
- | <img src="https://cdn.simpleicons.org/cerebras/313131" alt="" width="18" height="18"> Cerebras | `cerebras/...` | OpenAI chat translation | `CEREBRAS_API_KEY` | `https://api.cerebras.ai/v1` |
149
-
150
- <details>
151
- <summary><img src="https://cdn.simpleicons.org/nvidia/76B900" alt="" width="18" height="18"> <b>NVIDIA NIM</b></summary>
152
-
153
- Get a key at [build.nvidia.com/settings/api-keys](https://build.nvidia.com/settings/api-keys).
154
 
 
155
  ```dotenv
156
- NVIDIA_NIM_API_KEY="nvapi-your-key"
157
- MODEL="nvidia_nim/z-ai/glm4.7"
158
  ```
159
 
160
- Popular examples:
161
-
162
- - `nvidia_nim/qwen/qwen3-coder-480b-a35b-instruct`
163
- - `nvidia_nim/mistralai/mistral-large-3-675b-instruct-2512`
164
- - `nvidia_nim/z-ai/glm4.7`
165
-
166
- </details>
167
-
168
- <details>
169
- <summary><img src="https://cdn.simpleicons.org/groq/F55036" alt="" width="18" height="18"> <b>Groq</b></summary>
170
-
171
- Get a key at [console.groq.com/keys](https://console.groq.com/keys).
172
-
173
  ```dotenv
174
- GROQ_API_KEY="gsk_..."
175
- MODEL="groq/openai/gpt-oss-120b"
176
- ```
177
-
178
- Popular examples:
179
-
180
- - `groq/openai/gpt-oss-120b` (Best overall for Claude Code)
181
- - `groq/openai/gpt-oss-20b` (Ultra-low latency)
182
- - `groq/llama-3.3-70b-versatile`
183
-
184
- </details>
185
-
186
- <details>
187
- <summary><img src="https://cdn.simpleicons.org/cerebras/313131" alt="" width="18" height="18"> <b>Cerebras</b></summary>
188
 
189
- Get a key at [cloud.cerebras.ai](https://cloud.cerebras.ai/).
 
190
 
191
- ```dotenv
192
- CEREBRAS_API_KEY="csk_..."
193
- MODEL="cerebras/gpt-oss-120b"
194
  ```
195
 
196
- Popular examples:
197
-
198
- - `cerebras/gpt-oss-120b` (~3000 tok/s - Fastest reasoning)
199
- - `cerebras/qwen-3-235b`
200
- - `cerebras/llama3.1-8b`
201
-
202
- </details>
203
-
204
- ## Connect Claude Code
205
-
206
- ### Claude Code CLI
207
-
208
- ```bash
209
- ANTHROPIC_AUTH_TOKEN="freecc" ANTHROPIC_BASE_URL="http://localhost:8082" claude
210
- ```
211
 
212
  ### VS Code Extension
213
 
214
- Open Settings, search for `claudeCode.environmentVariables`, choose **Edit in settings.json**, and add:
215
-
216
  ```json
217
- "claudeCode.environmentVariables": [
218
- { "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" },
219
- { "name": "ANTHROPIC_AUTH_TOKEN", "value": "freecc" }
220
- ]
 
 
221
  ```
222
 
223
- Reload the extension. If the extension shows a login screen, choose the Anthropic Console path once; the local proxy still handles model traffic after the environment variables are active.
224
-
225
  ### JetBrains ACP
226
 
227
- Edit the installed Claude ACP config:
228
-
229
- - Windows: `C:\Users\%USERNAME%\AppData\Roaming\JetBrains\acp-agents\installed.json`
230
- - Linux/macOS: `~/.jetbrains/acp.json`
231
-
232
- Set the environment for `acp.registry.claude-acp`:
233
-
234
  ```json
235
- "env": {
236
- "ANTHROPIC_BASE_URL": "http://localhost:8082",
237
- "ANTHROPIC_AUTH_TOKEN": "freecc"
 
 
238
  }
239
  ```
240
 
241
- Restart the IDE after changing the file.
242
-
243
- ### Model Picker
244
-
245
- Claude Code 2.1.126 or later reads this proxy's `/v1/models` endpoint when `ANTHROPIC_BASE_URL` points at the proxy. Start Claude Code normally, run `/model`, and choose any discovered provider model.
246
-
247
- <div align="center">
248
- <img src="cc-model-picker.png" alt="Claude Code model picker showing gateway models" width="700">
249
- </div>
250
-
251
- The proxy lists models for configured provider keys and referenced local providers. Picker-safe IDs are routed back to the real provider/model automatically, so no `.env` edit or separate launcher script is needed after startup.
252
-
253
- Each provider model also has a `(no thinking)` picker variant. Use it when a model does not support Claude Code thinking or fails with adaptive-thinking requests. It routes to the same upstream model while asking Claude Code to send a non-thinking request.
254
-
255
- ## Optional Integrations
256
-
257
- ### Discord And Telegram Bots
258
-
259
- The bot wrapper runs Claude Code sessions remotely, streams progress, supports reply-based conversation branches, and can stop or clear tasks.
260
-
261
- Discord minimum config:
262
-
263
- ```dotenv
264
- MESSAGING_PLATFORM="discord"
265
- DISCORD_BOT_TOKEN="your-discord-bot-token"
266
- ALLOWED_DISCORD_CHANNELS="123456789"
267
- CLAUDE_WORKSPACE="./agent_workspace"
268
- ALLOWED_DIR="C:/Users/yourname/projects"
269
- ```
270
-
271
- Create the bot in the [Discord Developer Portal](https://discord.com/developers/applications), enable Message Content Intent, and invite it with read/send/history permissions.
272
-
273
- Telegram minimum config:
274
-
275
- ```dotenv
276
- MESSAGING_PLATFORM="telegram"
277
- TELEGRAM_BOT_TOKEN="123456789:ABC..."
278
- ALLOWED_TELEGRAM_USER_ID="your-user-id"
279
- CLAUDE_WORKSPACE="./agent_workspace"
280
- ALLOWED_DIR="C:/Users/yourname/projects"
281
- ```
282
-
283
- Get a token from [@BotFather](https://t.me/BotFather) and your user ID from [@userinfobot](https://t.me/userinfobot).
284
-
285
- Useful commands:
286
-
287
- - `/stop` cancels a task; reply to a task message to stop only that branch.
288
- - `/clear` resets sessions; reply to clear one branch.
289
- - `/stats` shows session state.
290
-
291
- ### Voice Notes
292
-
293
- Voice notes work on Discord and Telegram. Choose one backend:
294
 
 
295
  ```bash
296
- uv sync --extra voice_local
297
- uv sync --extra voice
298
- uv sync --extra voice --extra voice_local
299
  ```
300
 
301
- ```dotenv
302
- VOICE_NOTE_ENABLED=true
303
- WHISPER_DEVICE="cpu" # cpu | cuda | nvidia_nim
304
- WHISPER_MODEL="base"
305
- HF_TOKEN=""
306
- ```
307
 
308
- Use `WHISPER_DEVICE="nvidia_nim"` with the `voice` extra and `NVIDIA_NIM_API_KEY` for NVIDIA-hosted transcription.
309
 
310
- ## Configuration Reference
 
 
 
 
311
 
312
- [`.env.example`](.env.example) is the canonical list of variables. The sections below are the ones most users change.
 
 
 
313
 
314
- ### Model Routing
315
 
316
- ```dotenv
317
- MODEL="nvidia_nim/z-ai/glm4.7"
318
- MODEL_OPUS=
319
- MODEL_SONNET=
320
- MODEL_HAIKU=
321
- ENABLE_MODEL_THINKING=true
322
- ENABLE_OPUS_THINKING=
323
- ENABLE_SONNET_THINKING=
324
- ENABLE_HAIKU_THINKING=
325
- ```
326
-
327
- Blank per-tier values inherit the fallback. Blank thinking overrides inherit `ENABLE_MODEL_THINKING`.
328
 
329
- ### Provider Keys And URLs
330
 
331
- ```dotenv
332
- NVIDIA_NIM_API_KEY=""
333
- ```
 
334
 
335
- Proxy settings are per provider:
336
 
337
- ```dotenv
338
- NVIDIA_NIM_PROXY=""
 
 
339
  ```
340
 
341
- ### Rate Limits And Timeouts
342
 
343
- ```dotenv
344
- PROVIDER_RATE_LIMIT=1
345
- PROVIDER_RATE_WINDOW=3
346
- PROVIDER_MAX_CONCURRENCY=5
347
- HTTP_READ_TIMEOUT=120
348
- HTTP_WRITE_TIMEOUT=10
349
- HTTP_CONNECT_TIMEOUT=10
350
  ```
351
 
352
- Use lower limits for free hosted providers; local providers can usually tolerate higher concurrency if the machine can handle it.
353
-
354
- ### Security And Diagnostics
355
 
356
- ```dotenv
357
- ANTHROPIC_AUTH_TOKEN=
358
- LOG_RAW_API_PAYLOADS=false
359
- LOG_RAW_SSE_EVENTS=false
360
- LOG_API_ERROR_TRACEBACKS=false
361
- LOG_RAW_MESSAGING_CONTENT=false
362
- LOG_RAW_CLI_DIAGNOSTICS=false
363
- LOG_MESSAGING_ERROR_DETAILS=false
364
  ```
 
 
 
 
 
 
365
 
366
- Raw logging flags can expose prompts, tool arguments, paths, and model output. Keep them off unless you are debugging locally.
 
 
367
 
368
- ### Local Web Tools
 
 
 
369
 
370
- ```dotenv
371
- ENABLE_WEB_SERVER_TOOLS=true
372
- WEB_FETCH_ALLOWED_SCHEMES=http,https
373
- WEB_FETCH_ALLOW_PRIVATE_NETWORKS=false
374
  ```
375
 
376
- These tools perform outbound HTTP from the proxy. Keep private-network access disabled unless you are in a controlled lab environment.
377
-
378
  ## Troubleshooting
379
 
380
- ### **Major Fixes (May 2026)**
381
-
382
- #### **1. Model Visibility & Caching Issues**
383
- The Claude CLI often caches model lists, causing local proxy models to disappear.
384
- - **Fix:** We implemented a "Multi-Model Advertisement" feature. The `MODEL` environment variable now supports a comma-separated list.
385
- - **Action:** Set `MODEL="model1,model2,model3"` in your `.env`. The proxy will force the CLI to display all of them by registering them as primary models.
386
-
387
- #### **2. The "Amnesia/Thinking" Loop**
388
- When using `auto` mode, the proxy would sometimes switch models in the middle of a "Thinking" block if it took too long, causing the CLI to repeat the same thought endlessly.
389
- - **Fix:** Implemented "Sticky Sessions" in `api/services.py`. Once a model yields its first event (including thinking blocks), the proxy commits to that model for the duration of the turn. Fallbacks only occur if the model fails to start entirely.
390
-
391
- #### **3. NVIDIA NIM Fallback Sync**
392
- Ensured that the `AUTO_MODEL_PRIORITY` and `NVIDIA_NIM_FALLBACK_MODELS` are synchronized to provide maximum coverage.
393
-
394
- ### Claude Code says `undefined ... input_tokens`, `$.speed`, or malformed response
395
-
396
-
397
- Update to the latest commit first. Older versions could emit invalid usage metadata in streaming responses. Then check:
398
-
399
- - `ANTHROPIC_BASE_URL` is `http://localhost:8082`, not `http://localhost:8082/v1`.
400
- - The proxy is returning Server-Sent Events for `/v1/messages`.
401
- - `server.log` contains no upstream 400/500 response before the malformed-response error.
402
-
403
 
404
  ### Provider disconnects during streaming
 
 
 
405
 
406
- Errors like `incomplete chunked read`, `server disconnected`, or a peer closing the body usually come from the upstream provider or gateway. Reduce concurrency, raise timeouts, or retry later.
407
-
408
- ### Tool calls work on one model but not another
409
-
410
- Tool support is model and provider dependent. Some OpenAI-compatible models emit malformed tool-call deltas, omit tool names, or return tool calls as plain text. Try another model or provider before assuming the proxy is broken.
411
 
412
- ### The VS Code extension still shows a login screen
413
-
414
- Confirm the extension environment variables are set, then reload the extension or restart VS Code. The browser login flow may still appear once; the local proxy is used when `ANTHROPIC_BASE_URL` is active in the extension process.
415
-
416
- ## How It Works
417
-
418
- ```text
419
- Claude Code CLI / IDE
420
- |
421
- | Anthropic Messages API
422
- v
423
- Free Claude Code proxy (:8082)
424
- |
425
- | provider-specific request/stream adapter
426
- v
427
- NVIDIA NIM
428
- ```
429
-
430
- Important pieces:
431
-
432
- - FastAPI exposes Anthropic-compatible routes such as `/v1/messages`, `/v1/messages/count_tokens`, and `/v1/models`.
433
- - Model routing resolves the Claude model name to `MODEL_OPUS`, `MODEL_SONNET`, `MODEL_HAIKU`, or `MODEL`.
434
- - NVIDIA NIM uses OpenAI chat streaming translated into Anthropic SSE.
435
- - The proxy normalizes thinking blocks, tool calls, token usage metadata, and provider errors into the shape Claude Code expects.
436
- - Request optimizations answer trivial Claude Code probes locally to save latency and quota.
437
-
438
- ## Development
439
-
440
- ### Project Structure
441
-
442
- ```text
443
- free-claude-code/
444
- β”œβ”€β”€ server.py # ASGI entry point
445
- β”œβ”€β”€ api/ # FastAPI routes, service layer, routing, optimizations
446
- β”œβ”€β”€ core/ # Shared Anthropic protocol helpers and SSE utilities
447
- β”œβ”€β”€ providers/ # Provider transports, registry, rate limiting
448
- β”œβ”€β”€ messaging/ # Discord/Telegram adapters, sessions, voice
449
- β”œβ”€β”€ cli/ # Package entry points and Claude process management
450
- β”œβ”€β”€ config/ # Settings, provider catalog, logging
451
- └── tests/ # Unit and contract tests
452
- ```
453
-
454
- ### Commands
455
-
456
- ```bash
457
- uv run ruff format
458
- uv run ruff check
459
- uv run ty check
460
- uv run pytest
461
- ```
462
-
463
- Run them in that order before pushing. CI enforces the same checks.
464
-
465
- ### Package Scripts
466
-
467
- `pyproject.toml` installs:
468
-
469
- - `free-claude-code`: starts the proxy with configured host and port.
470
- - `fcc-init`: creates the user config template at `~/.config/free-claude-code/.env`.
471
-
472
- ### Extending
473
-
474
- - Add messaging platforms by implementing the `MessagingPlatform` interface in `messaging/`.
475
- - Extend NVIDIA NIM provider functionality by modifying `providers/nvidia_nim/`.
476
 
477
  ## Contributing
478
 
479
- - Report bugs and feature requests in [Issues](https://github.com/Alishahryar1/free-claude-code/issues).
480
- - Keep changes small and covered by focused tests.
481
- - Do not open Docker integration PRs.
482
- Do not open README change PRs; open an issue instead.
483
- - Run the full check sequence before opening a pull request.
484
- The unparenthesized `except X, Y` syntax returns in the final Python 3.14 release (not in the 3.14 alphas); keep this in mind before opening PRs.
485
-
486
- ## NVIDIA Qwen integration
487
-
488
- You can run a simple NVIDIA Qwen streaming example using the OpenAI-compatible client shipped below.
489
-
490
- - Install the dependency:
491
-
492
- ```bash
493
- pip install -r requirements.txt
494
- ```
495
-
496
- - Set your NVIDIA API key (do NOT commit keys). Example (PowerShell temporary):
497
-
498
- ```powershell
499
- $env:NV_API_KEY = "nvapi-<YOUR_KEY>"
500
- python nvidia_integration.py "Write a short Python script that prints Hello"
501
- ```
502
-
503
- Persisted (Windows):
504
-
505
- ```powershell
506
- setx NV_API_KEY "nvapi-<YOUR_KEY>"
507
- # open a new shell to use the persisted variable
508
- ```
509
-
510
- Linux/macOS:
511
 
512
- ```bash
513
- export NV_API_KEY="nvapi-<YOUR_KEY>"
514
- python nvidia_integration.py "Write a short Python script that prints Hello"
515
- ```
516
 
517
- The example `nvidia_integration.py` streams completions from `https://integrate.api.nvidia.com/v1` using the `qwen/qwen3-coder-480b-a35b-instruct` model. Replace `<YOUR_KEY>` with your actual NVIDIA API key. Never share or commit your API keys.
518
 
519
- ## License
520
 
521
- MIT License. See [LICENSE](LICENSE) for details.
 
 
 
 
14
 
15
  # 🤖 Free Claude Code
16
 
17
+ **Use Claude Code with free NVIDIA NIM models through a lightweight proxy.**
18
 
19
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)
20
  [![Python 3.14](https://img.shields.io/badge/python-3.14-3776ab.svg?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/downloads/)
21
+ [![uv](https://img.shields.io/badge/uv-spawn-ffc21c.svg?style=for-the-badge)](https://github.com/astral-sh/uv)
 
 
22
  [![Code style: Ruff](https://img.shields.io/badge/code%20formatting-ruff-f5a623.svg?style=for-the-badge)](https://github.com/astral-sh/ruff)
 
23
 
24
+ </div>
25
 
26
+ ## The Problem
27
 
28
+ Claude Code costs $100+/month for API access. This project lets you run it using **free NVIDIA NIM models** instead.
29
 
30
+ ## The Solution
 
 
 
31
 
32
+ A FastAPI proxy that translates Claude Code's Anthropic API calls to NVIDIA NIM's OpenAI-compatible endpoint. Zero code changes needed in Claude Code.
33
 
34
+ ```
35
+ ┌─────────────────┐     Anthropic API      ┌──────────────────┐
+ │   Claude Code   │ ─────────────────────▶ │   Free Claude    │
+ │   (Official)    │                        │   Code           │
+ │                 │ ◀───────────────────── │   Proxy          │
+ └─────────────────┘     SSE Streaming      │   (:8082)        │
+                                            └────────┬─────────┘
+                                                     │
+                                              OpenAI Chat API
+                                                     │
+                                                     ▼
+                                            ┌──────────────────┐
+                                            │    NVIDIA NIM    │
+                                            │  (Free Models)   │
+                                            └──────────────────┘
49
+ ```
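The translation step in the middle box can be sketched in a few lines. This is an illustrative outline only, not this project's actual code: the function name, flattening rule, and defaults are assumptions.

```python
# Minimal sketch of the Anthropic -> OpenAI request translation idea.
# Illustrative only; names and defaults are assumptions, not the proxy's API.

def anthropic_to_openai(body: dict, upstream_model: str) -> dict:
    """Map an Anthropic Messages request to an OpenAI chat request."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first chat message.
    if body.get("system"):
        messages.append({"role": "system", "content": body["system"]})
    for msg in body.get("messages", []):
        content = msg["content"]
        # Anthropic content may be a list of typed blocks; flatten the text blocks.
        if isinstance(content, list):
            content = "".join(
                block.get("text", "")
                for block in content
                if block.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": upstream_model,  # e.g. "z-ai/glm4.7" on NVIDIA NIM
        "messages": messages,
        "max_tokens": body.get("max_tokens", 1024),
        "stream": body.get("stream", False),
    }
```

The real proxy also handles tool calls, thinking blocks, and SSE re-encoding on the way back; this sketch covers only the request direction.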
50
 
51
+ ## Features
 
 
52
 
53
+ - **Drop-in replacement** for Claude Code's Anthropic API
54
+ - **7 free NVIDIA NIM models** available via auto-routing
55
+ - **Automatic failover** - switches to next model if one hits rate limit
56
+ - **Multi-model support** - use different models for different tasks
57
+ - **Local optimizations** - fast-path for common probes (saves API calls)
58
+ - **Streaming** - real-time response with SSE
59
+ - **Tool support** - Claude Code tools work with NIM models
60
+ - **Thinking blocks** - reasoning support where models support it
61
+ - **Discord/Telegram bots** - remote Claude Code sessions
62
+ - **Voice notes** - transcribe voice messages with Whisper
63
 
64
+ ## Quick Start (Cloud - No Setup)
 
 
 
 
 
 
65
 
66
+ The easiest way to use this project is on **HuggingFace Spaces** (free tier available).
67
 
68
+ ### 1. Deploy to HuggingFace Spaces
69
+
70
+ <a target="_blank" href="https://huggingface.co/new-space?template=Yash030/claude-code-proxy">
71
+ <img src="https://huggingface.co/datasets/huggingface/badges/raw/main/deploy-to-spaces-lg.svg" alt="Deploy to HuggingFace Spaces"/>
72
+ </a>
73
+
74
+ Or manually:
75
+ 1. Go to [huggingface.co/spaces/Yash030/claude-code-proxy](https://huggingface.co/spaces/Yash030/claude-code-proxy)
76
+ 2. Duplicate the space
77
+ 3. Set your secrets in the Space settings:
78
+ - `NVIDIA_NIM_API_KEY` - Your NVIDIA API key
79
+ - `ANTHROPIC_AUTH_TOKEN` - Your auth token (any secret)
80
+
81
+ ### 2. Get NVIDIA API Key
82
 
83
+ Get a free key at [build.nvidia.com/settings/api-keys](https://build.nvidia.com/settings/api-keys).
84
 
85
+ ### 3. Connect Claude Code
86
 
87
  ```bash
88
+ # Use your HuggingFace Space URL (ends with .hf.space)
89
+ export ANTHROPIC_AUTH_TOKEN="your-secret-token"
90
+ export ANTHROPIC_BASE_URL="https://your-space-name.hf.space"
91
+ claude
92
  ```
93
 
94
+ That's it! Claude Code will use free NVIDIA NIM models.
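To smoke-test the deployed Space without Claude Code, you can hit `/v1/messages` directly. This sketch only builds the request; send it with any HTTP client. The URL and token are placeholders, and the Claude model alias is an example (the proxy re-routes whatever alias it receives).

```python
# Hedged example: the shape of a raw Messages API call to the proxy.
# Placeholders throughout; the auth header mirrors what Claude Code sends
# when ANTHROPIC_AUTH_TOKEN is set.
import json

def build_smoke_test(base_url: str, token: str) -> tuple[str, dict, bytes]:
    """Return (url, headers, body) for a minimal /v1/messages request."""
    url = f"{base_url.rstrip('/')}/v1/messages"  # note: base URL itself has no /v1
    headers = {
        "content-type": "application/json",
        "authorization": f"Bearer {token}",
        "anthropic-version": "2023-06-01",
    }
    body = json.dumps({
        "model": "claude-sonnet-4-20250514",  # any Claude alias; the proxy maps it
        "max_tokens": 32,
        "messages": [{"role": "user", "content": "Say hello"}],
    }).encode()
    return url, headers, body
```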
95
 
96
+ ## Quick Start (Local)
 
 
 
 
97
 
98
+ ### 1. Install Requirements
99
 
100
  ```bash
101
+ # Install Claude Code
102
+ curl -LsSf https://download.anthropic.com/install.sh | sh
103
+
104
+ # Install uv (fast Python package manager)
105
+ curl -LsSf https://astral.sh/uv/install.sh | sh
106
+ uv python install 3.14
107
  ```
108
 
109
+ ### 2. Clone and Configure
110
 
111
+ ```bash
112
+ git clone https://github.com/Yashwant00CR7/claude-code-nvidia.git
113
+ cd claude-code-nvidia
114
+ cp .env.example .env
115
  ```
116
 
117
+ Edit `.env`:
 
118
  ```dotenv
119
  NVIDIA_NIM_API_KEY="nvapi-your-key"
 
120
  ANTHROPIC_AUTH_TOKEN="freecc"
121
+ MODEL="nvidia_nim/z-ai/glm4.7"
122
  ```
123
 
124
+ ### 3. Start Proxy
 
 
125
 
126
  ```bash
127
+ uv sync
128
  uv run uvicorn server:app --host 0.0.0.0 --port 8082
129
  ```
130
 
131
+ ### 4. Run Claude Code
132
 
133
  ```bash
134
+ export ANTHROPIC_AUTH_TOKEN="freecc"
135
+ export ANTHROPIC_BASE_URL="http://localhost:8082"
136
+ claude
137
  ```
138
 
139
+ ## Available Models
140
 
141
+ The proxy automatically routes to these models in order:
142
 
143
+ | Model | Best For | Speed |
144
+ |-------|----------|-------|
145
+ | `qwen3-coder-480b` | Code generation | Fast |
146
+ | `glm4.7` | General purpose | Fast |
147
+ | `step-3.5-flash` | Fast responses | Very Fast |
148
+ | `mistral-large-3` | Reasoning | Medium |
149
+ | `dracarys-llama-3.1-70b` | Complex tasks | Medium |
150
+ | `seed-oss-36b` | Balanced | Fast |
151
+ | `mistral-nemotron` | Thinking tasks | Medium |
152
 
153
+ ## How Auto-Routing Works
154
 
155
+ When you use `auto` model, the proxy:
 
 
156
 
157
+ 1. **Tries models in order** of speed/reliability
158
+ 2. **Skips rate-limited models** - pre-flight check before each request
159
+ 3. **Fast failover** - if one model times out, immediately tries next
160
+ 4. **No API waste** - common probes handled locally
161
 
 
 
162
  ```
163
+ Request: "Write a function"
164
+ ↓
165
+ Check if model 1 is rate-limited? → Yes → Skip
166
+ Check if model 2 is rate-limited? → No → Try
167
+ ↓
168
+ Model 2 responds? → Yes → Stream response
169
+ Model 2 timeout? → Try model 3 → Success!
170
  ```
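The loop above can be sketched as follows. This is illustrative only: the real failover lives in `api/services.py` and differs in detail (the cooldown length, exception types, and function names here are assumptions).

```python
import time

# Illustrative failover loop; names and the 60 s cooldown are assumptions.
rate_limited_until: dict[str, float] = {}  # model -> unix time it frees up

def call_with_failover(models: list[str], call):
    """Try each model in priority order, skipping rate-limited ones."""
    now = time.time()
    for model in models:
        if rate_limited_until.get(model, 0) > now:
            continue  # pre-flight check: model is still cooling down
        try:
            return model, call(model)  # commit to the first model that answers
        except TimeoutError:
            continue  # fast failover: try the next model
        except RuntimeError:  # stand-in for a 429 from the provider
            rate_limited_until[model] = now + 60
            continue
    raise RuntimeError("all models unavailable")
```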
171
 
172
+ ## Environment Variables
173
 
174
+ ### Required
175
  ```dotenv
176
+ NVIDIA_NIM_API_KEY="nvapi-your-key" # From build.nvidia.com
177
+ ANTHROPIC_AUTH_TOKEN="your-secret" # Any secret you choose
178
  ```
179
 
180
+ ### Optional
181
  ```dotenv
182
+ MODEL="nvidia_nim/z-ai/glm4.7" # Default model
183
+ MODEL_OPUS="nvidia_nim/qwen/qwen3-..." # Model for Opus requests
184
+ MODEL_SONNET="nvidia_nim/z-ai/glm4.7" # Model for Sonnet requests
185
+ MODEL_HAIKU="nvidia_nim/z-ai/glm4.7" # Model for Haiku requests
186
 
187
+ # Auto-routing order (comma-separated)
188
+ AUTO_MODEL_PRIORITY="nvidia_nim/qwen/...,nvidia_nim/z-ai/..."
189
 
190
+ # Thinking support
191
+ ENABLE_MODEL_THINKING=true # Enable reasoning blocks
 
192
  ```
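The tier variables above resolve like this sketch, which is an assumed reading of the documented behavior ("blank per-tier values inherit the fallback"), not the project's actual router:

```python
import os

# Illustrative tier resolution; the real router in api/ may differ in detail.
def resolve_model(claude_model: str) -> str:
    """Map a Claude model alias to the configured upstream model."""
    name = claude_model.lower()
    if "opus" in name:
        tier = os.environ.get("MODEL_OPUS", "")
    elif "haiku" in name:
        tier = os.environ.get("MODEL_HAIKU", "")
    else:
        tier = os.environ.get("MODEL_SONNET", "")
    # Blank per-tier values inherit the MODEL fallback.
    return tier or os.environ.get("MODEL", "nvidia_nim/z-ai/glm4.7")
```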
193
 
194
+ ## IDE Integration
195
 
196
  ### VS Code Extension
197
 
198
+ Add to `.vscode/settings.json`:
 
199
  ```json
200
+ {
201
+ "claudeCode.environmentVariables": [
202
+ { "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" },
203
+ { "name": "ANTHROPIC_AUTH_TOKEN", "value": "freecc" }
204
+ ]
205
+ }
206
  ```
207
 
 
 
208
  ### JetBrains ACP
209
 
210
+ Edit `~/.jetbrains/acp.json`:
 
 
 
 
 
 
211
  ```json
212
+ {
213
+ "env": {
214
+ "ANTHROPIC_BASE_URL": "http://localhost:8082",
215
+ "ANTHROPIC_AUTH_TOKEN": "freecc"
216
+ }
217
  }
218
  ```
219
 
220
+ ### Remote/SSH
221
 
222
+ For remote development, deploy to HuggingFace Spaces and use:
223
  ```bash
224
+ export ANTHROPIC_BASE_URL="https://your-space.hf.space"
 
 
225
  ```
226
 
227
+ ## Deployment Options
 
 
 
 
 
228
 
229
+ ### HuggingFace Spaces (Recommended for Cloud)
230
 
231
+ **Free tier includes:**
232
+ - 2 vCPU
233
+ - Community support
234
+ - Automatic HTTPS
235
+ - Git-based deployment
236
 
237
+ **Setup:**
238
+ 1. Fork [the space](https://huggingface.co/spaces/Yash030/claude-code-proxy)
239
+ 2. Add `NVIDIA_NIM_API_KEY` to Space secrets
240
+ 3. Access at `https://your-space.hf.space`
241
 
242
+ ### Railway (Easy Deploy)
243
 
244
+ 1. Connect GitHub repo
245
+ 2. Set environment variables
246
+ 3. Deploy with auto-scaling
 
 
 
 
 
 
 
 
 
247
 
248
+ ### Render (Free Tier)
249
 
250
+ 1. Create Web Service
251
+ 2. Connect GitHub
252
+ 3. Set build command: `uv sync`
253
+ 4. Set start command: `uv run uvicorn server:app --host 0.0.0.0 --port $PORT`
254
 
255
+ ### Fly.io (Global Edge)
256
 
257
+ ```bash
258
+ fly launch
259
+ fly secrets set NVIDIA_NIM_API_KEY="nvapi-..."
260
+ fly deploy
261
  ```
262
 
263
+ ### Local/Docker
264
 
265
+ ```bash
266
+ docker build -t free-claude-code .
267
+ docker run -p 8082:8082 \
268
+ -e NVIDIA_NIM_API_KEY="nvapi-..." \
269
+ -e ANTHROPIC_AUTH_TOKEN="freecc" \
270
+ free-claude-code
 
271
  ```
272
 
273
+ ## Architecture
 
 
274
 
275
  ```
276
+ api/
277
+ ├── routes.py # FastAPI endpoints
278
+ ├── services.py # Request handling & failover
279
+ ├── model_router.py # Model resolution
280
+ ├── detection.py # Request type detection
281
+ └── optimization_handlers.py # Fast-path responses
282
 
283
+ core/
284
+ ├── anthropic/ # SSE, token counting, tool parsing
285
+ └── task_detector.py # Task capability detection
286
 
287
+ providers/
288
+ ├── openai_compat.py # Base OpenAI transport
289
+ ├── nvidia_nim/ # NVIDIA NIM provider
290
+ └── rate_limit.py # Rate limiting
291
 
292
+ messaging/
293
+ ├── discord.py # Discord bot wrapper
294
+ └── telegram.py # Telegram bot wrapper
 
295
  ```
296
 
 
 
297
  ## Troubleshooting
298
 
299
+ ### "undefined ... input_tokens" error
300
+ - Update to latest version: `git pull`
301
+ - Check `ANTHROPIC_BASE_URL` doesn't end with `/v1`
302
 
303
  ### Provider disconnects during streaming
304
+ - Reduce `PROVIDER_MAX_CONCURRENCY`
305
+ - Increase `HTTP_READ_TIMEOUT`
306
+ - Check NVIDIA NIM status at [status.nvidia.com](https://status.nvidia.com)
307
 
308
+ ### Model not responding
309
+ - Check your NVIDIA API key is valid
310
+ - Verify rate limits haven't been hit
311
+ - Try a different model
 
312
 
313
+ ### VS Code extension shows login
314
+ - Reload the extension after setting env vars
315
+ - Confirm environment variables are set correctly
316
 
317
  ## Contributing
318
 
319
+ 1. Fork the repo
320
+ 2. Create a feature branch
321
+ 3. Run checks: `uv run ruff format && uv run ruff check && uv run ty check`
322
+ 4. Submit PR
323
 
324
+ ## License
 
 
 
325
 
326
+ MIT License - See [LICENSE](LICENSE)
327
 
328
+ ## Links
329
 
330
+ - [GitHub](https://github.com/Yashwant00CR7/claude-code-nvidia)
331
+ - [HuggingFace Space](https://huggingface.co/spaces/Yash030/claude-code-proxy)
332
+ - [NVIDIA NIM](https://build.nvidia.com)
333
+ - [Claude Code](https://github.com/anthropics/claude-code)