Yash030 Claude Opus 4.7 committed on
Commit
fcc5278
·
1 Parent(s): ebba9d6

docs: complete README refactor with cloud deploy guide


- Restructured to lead with the problem/solution approach
- Added HuggingFace Spaces as primary deployment method
- Documented all deployment options (HF, Railway, Render, Fly.io, Docker)
- Added architecture overview diagram
- Simplified troubleshooting section
- Updated model list with speed ratings
- Added visual ASCII diagram for auto-routing flow

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (1)
  1. README.md +205 -393
README.md CHANGED
@@ -14,508 +14,320 @@ pinned: false
14
 
15
  # 🤖 Free Claude Code
16
 
17
- Use Claude Code CLI, VS Code, JetBrains ACP, or chat bots through your own Anthropic-compatible proxy.
18
 
19
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)
20
  [![Python 3.14](https://img.shields.io/badge/python-3.14-3776ab.svg?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/downloads/)
21
- [![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json&style=for-the-badge)](https://github.com/astral-sh/uv)
22
- [![Tested with Pytest](https://img.shields.io/badge/testing-Pytest-00c0ff.svg?style=for-the-badge)](https://github.com/Alishahryar1/free-claude-code/actions/workflows/tests.yml)
23
- [![Type checking: Ty](https://img.shields.io/badge/type%20checking-ty-ffcc00.svg?style=for-the-badge)](https://pypi.org/project/ty/)
24
  [![Code style: Ruff](https://img.shields.io/badge/code%20formatting-ruff-f5a623.svg?style=for-the-badge)](https://github.com/astral-sh/ruff)
25
- [![Logging: Loguru](https://img.shields.io/badge/logging-loguru-4ecdc4.svg?style=for-the-badge)](https://github.com/Delgan/loguru)
26
 
27
- Free Claude Code routes Anthropic Messages API traffic from Claude Code to NVIDIA NIM. It keeps Claude Code's client-side protocol stable while letting you use NVIDIA's free models.
28
 
29
- ## Git Origins
30
 
31
- This project is synchronized between two repositories:
32
 
33
- | Platform | URL |
34
- |----------|-----|
35
- | **Hugging Face Spaces** | [huggingface.co/spaces/Yash030/claude-code-proxy](https://huggingface.co/spaces/Yash030/claude-code-proxy) |
36
- | **GitHub** | [github.com/Yashwant00CR7/claude-code-nvidia](https://github.com/Yashwant00CR7/claude-code-nvidia) |
37
 
38
- [Quick Start](#quick-start) · [Providers](#choose-a-provider) · [Clients](#connect-claude-code) · [Troubleshooting](#troubleshooting) · [Development](#development)
39
 
40
- </div>
41
 
42
- <div align="center">
43
- <img src="pic.png" alt="Free Claude Code in action" width="700">
44
- </div>
45
 
46
- ## What You Get
47
 
48
- - Drop-in proxy for Claude Code's Anthropic API calls.
49
- - NVIDIA NIM provider backend with free models.
50
- - Per-model routing: send Opus, Sonnet, Haiku, and fallback traffic to different NVIDIA NIM models.
51
- - Native Claude Code `/model` picker support through the proxy's `/v1/models` endpoint.
52
- - Streaming, tool use, reasoning/thinking block handling, and local request optimizations.
53
- - Optional Discord or Telegram bot wrapper for remote coding sessions.
54
- - Optional voice-note transcription through local Whisper or NVIDIA NIM.
55
 
56
- ## Quick Start
57
 
58
- ### 1. Install Requirements
59
 
60
- Install [Claude Code](https://github.com/anthropics/claude-code), then install `uv` and Python 3.14.
61
 
62
- macOS/Linux:
63
 
64
  ```bash
65
- curl -LsSf https://astral.sh/uv/install.sh | sh
66
- uv self update
67
- uv python install 3.14
 
68
  ```
69
 
70
- Windows PowerShell:
71
 
72
- ```powershell
73
- powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
74
- uv self update
75
- uv python install 3.14
76
- ```
77
 
78
- ### 2. Clone And Configure
79
 
80
  ```bash
81
- git clone https://github.com/Alishahryar1/free-claude-code.git
82
- cd free-claude-code
83
- cp .env.example .env
 
 
 
84
  ```
85
 
86
- PowerShell uses:
87
 
88
- ```powershell
89
- Copy-Item .env.example .env
 
 
90
  ```
91
 
92
- Edit `.env` and choose one provider. For the default NVIDIA NIM path:
93
-
94
  ```dotenv
95
  NVIDIA_NIM_API_KEY="nvapi-your-key"
96
- MODEL="nvidia_nim/z-ai/glm4.7"
97
  ANTHROPIC_AUTH_TOKEN="freecc"
 
98
  ```
99
 
100
- Use any local secret for `ANTHROPIC_AUTH_TOKEN`; Claude Code will send the same value back to this proxy. Leave it empty only for local/private testing.
101
-
102
- ### 3. Start The Proxy
103
 
104
  ```bash
 
105
  uv run uvicorn server:app --host 0.0.0.0 --port 8082
106
  ```
107
 
108
- Package install alternative:
109
 
110
  ```bash
111
- uv tool install git+https://github.com/Alishahryar1/free-claude-code.git
112
- fcc-init
113
- free-claude-code
114
  ```
115
 
116
- `fcc-init` creates `~/.config/free-claude-code/.env` from the bundled template.
117
 
118
- ### 4. Run Claude Code
119
 
120
- Point `ANTHROPIC_BASE_URL` at the proxy root. Do not append `/v1`.
 
 
 
 
 
 
 
 
121
 
122
- PowerShell:
123
 
124
- ```powershell
125
- $env:ANTHROPIC_AUTH_TOKEN="freecc"; $env:ANTHROPIC_BASE_URL="http://localhost:8082"; claude
126
- ```
127
 
128
- Bash:
 
 
 
129
 
130
- ```bash
131
- ANTHROPIC_AUTH_TOKEN="freecc" ANTHROPIC_BASE_URL="http://localhost:8082" claude
132
  ```
133
-
134
- ## Choose A Provider
135
-
136
- Model values use this format:
137
-
138
- ```text
139
- provider_id/model/name
140
  ```
141
 
142
- `MODEL` is the fallback. `MODEL_OPUS`, `MODEL_SONNET`, and `MODEL_HAIKU` override routing for requests that Claude Code sends for those tiers.
143
-
144
- | Provider | Prefix | Transport | Key | Default base URL |
145
- | --- | --- | --- | --- | --- |
146
- | <img src="https://cdn.simpleicons.org/nvidia/76B900" alt="" width="18" height="18"> NVIDIA NIM | `nvidia_nim/...` | OpenAI chat translation | `NVIDIA_NIM_API_KEY` | `https://integrate.api.nvidia.com/v1` |
147
- | <img src="https://cdn.simpleicons.org/groq/F55036" alt="" width="18" height="18"> Groq | `groq/...` | OpenAI chat translation | `GROQ_API_KEY` | `https://api.groq.com/openai/v1` |
148
- | <img src="https://cdn.simpleicons.org/cerebras/313131" alt="" width="18" height="18"> Cerebras | `cerebras/...` | OpenAI chat translation | `CEREBRAS_API_KEY` | `https://api.cerebras.ai/v1` |
149
-
150
- <details>
151
- <summary><img src="https://cdn.simpleicons.org/nvidia/76B900" alt="" width="18" height="18"> <b>NVIDIA NIM</b></summary>
152
-
153
- Get a key at [build.nvidia.com/settings/api-keys](https://build.nvidia.com/settings/api-keys).
154
 
 
155
  ```dotenv
156
- NVIDIA_NIM_API_KEY="nvapi-your-key"
157
- MODEL="nvidia_nim/z-ai/glm4.7"
158
  ```
159
 
160
- Popular examples:
161
-
162
- - `nvidia_nim/qwen/qwen3-coder-480b-a35b-instruct`
163
- - `nvidia_nim/mistralai/mistral-large-3-675b-instruct-2512`
164
- - `nvidia_nim/z-ai/glm4.7`
165
-
166
- </details>
167
-
168
- <details>
169
- <summary><img src="https://cdn.simpleicons.org/groq/F55036" alt="" width="18" height="18"> <b>Groq</b></summary>
170
-
171
- Get a key at [console.groq.com/keys](https://console.groq.com/keys).
172
-
173
  ```dotenv
174
- GROQ_API_KEY="gsk_..."
175
- MODEL="groq/openai/gpt-oss-120b"
176
- ```
177
-
178
- Popular examples:
179
-
180
- - `groq/openai/gpt-oss-120b` (Best overall for Claude Code)
181
- - `groq/openai/gpt-oss-20b` (Ultra-low latency)
182
- - `groq/llama-3.3-70b-versatile`
183
-
184
- </details>
185
-
186
- <details>
187
- <summary><img src="https://cdn.simpleicons.org/cerebras/313131" alt="" width="18" height="18"> <b>Cerebras</b></summary>
188
 
189
- Get a key at [cloud.cerebras.ai](https://cloud.cerebras.ai/).
 
190
 
191
- ```dotenv
192
- CEREBRAS_API_KEY="csk_..."
193
- MODEL="cerebras/gpt-oss-120b"
194
  ```
195
 
196
- Popular examples:
197
-
198
- - `cerebras/gpt-oss-120b` (~3000 tok/s - Fastest reasoning)
199
- - `cerebras/qwen-3-235b`
200
- - `cerebras/llama3.1-8b`
201
-
202
- </details>
203
-
204
- ## Connect Claude Code
205
-
206
- ### Claude Code CLI
207
-
208
- ```bash
209
- ANTHROPIC_AUTH_TOKEN="freecc" ANTHROPIC_BASE_URL="http://localhost:8082" claude
210
- ```
211
 
212
  ### VS Code Extension
213
 
214
- Open Settings, search for `claudeCode.environmentVariables`, choose **Edit in settings.json**, and add:
215
-
216
  ```json
217
- "claudeCode.environmentVariables": [
218
- { "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" },
219
- { "name": "ANTHROPIC_AUTH_TOKEN", "value": "freecc" }
220
- ]
 
 
221
  ```
222
 
223
- Reload the extension. If the extension shows a login screen, choose the Anthropic Console path once; the local proxy still handles model traffic after the environment variables are active.
224
-
225
  ### JetBrains ACP
226
 
227
- Edit the installed Claude ACP config:
228
-
229
- - Windows: `C:\Users\%USERNAME%\AppData\Roaming\JetBrains\acp-agents\installed.json`
230
- - Linux/macOS: `~/.jetbrains/acp.json`
231
-
232
- Set the environment for `acp.registry.claude-acp`:
233
-
234
  ```json
235
- "env": {
236
- "ANTHROPIC_BASE_URL": "http://localhost:8082",
237
- "ANTHROPIC_AUTH_TOKEN": "freecc"
 
 
238
  }
239
  ```
240
 
241
- Restart the IDE after changing the file.
242
-
243
- ### Model Picker
244
-
245
- Claude Code 2.1.126 or later reads this proxy's `/v1/models` endpoint when `ANTHROPIC_BASE_URL` points at the proxy. Start Claude Code normally, run `/model`, and choose any discovered provider model.
246
-
247
- <div align="center">
248
- <img src="cc-model-picker.png" alt="Claude Code model picker showing gateway models" width="700">
249
- </div>
250
-
251
- The proxy lists models for configured provider keys and referenced local providers. Picker-safe IDs are routed back to the real provider/model automatically, so no `.env` edit or separate launcher script is needed after startup.
252
-
253
- Each provider model also has a `(no thinking)` picker variant. Use it when a model does not support Claude Code thinking or fails with adaptive-thinking requests. It routes to the same upstream model while asking Claude Code to send a non-thinking request.
254
-
255
- ## Optional Integrations
256
-
257
- ### Discord And Telegram Bots
258
-
259
- The bot wrapper runs Claude Code sessions remotely, streams progress, supports reply-based conversation branches, and can stop or clear tasks.
260
-
261
- Discord minimum config:
262
-
263
- ```dotenv
264
- MESSAGING_PLATFORM="discord"
265
- DISCORD_BOT_TOKEN="your-discord-bot-token"
266
- ALLOWED_DISCORD_CHANNELS="123456789"
267
- CLAUDE_WORKSPACE="./agent_workspace"
268
- ALLOWED_DIR="C:/Users/yourname/projects"
269
- ```
270
-
271
- Create the bot in the [Discord Developer Portal](https://discord.com/developers/applications), enable Message Content Intent, and invite it with read/send/history permissions.
272
-
273
- Telegram minimum config:
274
-
275
- ```dotenv
276
- MESSAGING_PLATFORM="telegram"
277
- TELEGRAM_BOT_TOKEN="123456789:ABC..."
278
- ALLOWED_TELEGRAM_USER_ID="your-user-id"
279
- CLAUDE_WORKSPACE="./agent_workspace"
280
- ALLOWED_DIR="C:/Users/yourname/projects"
281
- ```
282
-
283
- Get a token from [@BotFather](https://t.me/BotFather) and your user ID from [@userinfobot](https://t.me/userinfobot).
284
-
285
- Useful commands:
286
-
287
- - `/stop` cancels a task; reply to a task message to stop only that branch.
288
- - `/clear` resets sessions; reply to clear one branch.
289
- - `/stats` shows session state.
290
-
291
- ### Voice Notes
292
-
293
- Voice notes work on Discord and Telegram. Choose one backend:
294
 
 
295
  ```bash
296
- uv sync --extra voice_local
297
- uv sync --extra voice
298
- uv sync --extra voice --extra voice_local
299
  ```
300
 
301
- ```dotenv
302
- VOICE_NOTE_ENABLED=true
303
- WHISPER_DEVICE="cpu" # cpu | cuda | nvidia_nim
304
- WHISPER_MODEL="base"
305
- HF_TOKEN=""
306
- ```
307
 
308
- Use `WHISPER_DEVICE="nvidia_nim"` with the `voice` extra and `NVIDIA_NIM_API_KEY` for NVIDIA-hosted transcription.
309
 
310
- ## Configuration Reference
 
 
 
 
311
 
312
- [`.env.example`](.env.example) is the canonical list of variables. The sections below are the ones most users change.
 
 
 
313
 
314
- ### Model Routing
315
 
316
- ```dotenv
317
- MODEL="nvidia_nim/z-ai/glm4.7"
318
- MODEL_OPUS=
319
- MODEL_SONNET=
320
- MODEL_HAIKU=
321
- ENABLE_MODEL_THINKING=true
322
- ENABLE_OPUS_THINKING=
323
- ENABLE_SONNET_THINKING=
324
- ENABLE_HAIKU_THINKING=
325
- ```
326
-
327
- Blank per-tier values inherit the fallback. Blank thinking overrides inherit `ENABLE_MODEL_THINKING`.
328
 
329
- ### Provider Keys And URLs
330
 
331
- ```dotenv
332
- NVIDIA_NIM_API_KEY=""
333
- ```
 
334
 
335
- Proxy settings are per provider:
336
 
337
- ```dotenv
338
- NVIDIA_NIM_PROXY=""
 
 
339
  ```
340
 
341
- ### Rate Limits And Timeouts
342
 
343
- ```dotenv
344
- PROVIDER_RATE_LIMIT=1
345
- PROVIDER_RATE_WINDOW=3
346
- PROVIDER_MAX_CONCURRENCY=5
347
- HTTP_READ_TIMEOUT=120
348
- HTTP_WRITE_TIMEOUT=10
349
- HTTP_CONNECT_TIMEOUT=10
350
  ```
351
 
352
- Use lower limits for free hosted providers; local providers can usually tolerate higher concurrency if the machine can handle it.
353
-
354
- ### Security And Diagnostics
355
 
356
- ```dotenv
357
- ANTHROPIC_AUTH_TOKEN=
358
- LOG_RAW_API_PAYLOADS=false
359
- LOG_RAW_SSE_EVENTS=false
360
- LOG_API_ERROR_TRACEBACKS=false
361
- LOG_RAW_MESSAGING_CONTENT=false
362
- LOG_RAW_CLI_DIAGNOSTICS=false
363
- LOG_MESSAGING_ERROR_DETAILS=false
364
  ```
 
 
 
 
 
 
365
 
366
- Raw logging flags can expose prompts, tool arguments, paths, and model output. Keep them off unless you are debugging locally.
 
 
367
 
368
- ### Local Web Tools
 
 
 
369
 
370
- ```dotenv
371
- ENABLE_WEB_SERVER_TOOLS=true
372
- WEB_FETCH_ALLOWED_SCHEMES=http,https
373
- WEB_FETCH_ALLOW_PRIVATE_NETWORKS=false
374
  ```
375
 
376
- These tools perform outbound HTTP from the proxy. Keep private-network access disabled unless you are in a controlled lab environment.
377
-
378
  ## Troubleshooting
379
 
380
- ### **Major Fixes (May 2026)**
381
-
382
- #### **1. Model Visibility & Caching Issues**
383
- The Claude CLI often caches model lists, causing local proxy models to disappear.
384
- - **Fix:** We implemented a "Multi-Model Advertisement" feature. The `MODEL` environment variable now supports a comma-separated list.
385
- - **Action:** Set `MODEL="model1,model2,model3"` in your `.env`. The proxy will force the CLI to display all of them by registering them as primary models.
386
-
387
- #### **2. The "Amnesia/Thinking" Loop**
388
- When using `auto` mode, the proxy would sometimes switch models in the middle of a "Thinking" block if it took too long, causing the CLI to repeat the same thought endlessly.
389
- - **Fix:** Implemented "Sticky Sessions" in `api/services.py`. Once a model yields its first event (including thinking blocks), the proxy commits to that model for the duration of the turn. Fallbacks only occur if the model fails to start entirely.
390
-
391
- #### **3. NVIDIA NIM Fallback Sync**
392
- Ensured that the `AUTO_MODEL_PRIORITY` and `NVIDIA_NIM_FALLBACK_MODELS` are synchronized to provide maximum coverage.
393
-
394
- ### Claude Code says `undefined ... input_tokens`, `$.speed`, or malformed response
395
-
396
-
397
- Update to the latest commit first. Older versions could emit invalid usage metadata in streaming responses. Then check:
398
-
399
- - `ANTHROPIC_BASE_URL` is `http://localhost:8082`, not `http://localhost:8082/v1`.
400
- - The proxy is returning Server-Sent Events for `/v1/messages`.
401
- - `server.log` contains no upstream 400/500 response before the malformed-response error.
402
-
403
 
404
  ### Provider disconnects during streaming
 
 
 
405
 
406
- Errors like `incomplete chunked read`, `server disconnected`, or a peer closing the body usually come from the upstream provider or gateway. Reduce concurrency, raise timeouts, or retry later.
407
-
408
- ### Tool calls work on one model but not another
409
-
410
- Tool support is model and provider dependent. Some OpenAI-compatible models emit malformed tool-call deltas, omit tool names, or return tool calls as plain text. Try another model or provider before assuming the proxy is broken.
411
 
412
- ### The VS Code extension still shows a login screen
413
-
414
- Confirm the extension environment variables are set, then reload the extension or restart VS Code. The browser login flow may still appear once; the local proxy is used when `ANTHROPIC_BASE_URL` is active in the extension process.
415
-
416
- ## How It Works
417
-
418
- ```text
419
- Claude Code CLI / IDE
420
- |
421
- | Anthropic Messages API
422
- v
423
- Free Claude Code proxy (:8082)
424
- |
425
- | provider-specific request/stream adapter
426
- v
427
- NVIDIA NIM
428
- ```
429
-
430
- Important pieces:
431
-
432
- - FastAPI exposes Anthropic-compatible routes such as `/v1/messages`, `/v1/messages/count_tokens`, and `/v1/models`.
433
- - Model routing resolves the Claude model name to `MODEL_OPUS`, `MODEL_SONNET`, `MODEL_HAIKU`, or `MODEL`.
434
- - NVIDIA NIM uses OpenAI chat streaming translated into Anthropic SSE.
435
- - The proxy normalizes thinking blocks, tool calls, token usage metadata, and provider errors into the shape Claude Code expects.
436
- - Request optimizations answer trivial Claude Code probes locally to save latency and quota.
437
-
438
- ## Development
439
-
440
- ### Project Structure
441
-
442
- ```text
443
- free-claude-code/
444
- β”œβ”€β”€ server.py # ASGI entry point
445
- β”œβ”€β”€ api/ # FastAPI routes, service layer, routing, optimizations
446
- β”œβ”€β”€ core/ # Shared Anthropic protocol helpers and SSE utilities
447
- β”œβ”€β”€ providers/ # Provider transports, registry, rate limiting
448
- β”œβ”€β”€ messaging/ # Discord/Telegram adapters, sessions, voice
449
- β”œβ”€β”€ cli/ # Package entry points and Claude process management
450
- β”œβ”€β”€ config/ # Settings, provider catalog, logging
451
- └── tests/ # Unit and contract tests
452
- ```
453
-
454
- ### Commands
455
-
456
- ```bash
457
- uv run ruff format
458
- uv run ruff check
459
- uv run ty check
460
- uv run pytest
461
- ```
462
-
463
- Run them in that order before pushing. CI enforces the same checks.
464
-
465
- ### Package Scripts
466
-
467
- `pyproject.toml` installs:
468
-
469
- - `free-claude-code`: starts the proxy with configured host and port.
470
- - `fcc-init`: creates the user config template at `~/.config/free-claude-code/.env`.
471
-
472
- ### Extending
473
-
474
- - Add messaging platforms by implementing the `MessagingPlatform` interface in `messaging/`.
475
- - Extend NVIDIA NIM provider functionality by modifying `providers/nvidia_nim/`.
476
 
477
  ## Contributing
478
 
479
- - Report bugs and feature requests in [Issues](https://github.com/Alishahryar1/free-claude-code/issues).
480
- - Keep changes small and covered by focused tests.
481
- - Do not open Docker integration PRs.
482
- Do not open README change PRs; open an issue instead.
483
- - Run the full check sequence before opening a pull request.
484
- The unparenthesized `except X, Y` syntax returns in the final Python 3.14 release (not in the 3.14 alphas); keep this in mind before opening PRs.
485
-
486
- ## NVIDIA Qwen integration
487
-
488
- You can run a simple NVIDIA Qwen streaming example using the OpenAI-compatible client shipped below.
489
-
490
- - Install the dependency:
491
-
492
- ```bash
493
- pip install -r requirements.txt
494
- ```
495
-
496
- - Set your NVIDIA API key (do NOT commit keys). Example (PowerShell temporary):
497
-
498
- ```powershell
499
- $env:NV_API_KEY = "nvapi-<YOUR_KEY>"
500
- python nvidia_integration.py "Write a short Python script that prints Hello"
501
- ```
502
-
503
- Persisted (Windows):
504
-
505
- ```powershell
506
- setx NV_API_KEY "nvapi-<YOUR_KEY>"
507
- # open a new shell to use the persisted variable
508
- ```
509
-
510
- Linux/macOS:
511
 
512
- ```bash
513
- export NV_API_KEY="nvapi-<YOUR_KEY>"
514
- python nvidia_integration.py "Write a short Python script that prints Hello"
515
- ```
516
 
517
- The example `nvidia_integration.py` streams completions from `https://integrate.api.nvidia.com/v1` using the `qwen/qwen3-coder-480b-a35b-instruct` model. Replace `<YOUR_KEY>` with your actual NVIDIA API key. Never share or commit your API keys.
518
 
519
- ## License
520
 
521
- MIT License. See [LICENSE](LICENSE) for details.
 
 
 
 
14
 
15
  # 🤖 Free Claude Code
16
 
17
+ **Use Claude Code with free NVIDIA NIM models through a lightweight proxy.**
18
 
19
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)
20
  [![Python 3.14](https://img.shields.io/badge/python-3.14-3776ab.svg?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/downloads/)
21
+ [![uv](https://img.shields.io/badge/uv-spawn-ffc21c.svg?style=for-the-badge)](https://github.com/astral-sh/uv)
 
 
22
  [![Code style: Ruff](https://img.shields.io/badge/code%20formatting-ruff-f5a623.svg?style=for-the-badge)](https://github.com/astral-sh/ruff)
 
23
 
24
+ </div>
25
 
26
+ ## The Problem
27
 
28
+ Claude Code costs $100+/month for API access. This project lets you run it using **free NVIDIA NIM models** instead.
29
 
30
+ ## The Solution
 
 
 
31
 
32
+ A FastAPI proxy that translates Claude Code's Anthropic API calls to NVIDIA NIM's OpenAI-compatible endpoint. Zero code changes needed in Claude Code.
33
 
34
+ ```
35
+ ┌─────────────────┐     Anthropic API      ┌──────────────────┐
+ │   Claude Code   │ ─────────────────────▶ │   Free Claude    │
+ │   (Official)    │                        │   Code           │
+ │                 │ ◀───────────────────── │   Proxy          │
+ └─────────────────┘     SSE Streaming      │   (:8082)        │
+                                            └────────┬─────────┘
+                                                     │
+                                              OpenAI Chat API
+                                                     │
+                                                     ▼
+                                            ┌──────────────────┐
+                                            │    NVIDIA NIM    │
+                                            │  (Free Models)   │
+                                            └──────────────────┘
49
+ ```
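The translation step in the middle box can be sketched in a few lines. This is an illustrative outline only, not this project's actual code: the function name, flattening rule, and defaults are assumptions.

```python
# Minimal sketch of the Anthropic -> OpenAI request translation idea.
# Illustrative only; names and defaults are assumptions, not the proxy's API.

def anthropic_to_openai(body: dict, upstream_model: str) -> dict:
    """Map an Anthropic Messages request to an OpenAI chat request."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first chat message.
    if body.get("system"):
        messages.append({"role": "system", "content": body["system"]})
    for msg in body.get("messages", []):
        content = msg["content"]
        # Anthropic content may be a list of typed blocks; flatten the text blocks.
        if isinstance(content, list):
            content = "".join(
                block.get("text", "")
                for block in content
                if block.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": upstream_model,  # e.g. "z-ai/glm4.7" on NVIDIA NIM
        "messages": messages,
        "max_tokens": body.get("max_tokens", 1024),
        "stream": body.get("stream", False),
    }
```

The real proxy also handles tool calls, thinking blocks, and SSE re-encoding on the way back; this sketch covers only the request direction.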
50
 
51
+ ## Features
 
 
52
 
53
+ - **Drop-in replacement** for Claude Code's Anthropic API
54
+ - **7 free NVIDIA NIM models** available via auto-routing
55
+ - **Automatic failover** - switches to next model if one hits rate limit
56
+ - **Multi-model support** - use different models for different tasks
57
+ - **Local optimizations** - fast-path for common probes (saves API calls)
58
+ - **Streaming** - real-time response with SSE
59
+ - **Tool support** - Claude Code tools work with NIM models
60
+ - **Thinking blocks** - reasoning support where models support it
61
+ - **Discord/Telegram bots** - remote Claude Code sessions
62
+ - **Voice notes** - transcribe voice messages with Whisper
63
 
64
+ ## Quick Start (Cloud - No Setup)
 
 
 
 
 
 
65
 
66
+ The easiest way to use this project is on **HuggingFace Spaces** (free tier available).
67
 
68
+ ### 1. Deploy to HuggingFace Spaces
69
+
70
+ <a target="_blank" href="https://huggingface.co/new-space?template=Yash030/claude-code-proxy">
71
+ <img src="https://huggingface.co/datasets/huggingface/badges/raw/main/deploy-to-spaces-lg.svg" alt="Deploy to HuggingFace Spaces"/>
72
+ </a>
73
+
74
+ Or manually:
75
+ 1. Go to [huggingface.co/spaces/Yash030/claude-code-proxy](https://huggingface.co/spaces/Yash030/claude-code-proxy)
76
+ 2. Duplicate the space
77
+ 3. Set your secrets in the Space settings:
78
+ - `NVIDIA_NIM_API_KEY` - Your NVIDIA API key
79
+ - `ANTHROPIC_AUTH_TOKEN` - Your auth token (any secret)
80
+
81
+ ### 2. Get NVIDIA API Key
82
 
83
+ Get a free key at [build.nvidia.com/settings/api-keys](https://build.nvidia.com/settings/api-keys).
84
 
85
+ ### 3. Connect Claude Code
86
 
87
  ```bash
88
+ # Use your HuggingFace Space URL (ends with .hf.space)
89
+ export ANTHROPIC_AUTH_TOKEN="your-secret-token"
90
+ export ANTHROPIC_BASE_URL="https://your-space-name.hf.space"
91
+ claude
92
  ```
93
 
94
+ That's it! Claude Code will use free NVIDIA NIM models.
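To smoke-test the deployed Space without Claude Code, you can hit `/v1/messages` directly. This sketch only builds the request; send it with any HTTP client. The URL and token are placeholders, and the Claude model alias is an example (the proxy re-routes whatever alias it receives).

```python
# Hedged example: the shape of a raw Messages API call to the proxy.
# Placeholders throughout; the auth header mirrors what Claude Code sends
# when ANTHROPIC_AUTH_TOKEN is set.
import json

def build_smoke_test(base_url: str, token: str) -> tuple[str, dict, bytes]:
    """Return (url, headers, body) for a minimal /v1/messages request."""
    url = f"{base_url.rstrip('/')}/v1/messages"  # note: base URL itself has no /v1
    headers = {
        "content-type": "application/json",
        "authorization": f"Bearer {token}",
        "anthropic-version": "2023-06-01",
    }
    body = json.dumps({
        "model": "claude-sonnet-4-20250514",  # any Claude alias; the proxy maps it
        "max_tokens": 32,
        "messages": [{"role": "user", "content": "Say hello"}],
    }).encode()
    return url, headers, body
```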
95
 
96
+ ## Quick Start (Local)
 
 
 
 
97
 
98
+ ### 1. Install Requirements
99
 
100
  ```bash
101
+ # Install Claude Code
102
+ curl -LsSf https://download.anthropic.com/install.sh | sh
103
+
104
+ # Install uv (fast Python package manager)
105
+ curl -LsSf https://astral.sh/uv/install.sh | sh
106
+ uv python install 3.14
107
  ```
108
 
109
+ ### 2. Clone and Configure
110
 
111
+ ```bash
112
+ git clone https://github.com/Yashwant00CR7/claude-code-nvidia.git
113
+ cd claude-code-nvidia
114
+ cp .env.example .env
115
  ```
116
 
117
+ Edit `.env`:
 
118
  ```dotenv
119
  NVIDIA_NIM_API_KEY="nvapi-your-key"
 
120
  ANTHROPIC_AUTH_TOKEN="freecc"
121
+ MODEL="nvidia_nim/z-ai/glm4.7"
122
  ```
123
 
124
+ ### 3. Start Proxy
 
 
125
 
126
  ```bash
127
+ uv sync
128
  uv run uvicorn server:app --host 0.0.0.0 --port 8082
129
  ```
130
 
131
+ ### 4. Run Claude Code
132
 
133
  ```bash
134
+ export ANTHROPIC_AUTH_TOKEN="freecc"
135
+ export ANTHROPIC_BASE_URL="http://localhost:8082"
136
+ claude
137
  ```
138
 
139
+ ## Available Models
140
 
141
+ The proxy automatically routes to these models in order:
142
 
143
+ | Model | Best For | Speed |
144
+ |-------|----------|-------|
145
+ | `qwen3-coder-480b` | Code generation | Fast |
146
+ | `glm4.7` | General purpose | Fast |
147
+ | `step-3.5-flash` | Fast responses | Very Fast |
148
+ | `mistral-large-3` | Reasoning | Medium |
149
+ | `dracarys-llama-3.1-70b` | Complex tasks | Medium |
150
+ | `seed-oss-36b` | Balanced | Fast |
151
+ | `mistral-nemotron` | Thinking tasks | Medium |
152
 
153
+ ## How Auto-Routing Works
154
 
155
+ When you use `auto` model, the proxy:
 
 
156
 
157
+ 1. **Tries models in order** of speed/reliability
158
+ 2. **Skips rate-limited models** - pre-flight check before each request
159
+ 3. **Fast failover** - if one model times out, immediately tries next
160
+ 4. **No API waste** - common probes handled locally
161
 
 
 
162
  ```
163
+ Request: "Write a function"
164
+ ↓
165
+ Check if model 1 is rate-limited? → Yes → Skip
166
+ Check if model 2 is rate-limited? → No → Try
167
+ ↓
168
+ Model 2 responds? → Yes → Stream response
169
+ Model 2 timeout? → Try model 3 → Success!
170
  ```
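The loop above can be sketched as follows. This is illustrative only: the real failover lives in `api/services.py` and differs in detail (the cooldown length, exception types, and function names here are assumptions).

```python
import time

# Illustrative failover loop; names and the 60 s cooldown are assumptions.
rate_limited_until: dict[str, float] = {}  # model -> unix time it frees up

def call_with_failover(models: list[str], call):
    """Try each model in priority order, skipping rate-limited ones."""
    now = time.time()
    for model in models:
        if rate_limited_until.get(model, 0) > now:
            continue  # pre-flight check: model is still cooling down
        try:
            return model, call(model)  # commit to the first model that answers
        except TimeoutError:
            continue  # fast failover: try the next model
        except RuntimeError:  # stand-in for a 429 from the provider
            rate_limited_until[model] = now + 60
            continue
    raise RuntimeError("all models unavailable")
```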
171
 
172
+ ## Environment Variables
173
 
174
+ ### Required
175
  ```dotenv
176
+ NVIDIA_NIM_API_KEY="nvapi-your-key" # From build.nvidia.com
177
+ ANTHROPIC_AUTH_TOKEN="your-secret" # Any secret you choose
178
  ```
179
 
180
+ ### Optional
181
  ```dotenv
182
+ MODEL="nvidia_nim/z-ai/glm4.7" # Default model
183
+ MODEL_OPUS="nvidia_nim/qwen/qwen3-..." # Model for Opus requests
184
+ MODEL_SONNET="nvidia_nim/z-ai/glm4.7" # Model for Sonnet requests
185
+ MODEL_HAIKU="nvidia_nim/z-ai/glm4.7" # Model for Haiku requests
186
 
187
+ # Auto-routing order (comma-separated)
188
+ AUTO_MODEL_PRIORITY="nvidia_nim/qwen/...,nvidia_nim/z-ai/..."
189
 
190
+ # Thinking support
191
+ ENABLE_MODEL_THINKING=true # Enable reasoning blocks
 
192
  ```
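The tier variables above resolve like this sketch, which is an assumed reading of the documented behavior ("blank per-tier values inherit the fallback"), not the project's actual router:

```python
import os

# Illustrative tier resolution; the real router in api/ may differ in detail.
def resolve_model(claude_model: str) -> str:
    """Map a Claude model alias to the configured upstream model."""
    name = claude_model.lower()
    if "opus" in name:
        tier = os.environ.get("MODEL_OPUS", "")
    elif "haiku" in name:
        tier = os.environ.get("MODEL_HAIKU", "")
    else:
        tier = os.environ.get("MODEL_SONNET", "")
    # Blank per-tier values inherit the MODEL fallback.
    return tier or os.environ.get("MODEL", "nvidia_nim/z-ai/glm4.7")
```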
193
 
194
+ ## IDE Integration
195
 
196
  ### VS Code Extension
197
 
198
+ Add to `.vscode/settings.json`:
 
199
  ```json
200
+ {
201
+ "claudeCode.environmentVariables": [
202
+ { "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" },
203
+ { "name": "ANTHROPIC_AUTH_TOKEN", "value": "freecc" }
204
+ ]
205
+ }
206
  ```
207
 
 
 
208
  ### JetBrains ACP
209
 
210
+ Edit `~/.jetbrains/acp.json`:
 
 
 
 
 
 
211
  ```json
212
+ {
213
+ "env": {
214
+ "ANTHROPIC_BASE_URL": "http://localhost:8082",
215
+ "ANTHROPIC_AUTH_TOKEN": "freecc"
216
+ }
217
  }
218
  ```
219
 
220
+ ### Remote/SSH
221
 
222
+ For remote development, deploy to HuggingFace Spaces and use:
223
  ```bash
224
+ export ANTHROPIC_BASE_URL="https://your-space.hf.space"
 
 
225
  ```
226
 
227
+ ## Deployment Options
 
 
 
 
 
228
 
229
+ ### HuggingFace Spaces (Recommended for Cloud)
230
 
231
+ **Free tier includes:**
232
+ - 2 vCPU
233
+ - Community support
234
+ - Automatic HTTPS
235
+ - Git-based deployment
236
 
237
+ **Setup:**
238
+ 1. Fork [the space](https://huggingface.co/spaces/Yash030/claude-code-proxy)
239
+ 2. Add `NVIDIA_NIM_API_KEY` to Space secrets
240
+ 3. Access at `https://your-space.hf.space`
241
 
242
+ ### Railway (Easy Deploy)
243
 
244
+ 1. Connect GitHub repo
245
+ 2. Set environment variables
246
+ 3. Deploy with auto-scaling
 
 
 
 
 
 
 
 
 
247
 
248
+ ### Render (Free Tier)
249
 
250
+ 1. Create Web Service
251
+ 2. Connect GitHub
252
+ 3. Set build command: `uv sync`
253
+ 4. Set start command: `uv run uvicorn server:app --host 0.0.0.0 --port $PORT`
254
 
255
+ ### Fly.io (Global Edge)
256
 
257
+ ```bash
258
+ fly launch
259
+ fly secrets set NVIDIA_NIM_API_KEY="nvapi-..."
260
+ fly deploy
261
  ```
262
 
263
+ ### Local/Docker
264
 
265
+ ```bash
266
+ docker build -t free-claude-code .
267
+ docker run -p 8082:8082 \
268
+ -e NVIDIA_NIM_API_KEY="nvapi-..." \
269
+ -e ANTHROPIC_AUTH_TOKEN="freecc" \
270
+ free-claude-code
 
271
  ```
272
 
273
+ ## Architecture
 
 
274
 
275
  ```
276
+ api/
277
+ ├── routes.py # FastAPI endpoints
278
+ ├── services.py # Request handling & failover
279
+ ├── model_router.py # Model resolution
280
+ ├── detection.py # Request type detection
281
+ └── optimization_handlers.py # Fast-path responses
282
 
283
+ core/
284
+ ├── anthropic/ # SSE, token counting, tool parsing
285
+ └── task_detector.py # Task capability detection
286
 
287
+ providers/
288
+ ├── openai_compat.py # Base OpenAI transport
289
+ ├── nvidia_nim/ # NVIDIA NIM provider
290
+ └── rate_limit.py # Rate limiting
291
 
292
+ messaging/
293
+ ├── discord.py # Discord bot wrapper
294
+ └── telegram.py # Telegram bot wrapper
 
295
  ```
296
 
 
 
297
  ## Troubleshooting
298
 
299
+ ### "undefined ... input_tokens" error
300
+ - Update to latest version: `git pull`
301
+ - Check `ANTHROPIC_BASE_URL` doesn't end with `/v1`
302
 
303
  ### Provider disconnects during streaming
304
+ - Reduce `PROVIDER_MAX_CONCURRENCY`
305
+ - Increase `HTTP_READ_TIMEOUT`
306
+ - Check NVIDIA NIM status at [status.nvidia.com](https://status.nvidia.com)
307
 
308
+ ### Model not responding
309
+ - Check your NVIDIA API key is valid
310
+ - Verify rate limits haven't been hit
311
+ - Try a different model
 
312
 
313
+ ### VS Code extension shows login
314
+ - Reload the extension after setting env vars
315
+ - Confirm environment variables are set correctly
316
 
317
  ## Contributing
318
 
319
+ 1. Fork the repo
320
+ 2. Create a feature branch
321
+ 3. Run checks: `uv run ruff format && uv run ruff check && uv run ty check`
322
+ 4. Submit PR
323
 
324
+ ## License
 
 
 
325
 
326
+ MIT License - See [LICENSE](LICENSE)
327
 
328
+ ## Links
329
 
330
+ - [GitHub](https://github.com/Yashwant00CR7/claude-code-nvidia)
331
+ - [HuggingFace Space](https://huggingface.co/spaces/Yash030/claude-code-proxy)
332
+ - [NVIDIA NIM](https://build.nvidia.com)
333
+ - [Claude Code](https://github.com/anthropics/claude-code)