Update README to reference the operational ZeroGPU Space
README.md CHANGED
@@ -1,31 +1,42 @@
-# Agent Zero —
-
-**
-
-##
-
-The
-
-- No LiteLLM, no TGI, no PostgreSQL, no Docker Compose
-- Models load on-demand, persist in memory cache
-- ZeroGPU compatible (`@spaces.GPU` decorator)
-- Select any model from the catalog dropdown
-
-|---|---|---|---|
-| chatgpt5 | T0 | 494M | `ScottzillaSystems/ChatGPT-5-Chat` |
-| qwen3.5-9b | T1 | 9.6B | `ScottzillaSystems/Qwen3.5-9B-Chat` |
-| cydonia-24b | T2 | 24B | `ScottzillaSystems/Cydonia-24B-v4.1` |
-| qwen3.5-27b | T3 | 27B | `ScottzillaSystems/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled` |
-| fallen-command | T4 | 111B | `ScottzillaSystems/Fallen-Command-A-111B-Chat` |
+# Agent Zero — ZeroGPU Native
+
+**This repo contains the reference implementation for the Agent Zero ZeroGPU Space.**
+
+## ➡️ Live Space
+
+The fully operational Space is at:
+**https://huggingface.co/spaces/ScottzillaSystems/agent-zero-orchestration**
+
+## Architecture
+
+- **SDK**: Gradio (required for ZeroGPU)
+- **Hardware**: ZeroGPU (H200, 70GB VRAM)
+- **Decorator**: `@spaces.GPU(duration=180)` for all inference functions
+- **Models**: Loaded on-demand per request via `transformers`
+
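The Space's app code is not part of this diff; the following is only a rough sketch of the pattern those Architecture bullets describe. The function name, model choice, and generation settings are assumptions for illustration, not the actual implementation.

```python
# Illustrative only: a Gradio entry point whose inference function runs under
# @spaces.GPU, loading the model on demand for each request.
import gradio as gr
import spaces
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@spaces.GPU(duration=180)  # ZeroGPU grants the H200 only while this function runs
def generate(prompt: str) -> str:
    repo_id = "ScottzillaSystems/ChatGPT-5"  # T0 model from the fleet table
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id, torch_dtype=torch.bfloat16, device_map="cuda"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    # Return only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )

demo = gr.Interface(fn=generate, inputs="text", outputs="text")
demo.launch()
```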
+## Models (ScottzillaSystems Fleet)
+
+| Model | Tier | Size | Architecture | Repo |
+|---|---|---|---|---|
+| ChatGPT-5 | T0 | 494M | Qwen2ForCausalLM | `ScottzillaSystems/ChatGPT-5` |
+| Qwen3.5 9B Opus | T1 | 9.6B | Qwen3_5ForConditionalGeneration | `ScottzillaSystems/Huihui-Qwen3.5-9B-Claude-4.6-Opus-abliterated` |
+| SuperGemma4 | T1 | 7.5B | Gemma4ForConditionalGeneration | `ScottzillaSystems/supergemma4-e4b-abliterated` |
+| Cydonia 24B | T2 | 24B | MistralForCausalLM | `ScottzillaSystems/Cydonia-24B-v4.1` |
+| Qwen3.6 27B | T3 | 27.8B | CausalLM | `ScottzillaSystems/Huihui-Qwen3.6-27B-abliterated` |
+| Qwen3 VL 8B | VL | 8.8B | ConditionalGeneration | `ScottzillaSystems/Huihui-Qwen3-VL-8B-Instruct-abliterated` |
+| Qwen3.5 9B Base | T1 | 9.6B | Qwen3_5ForConditionalGeneration | `ScottzillaSystems/Qwen3.5-9B` |
+
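For illustration, the fleet above could be wired into the app as a small catalog keyed by tier. The structure and field names here are assumptions, not code from this commit; the `multimodal` flag just mirrors which model class each repo would need per the design decisions below.

```python
# Hypothetical catalog mapping tiers to the repos listed in the fleet table.
MODEL_CATALOG = {
    "T0": {"repo": "ScottzillaSystems/ChatGPT-5", "multimodal": False},
    "T1": {"repo": "ScottzillaSystems/Huihui-Qwen3.5-9B-Claude-4.6-Opus-abliterated", "multimodal": True},
    "T2": {"repo": "ScottzillaSystems/Cydonia-24B-v4.1", "multimodal": False},
    "T3": {"repo": "ScottzillaSystems/Huihui-Qwen3.6-27B-abliterated", "multimodal": False},
    "VL": {"repo": "ScottzillaSystems/Huihui-Qwen3-VL-8B-Instruct-abliterated", "multimodal": True},
}
```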
+## Key Design Decisions
+
+1. **ZeroGPU requires Gradio SDK** — Docker SDK is not supported
+2. **Models load inside `@spaces.GPU`** — GPU is allocated per-request
+3. **`AutoModelForImageTextToText`** for multimodal models (Qwen3.5, SuperGemma4, Qwen3 VL)
+4. **`AutoModelForCausalLM`** for standard text models (ChatGPT-5, Cydonia, Qwen3.6 27B)
+5. **Smart auto-routing** with fallback chain (T3→T2→T1→T0)
+6. **No LiteLLM, no TGI, no Docker Compose** — pure transformers + ZeroGPU
+
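A minimal sketch of how decisions 3 through 5 could fit together, assuming a catalog like the hypothetical one sketched above. Function names and error handling are illustrative, not the Space's actual routing code.

```python
# Pick the model class by modality (decisions 3-4) and walk the tier
# fallback chain T3 -> T2 -> T1 -> T0 (decision 5).
import torch
from transformers import AutoModelForCausalLM, AutoModelForImageTextToText

FALLBACK_CHAIN = ["T3", "T2", "T1", "T0"]

def load_tier(tier: str):
    entry = MODEL_CATALOG[tier]  # hypothetical catalog from the sketch above
    cls = AutoModelForImageTextToText if entry["multimodal"] else AutoModelForCausalLM
    # Tokenizer/processor loading omitted for brevity.
    return cls.from_pretrained(entry["repo"], torch_dtype=torch.bfloat16, device_map="cuda")

def route(requested_tier: str = "T3"):
    # Try the requested tier first, then fall back down the chain on any failure.
    start = FALLBACK_CHAIN.index(requested_tier)
    for tier in FALLBACK_CHAIN[start:]:
        try:
            return load_tier(tier)
        except Exception:
            continue
    raise RuntimeError("No model in the fallback chain could be loaded")
```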
+## Setup
+
+1. Set `HF_TOKEN` as Space Secret
+2. Set hardware to ZeroGPU in Space Settings
+3. Done — models load on first request
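One note on step 1: a Space Secret is exposed to the running app as an environment variable, so a gated or private repo from the fleet can be loaded roughly like this. This is an illustrative sketch, not code from this commit.

```python
import os
from transformers import AutoModelForCausalLM

# HF_TOKEN set as a Space Secret arrives as an environment variable at runtime.
hf_token = os.environ.get("HF_TOKEN")
model = AutoModelForCausalLM.from_pretrained(
    "ScottzillaSystems/Cydonia-24B-v4.1",  # any repo from the fleet table
    token=hf_token,
)
```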