ScottzillaSystems committed
Commit d6c1199 · verified · 1 Parent(s): 0f63d2f

Update README to reference the operational ZeroGPU Space

Files changed (1): README.md (+32 −21)
README.md CHANGED
@@ -1,31 +1,42 @@
- # Agent Zero — Native HF Space
-
- **Fixed version that loads your ACTUAL model weights**, not a proxy.
-
- ## What was wrong with the old Agent Zero
-
- The old Agent Zero (`agent-zero`, `agent-zero-pentesting`, etc.) was designed as a **Docker Compose multi-service stack** — LiteLLM proxy + TGI endpoints + PostgreSQL + SearXNG. On HF Spaces, only a single Docker container runs. The orchestrator tries to connect to `http://localhost:4000` (the LiteLLM proxy), which **doesn't exist**, so **no models ever load**.
-
- The "models_loaded: 3" in the logs was fake — the service_monitor was reporting ollama container health, not actual model availability.
-
- ## What this does
-
- - Loads your **actual model weights** from your HF repos via `AutoModelForCausalLM.from_pretrained()`
- - No LiteLLM, no TGI, no PostgreSQL, no Docker Compose
- - Models load on-demand, persist in memory cache
- - ZeroGPU compatible (`@spaces.GPU` decorator)
- - Select any model from the catalog dropdown
-
- ## Models available
-
- | Model | Tier | Size | Repo |
- |---|---|---|---|
- | chatgpt5 | T0 | 494M | `ScottzillaSystems/ChatGPT-5-Chat` |
- | qwen3.5-9b | T1 | 9.6B | `ScottzillaSystems/Qwen3.5-9B-Chat` |
- | cydonia-24b | T2 | 24B | `ScottzillaSystems/Cydonia-24B-v4.1` |
- | qwen3.5-27b | T3 | 27B | `ScottzillaSystems/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled` |
- | fallen-command | T4 | 111B | `ScottzillaSystems/Fallen-Command-A-111B-Chat` |
-
- ## Hardware
-
- Currently configured for `cpu-basic` startup. Upgrade to `a10g-large` or `a100-large` for larger models. ZeroGPU (`zero-a10g`) works for models up to 24B.
+ # Agent Zero — ZeroGPU Native
+
+ **This repo contains the reference implementation for the Agent Zero ZeroGPU Space.**
+
+ ## ➡️ Live Space
+
+ The fully operational Space is at:
+ **https://huggingface.co/spaces/ScottzillaSystems/agent-zero-orchestration**
+
+ ## Architecture
+
+ - **SDK**: Gradio (required for ZeroGPU)
+ - **Hardware**: ZeroGPU (H200, 70GB VRAM)
+ - **Decorator**: `@spaces.GPU(duration=180)` for all inference functions
+ - **Models**: Loaded on-demand per request via `transformers`
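The on-demand loading pattern above (combined with the in-memory cache the old README described) can be sketched roughly as follows. This is a minimal sketch, not the Space's actual code: `ModelCache` and the injected `loader` callable are hypothetical names, and the real loader would wrap `AutoModel*.from_pretrained()` inside the `@spaces.GPU` function.

```python
from typing import Any, Callable, Dict

class ModelCache:
    """Load a model on first request, then serve it from memory.

    `loader` stands in for a call like AutoModelForCausalLM.from_pretrained();
    it is injected here so the caching logic is shown without heavy deps.
    """

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader
        self._models: Dict[str, Any] = {}

    def get(self, repo_id: str) -> Any:
        # Expensive load happens only once per repo; repeats hit the cache.
        if repo_id not in self._models:
            self._models[repo_id] = self._loader(repo_id)
        return self._models[repo_id]

# Usage with a stand-in loader that records how often it runs:
calls = []
cache = ModelCache(loader=lambda repo: calls.append(repo) or f"model:{repo}")
cache.get("ScottzillaSystems/Cydonia-24B-v4.1")
cache.get("ScottzillaSystems/Cydonia-24B-v4.1")  # second call: no reload
assert calls == ["ScottzillaSystems/Cydonia-24B-v4.1"]
```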
+
+ ## Models (ScottzillaSystems Fleet)
+
+ | Model | Tier | Size | Architecture | Repo |
+ |---|---|---|---|---|
+ | ChatGPT-5 | T0 | 494M | Qwen2ForCausalLM | `ScottzillaSystems/ChatGPT-5` |
+ | Qwen3.5 9B Opus | T1 | 9.6B | Qwen3_5ForConditionalGeneration | `ScottzillaSystems/Huihui-Qwen3.5-9B-Claude-4.6-Opus-abliterated` |
+ | SuperGemma4 | T1 | 7.5B | Gemma4ForConditionalGeneration | `ScottzillaSystems/supergemma4-e4b-abliterated` |
+ | Cydonia 24B | T2 | 24B | MistralForCausalLM | `ScottzillaSystems/Cydonia-24B-v4.1` |
+ | Qwen3.6 27B | T3 | 27.8B | CausalLM | `ScottzillaSystems/Huihui-Qwen3.6-27B-abliterated` |
+ | Qwen3 VL 8B | VL | 8.8B | ConditionalGeneration | `ScottzillaSystems/Huihui-Qwen3-VL-8B-Instruct-abliterated` |
+ | Qwen3.5 9B Base | T1 | 9.6B | Qwen3_5ForConditionalGeneration | `ScottzillaSystems/Qwen3.5-9B` |
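Given the Architecture column above, a loader could dispatch between the two `transformers` Auto classes the README names for multimodal vs. text-only models. A hedged sketch: `auto_class_for` is a hypothetical helper, and it returns class names as strings so the dispatch rule can be shown without importing `transformers`.

```python
def auto_class_for(architecture: str) -> str:
    """Pick the transformers Auto class name for a model architecture.

    Per the README's design notes: *ForConditionalGeneration architectures
    (multimodal) load via AutoModelForImageTextToText; everything else
    (plain causal LMs) loads via AutoModelForCausalLM.
    """
    if architecture.endswith("ForConditionalGeneration") or architecture == "ConditionalGeneration":
        return "AutoModelForImageTextToText"
    return "AutoModelForCausalLM"

# Spot-check against rows of the table:
assert auto_class_for("Qwen3_5ForConditionalGeneration") == "AutoModelForImageTextToText"
assert auto_class_for("MistralForCausalLM") == "AutoModelForCausalLM"
```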
+
+ ## Key Design Decisions
+
+ 1. **ZeroGPU requires Gradio SDK** — Docker SDK is not supported
+ 2. **Models load inside `@spaces.GPU`** — GPU is allocated per-request
+ 3. **`AutoModelForImageTextToText`** for multimodal models (Qwen3.5, SuperGemma4, Qwen3 VL)
+ 4. **`AutoModelForCausalLM`** for standard text models (ChatGPT-5, Cydonia, Qwen3.6 27B)
+ 5. **Smart auto-routing** with fallback chain (T3→T2→T1→T0)
+ 6. **No LiteLLM, no TGI, no Docker Compose** — pure transformers + ZeroGPU
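The fallback chain in decision 5 could look roughly like this. A sketch only: `route` and the `available` availability check are assumptions for illustration; the chain order T3→T2→T1→T0 is from the README.

```python
# Fallback order from the design notes: try the requested tier,
# then step down the chain until an available tier is found.
FALLBACK_CHAIN = ["T3", "T2", "T1", "T0"]

def route(requested_tier: str, available: set) -> str:
    """Return the first available tier at or below the requested one."""
    start = FALLBACK_CHAIN.index(requested_tier)
    for tier in FALLBACK_CHAIN[start:]:
        if tier in available:
            return tier
    raise RuntimeError("no tier in the fallback chain is available")

# A T3 request falls back to T2 when T3 is unavailable:
assert route("T3", {"T2", "T1", "T0"}) == "T2"
assert route("T1", {"T0"}) == "T0"
```

Note that the chain only falls downward: a T1 request never escalates to T2 or T3, which keeps a small request from claiming a large model's GPU budget.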
+
+ ## Setup
+
+ 1. Set `HF_TOKEN` as a Space Secret
+ 2. Set hardware to ZeroGPU in Space Settings
+ 3. Done — models load on first request