ScottzillaSystems committed
Commit d6c1199 · verified · 1 Parent(s): 0f63d2f

Update README to reference the operational ZeroGPU Space

Files changed (1): README.md (+32 −21)
README.md CHANGED
@@ -1,31 +1,42 @@
- # Agent Zero — Native HF Space
-
- **Fixed version that loads your ACTUAL model weights**, not a proxy.
-
- ## What was wrong with the old Agent Zero
-
- The old Agent Zero (`agent-zero`, `agent-zero-pentesting`, etc.) was designed as a **Docker Compose multi-service stack** — LiteLLM proxy + TGI endpoints + PostgreSQL + SearXNG. On HF Spaces, only a single Docker container runs. The orchestrator tries to connect to `http://localhost:4000` (the LiteLLM proxy), which **doesn't exist**, so **no models ever load**.
-
- The "models_loaded: 3" in the logs was fake — the service_monitor was reporting ollama container health, not actual model availability.
-
- ## What this does
-
- - Loads your **actual model weights** from your HF repos via `AutoModelForCausalLM.from_pretrained()`
- - No LiteLLM, no TGI, no PostgreSQL, no Docker Compose
- - Models load on-demand, persist in memory cache
- - ZeroGPU compatible (`@spaces.GPU` decorator)
- - Select any model from the catalog dropdown
-
- ## Models available
-
- | Model | Tier | Size | Repo |
- |---|---|---|---|
- | chatgpt5 | T0 | 494M | `ScottzillaSystems/ChatGPT-5-Chat` |
- | qwen3.5-9b | T1 | 9.6B | `ScottzillaSystems/Qwen3.5-9B-Chat` |
- | cydonia-24b | T2 | 24B | `ScottzillaSystems/Cydonia-24B-v4.1` |
- | qwen3.5-27b | T3 | 27B | `ScottzillaSystems/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled` |
- | fallen-command | T4 | 111B | `ScottzillaSystems/Fallen-Command-A-111B-Chat` |
-
- ## Hardware
-
- Currently configured for `cpu-basic` startup. Upgrade to `a10g-large` or `a100-large` for larger models. ZeroGPU (`zero-a10g`) works for models up to 24B.
+ # Agent Zero — ZeroGPU Native
+
+ **This repo contains the reference implementation for the Agent Zero ZeroGPU Space.**
+
+ ## ➡️ Live Space
+
+ The fully operational Space is at:
+ **https://huggingface.co/spaces/ScottzillaSystems/agent-zero-orchestration**
+
+ ## Architecture
+
+ - **SDK**: Gradio (required for ZeroGPU)
+ - **Hardware**: ZeroGPU (H200, 70GB VRAM)
+ - **Decorator**: `@spaces.GPU(duration=180)` for all inference functions
+ - **Models**: Loaded on-demand per request via `transformers`
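The on-demand loading pattern above (combined with the in-memory cache the old README described) can be sketched roughly as follows. This is a minimal sketch, not the Space's actual code: `ModelCache` and the injected `loader` callable are hypothetical names, and the real loader would wrap `AutoModel*.from_pretrained()` inside the `@spaces.GPU` function.

```python
from typing import Any, Callable, Dict

class ModelCache:
    """Load a model on first request, then serve it from memory.

    `loader` stands in for a call like AutoModelForCausalLM.from_pretrained();
    it is injected here so the caching logic is shown without heavy deps.
    """

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader
        self._models: Dict[str, Any] = {}

    def get(self, repo_id: str) -> Any:
        # Expensive load happens only once per repo; repeats hit the cache.
        if repo_id not in self._models:
            self._models[repo_id] = self._loader(repo_id)
        return self._models[repo_id]

# Usage with a stand-in loader that records how often it runs:
calls = []
cache = ModelCache(loader=lambda repo: calls.append(repo) or f"model:{repo}")
cache.get("ScottzillaSystems/Cydonia-24B-v4.1")
cache.get("ScottzillaSystems/Cydonia-24B-v4.1")  # second call: no reload
assert calls == ["ScottzillaSystems/Cydonia-24B-v4.1"]
```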
+
+ ## Models (ScottzillaSystems Fleet)
+
+ | Model | Tier | Size | Architecture | Repo |
+ |---|---|---|---|---|
+ | ChatGPT-5 | T0 | 494M | Qwen2ForCausalLM | `ScottzillaSystems/ChatGPT-5` |
+ | Qwen3.5 9B Opus | T1 | 9.6B | Qwen3_5ForConditionalGeneration | `ScottzillaSystems/Huihui-Qwen3.5-9B-Claude-4.6-Opus-abliterated` |
+ | SuperGemma4 | T1 | 7.5B | Gemma4ForConditionalGeneration | `ScottzillaSystems/supergemma4-e4b-abliterated` |
+ | Cydonia 24B | T2 | 24B | MistralForCausalLM | `ScottzillaSystems/Cydonia-24B-v4.1` |
+ | Qwen3.6 27B | T3 | 27.8B | CausalLM | `ScottzillaSystems/Huihui-Qwen3.6-27B-abliterated` |
+ | Qwen3 VL 8B | VL | 8.8B | ConditionalGeneration | `ScottzillaSystems/Huihui-Qwen3-VL-8B-Instruct-abliterated` |
+ | Qwen3.5 9B Base | T1 | 9.6B | Qwen3_5ForConditionalGeneration | `ScottzillaSystems/Qwen3.5-9B` |
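Given the Architecture column above, a loader could dispatch between the two `transformers` Auto classes the README names for multimodal vs. text-only models. A hedged sketch: `auto_class_for` is a hypothetical helper, and it returns class names as strings so the dispatch rule can be shown without importing `transformers`.

```python
def auto_class_for(architecture: str) -> str:
    """Pick the transformers Auto class name for a model architecture.

    Per the README's design notes: *ForConditionalGeneration architectures
    (multimodal) load via AutoModelForImageTextToText; everything else
    (plain causal LMs) loads via AutoModelForCausalLM.
    """
    if architecture.endswith("ForConditionalGeneration") or architecture == "ConditionalGeneration":
        return "AutoModelForImageTextToText"
    return "AutoModelForCausalLM"

# Spot-check against rows of the table:
assert auto_class_for("Qwen3_5ForConditionalGeneration") == "AutoModelForImageTextToText"
assert auto_class_for("MistralForCausalLM") == "AutoModelForCausalLM"
```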
+
+ ## Key Design Decisions
+
+ 1. **ZeroGPU requires Gradio SDK** — Docker SDK is not supported
+ 2. **Models load inside `@spaces.GPU`** — GPU is allocated per-request
+ 3. **`AutoModelForImageTextToText`** for multimodal models (Qwen3.5, SuperGemma4, Qwen3 VL)
+ 4. **`AutoModelForCausalLM`** for standard text models (ChatGPT-5, Cydonia, Qwen3.6 27B)
+ 5. **Smart auto-routing** with fallback chain (T3→T2→T1→T0)
+ 6. **No LiteLLM, no TGI, no Docker Compose** — pure transformers + ZeroGPU
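The fallback chain in decision 5 could look roughly like this. A sketch only: `route` and the `available` availability check are assumptions for illustration; the chain order T3→T2→T1→T0 is from the README.

```python
# Fallback order from the design notes: try the requested tier,
# then step down the chain until an available tier is found.
FALLBACK_CHAIN = ["T3", "T2", "T1", "T0"]

def route(requested_tier: str, available: set) -> str:
    """Return the first available tier at or below the requested one."""
    start = FALLBACK_CHAIN.index(requested_tier)
    for tier in FALLBACK_CHAIN[start:]:
        if tier in available:
            return tier
    raise RuntimeError("no tier in the fallback chain is available")

# A T3 request falls back to T2 when T3 is unavailable:
assert route("T3", {"T2", "T1", "T0"}) == "T2"
assert route("T1", {"T0"}) == "T0"
```

Note that the chain only falls downward: a T1 request never escalates to T2 or T3, which keeps a small request from claiming a large model's GPU budget.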
+
+ ## Setup
+
+ 1. Set `HF_TOKEN` as a Space Secret
+ 2. Set hardware to ZeroGPU in Space Settings
+ 3. Done — models load on first request