RX 580 Local AI — Complete Stack
AIVisionsLab Studios · São Paulo, Brazil 🇧🇷
Running SOTA AI on 2017 hardware in 2026. No CUDA. No ROCm. No cloud.
What this is
This repository documents the complete stack for running local AI on an AMD RX 580 8GB using the Vulkan API as the GPU backend — bypassing the need for CUDA or ROCm entirely.
AMD officially dropped ROCm support for Polaris/GCN4 in v5.x. DirectML failed. OpenVINO failed.
This project proves the hardware is still capable — the problem was always the software stack, not the GPU.
Full master documentation (PT/EN/ES/FR/AR):
🌐 setup-ia-local-rx580-vulkan.web.app
Hardware
| Component | Spec |
|---|---|
| GPU | AMD RX 580 2048SP 8GB GDDR5 (Polaris / GCN4) |
| CPU | Intel Xeon E5-2690 v3 — 12c/24t · 3.5GHz boost (2014) |
| RAM | 32GB DDR4 REG ECC Quad Channel RDIMM |
| Storage | NVMe 1TB — 1.7–3.5 GB/s (critical bottleneck) |
| OS | Windows 10 Pro + WSL2 Ubuntu 22.04.5 |
| Vulkan SDK | 1.4.341.1 |
| AMD Driver | 31.0.21924.61 |
Performance (real logs, not synthetic benchmarks)
LLM — llama.cpp with Vulkan
| Model | Quantization | Speed | VRAM |
|---|---|---|---|
| Mistral 7B Instruct | Q4_K_M | ~9 tok/s | ~6GB |
| Llama 3 8B Instruct | Q4_K_M | ~7 tok/s | ~6.8GB |
| Qwen2.5 7B | Q4_K_M | ~8 tok/s | ~6.2GB |
| DeepSeek R1 8B | Q4_K_M | ~7 tok/s | ~6.8GB |
CPU baseline (Xeon, no GPU): 3–5 tok/s. Vulkan uplift: 3–4×
Image Generation — stable-diffusion.cpp with Vulkan
| Model | Resolution | Steps | Time | Backend |
|---|---|---|---|---|
| DreamShaper 8 (SD 1.5 GGUF) | 512×512 | 20 | ~72s | RX 580 Vulkan |
| FLUX.1 Schnell q4_k | 1024×1024 | 4 | ~14 min | GPU+CPU hybrid |
| FLUX.1 Schnell fp8 (16GB) | 1024×1024 | 4 | ~24 min | Xeon CPU / WSL2 |
Storage impact
| Operation | HDD | NVMe | Improvement |
|---|---|---|---|
| LLM 7B load | ~25 min | ~4 min | 6× faster |
| FLUX 16GB load | ~25 min | ~30s | 50× faster |
Models used
For sd-server (stable-diffusion.cpp)
⚠️ Critical: Only use leejet GGUF models for sd-server.
city96 GGUF models are ComfyUI-only. Using them returnsnew_sd_ctx_t failed.
| Model | Source | Use |
|---|---|---|
flux1-schnell-q4_k.gguf |
leejet/FLUX.1-schnell-gguf | FLUX GPU hybrid |
flux1-schnell-Q3_K_S.gguf |
leejet/FLUX.1-schnell-gguf | FLUX lighter (~5.2GB) |
DreamShaper_8.safetensors |
Civitai | SD 1.5 production |
For ComfyUI (city96 compatible)
| Model | Source | Use |
|---|---|---|
flux1-schnell-Q4_K_S.gguf |
city96/FLUX.1-schnell-gguf | ComfyUI only |
flux1-schnell-fp8.safetensors |
Comfy-Org | Full 16GB CPU |
VAE / CLIP / T5XXL (required for FLUX)
| File | Purpose | RAM allocation |
|---|---|---|
ae.safetensors |
VAE decoder | ~160MB CPU |
clip_l.safetensors |
CLIP encoder | ~235MB GPU |
t5xxl_fp16.safetensors |
T5 encoder | ~9.3GB CPU |
t5xxl_fp8.safetensors |
T5 encoder (lighter) | ~5GB CPU |
Architecture
OpenWebUI (Docker :3000)
│
├──► LLM: llama-server.exe (:8081) — RX 580 Vulkan
│ └── fallback: Ollama (:11434) — CPU
│
└──► Images:
├──► SD 1.5 GGUF: sd-server.exe (:7860) — RX 580 Vulkan
└──► FLUX.1 16GB: ComfyUI (:8188) — Xeon CPU WSL2
FLUX memory segmentation
| Component | File | Allocation | Size |
|---|---|---|---|
| Diffusion model | flux1-schnell-q4_k.gguf | GPU VRAM | ~6.5GB |
| VAE | ae.safetensors | CPU RAM | ~160MB |
| CLIP L | clip_l.safetensors | GPU VRAM | ~235MB |
| T5XXL | t5xxl_fp16.safetensors | CPU RAM | ~9.3GB |
What failed (documented with root cause)
| Attempt | Error | Root cause |
|---|---|---|
| DirectML | OpaqueTensorImpl |
MS encapsulates tensors — ComfyUI can't read them |
| ROCm | Kernel panics | GCN4/Polaris dropped in v5.x — permanent |
| OpenVINO + Forge | No module 'ldm' |
Extension targets A1111 — incompatible with Forge |
| CPU + HDD | ~19 min/image | Zero GPU utilization + I/O bottleneck |
Full analysis: docs/what-failed.md
Community & Credits
This work builds on independent research from:
| Author | Publication | Contribution |
|---|---|---|
| 艾米心 Amihart | Medium, Jan 2025 | First validation of LLMs via Vulkan on RX 580 — 24.56 tok/s |
| DH / DadHacks | dadhacks.org, Dec 2025 | Refuted "SD can't run on Vulkan" — sd.cpp Linux guide |
| leejet | GitHub | stable-diffusion.cpp engine |
| ggerganov | GitHub | llama.cpp + ggml engine |
| woodrex | Docker Hub | ROCm gfx803 containers |
"The hardware was never obsolete. It was waiting for the right software."
GitHub
📦 aivisionslab-studios/rx580-local-ai-guide
Scripts, build guides, automation, troubleshooting docs.
License
MIT — use freely, give credit, document what you learn.