# SGLang Diffusion
SGLang Diffusion is an inference framework for accelerated image and video generation using diffusion models. It provides an end-to-end unified pipeline with optimized kernels and an efficient scheduler loop.
## Key Features
- Broad Model Support: Wan series, FastWan series, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux, Z-Image, GLM-Image, and more
- Fast Inference: Optimized kernels, efficient scheduler loop, and Cache-DiT acceleration
- Ease of Use: OpenAI-compatible API, CLI, and Python SDK
- Multi-Platform: NVIDIA GPUs (H100, H200, A100, B200, 4090), AMD GPUs (MI300X, MI325X), and Ascend NPUs (A2, A3)
## Quick Start

### Installation

```shell
uv pip install "sglang[diffusion]" --prerelease=allow
```

See the Installation Guide for more installation methods and ROCm-specific instructions.
### Basic Usage

Generate an image with the CLI:

```shell
sglang generate --model-path Qwen/Qwen-Image \
  --prompt "A beautiful sunset over the mountains" \
  --save-output
```

Or start a server with the OpenAI-compatible API:

```shell
sglang serve --model-path Qwen/Qwen-Image --port 30010
```
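Once the server is up, any OpenAI-style client can talk to it. Below is a minimal sketch using only the Python standard library; it assumes the server exposes the standard `/v1/images/generations` endpoint on the port above (the endpoint path and response schema are assumptions based on the OpenAI API shape, not confirmed by this README).

```python
import json
from urllib import request

# Assumption: `sglang serve` exposes an OpenAI-compatible images endpoint
# at /v1/images/generations on the configured port.
BASE_URL = "http://localhost:30010/v1"


def build_image_request(prompt: str, model: str = "Qwen/Qwen-Image") -> bytes:
    """Encode the JSON body for an OpenAI-style image generation call."""
    return json.dumps({"model": model, "prompt": prompt, "n": 1}).encode()


def generate_image(prompt: str) -> dict:
    """POST the request to the local server (requires `sglang serve` running)."""
    req = request.Request(
        f"{BASE_URL}/images/generations",
        data=build_image_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

For richer client features (retries, streaming), the official `openai` Python package can be pointed at the same `base_url` instead.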
## Documentation

### Getting Started
- Installation - Install SGLang Diffusion via pip, uv, Docker, or from source
- Compatibility Matrix - Supported models and optimization compatibility
### Usage

- CLI Documentation - Command-line interface for `sglang generate` and `sglang serve`
- OpenAI API - OpenAI-compatible API for image/video generation and LoRA management
### Performance Optimization
- Performance Overview - Overview of all performance optimization strategies
- Attention Backends - Available attention backends (FlashAttention, SageAttention, etc.)
- Caching Strategies - Cache-DiT and TeaCache acceleration
- Profiling - Profiling techniques with PyTorch Profiler and Nsight Systems
### Reference
- Environment Variables - Configuration via environment variables
- Support New Models - Guide for adding new diffusion models
- Contributing - Contribution guidelines and commit message conventions
- CI Performance - Performance baseline generation script
## CLI Quick Reference

### Generate (one-off generation)

```shell
sglang generate --model-path <MODEL> --prompt "<PROMPT>" --save-output
```

### Serve (HTTP server)

```shell
sglang serve --model-path <MODEL> --port 30010
```

### Enable Cache-DiT acceleration

```shell
SGLANG_CACHE_DIT_ENABLED=true sglang generate --model-path <MODEL> --prompt "<PROMPT>"
```
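Since `SGLANG_CACHE_DIT_ENABLED` is a process-level environment variable, it should presumably also apply when launching the server, so every request benefits from Cache-DiT (an assumption; this README only shows it with `generate`):

```shell
SGLANG_CACHE_DIT_ENABLED=true sglang serve --model-path <MODEL> --port 30010
```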