Boogu Image 0.1 Edit Turbo SDNQ UINT4 Static
SDNQ 4-bit unsigned static quantization of Boogu/Boogu-Image-0.1-Edit-Turbo.
Source checkpoint: main@2026-06-30 / 0049942e5cc8340ef5d5843b5574756d7c30be55.
What Is Quantized
Selected recipe: uint4-static-transformer-only.
Only the diffusion transformer is quantized with SDNQ UINT4 static weights. The MLLM instruction encoder, processor, scheduler, VAE, tokenizer assets, and Boogu pipeline code are copied from the upstream checkpoint.
Benchmark Setup
- Pipeline: BooguImageTurboPipeline
- Task: ti2i
- Resolution: 1024x1024
- Steps: 4
- Guidance: text_guidance_scale=1.0, image_guidance_scale=1.0, empty_instruction_guidance_scale=0.0
- DMD conditioning sigma: 0.0
- Torch dtype: bfloat16
- Prompt set: 10 prompts covering simple scenes, abstract imagery, public-domain style, a historical public figure, complex typography, dense Latin text, dense Russian text, and diagrams
- Hardware: NVIDIA RTX PRO 6000 Blackwell Server Edition on a disposable RunPod pod with local container disk
Benchmark Summary
| Model | Load | Cold gen | Hot mean | VRAM after load | VRAM during gen | VRAM after gen | Torch peak |
|---|---|---|---|---|---|---|---|
| original | 21.166 s | 11.769 s | 7.618 s | 36603 MB | 41017 MB | 41017 MB | 37851.20 MB |
| sdnq | 19.761 s | 11.216 s | 10.784 s | 21855 MB | 26293 MB | 26293 MB | 23711.03 MB |
Raw per-prompt metrics are in benchmark/*.metrics.csv and benchmark/*.metrics.jsonl. The combined summary is in benchmark/summary.json.
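As a quick reading of the table above, the relative changes can be computed directly from the reported figures. The snippet is only arithmetic on those numbers; the dictionary keys are illustrative names and are not fields taken from the benchmark files.

```python
# Figures copied from the Benchmark Summary table (RTX PRO 6000 run).
original = {"hot_mean_s": 7.618, "vram_load_mb": 36603, "vram_gen_mb": 41017}
sdnq = {"hot_mean_s": 10.784, "vram_load_mb": 21855, "vram_gen_mb": 26293}


def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, in percent."""
    return 100.0 * (before - after) / before


vram_load_saving = percent_reduction(original["vram_load_mb"], sdnq["vram_load_mb"])
vram_gen_saving = percent_reduction(original["vram_gen_mb"], sdnq["vram_gen_mb"])
hot_slowdown = sdnq["hot_mean_s"] / original["hot_mean_s"]

print(f"VRAM after load: {vram_load_saving:.1f}% lower")
print(f"VRAM during gen: {vram_gen_saving:.1f}% lower")
print(f"Hot mean: {hot_slowdown:.2f}x the original time")
```

On these figures, the SDNQ checkpoint uses roughly 40% less VRAM after load and about 36% less during generation, while the hot-run mean is about 1.4x that of the original.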
Consumer GPU Offload Smoke
These additional SDNQ rows were measured on an NVIDIA GeForce RTX 5090 (32607 MB VRAM) on a disposable RunPod pod. RTX 3090 and RTX 4090 pods could not be allocated at run time, so the RTX 5090 was used as the nearest available consumer-class fallback. Runtime: PyTorch 2.9.1+cu128, CUDA runtime 12.8, NVIDIA driver 580.126.09. For the Edit model, these rows used the same 10 reference images as the original comparison set.
| Model | Load | Cold gen | Hot mean | VRAM after load | VRAM during gen | VRAM after gen | Torch peak |
|---|---|---|---|---|---|---|---|
| sdnq + model offload | 7.126 s | 32.162 s | 20.328 s | 510 MB | 18368 MB | 660 MB | 17278.34 MB |
| sdnq + sequential offload | 7.170 s | 16.400 s | 11.961 s | 512 MB | 2954 MB | 1426 MB | 2448.23 MB |
Offload metrics are stored as benchmark/sdnq-model-offload.* and benchmark/sdnq-sequential-offload.*.
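The two offload rows correspond to two offload strategies. A minimal sketch of how they might be selected is shown here, assuming BooguImageTurboPipeline inherits the standard Diffusers `DiffusionPipeline` methods `enable_model_cpu_offload()` and `enable_sequential_cpu_offload()`. That inheritance is not confirmed by this card, and `apply_offload` is a hypothetical helper name.

```python
def apply_offload(pipe, mode: str, gpu_id: int = 0):
    """Enable one offload strategy on a Diffusers-style pipeline.

    Assumption: `pipe` exposes the standard DiffusionPipeline offload methods.
    mode: "model" (whole components moved on demand), "sequential"
          (submodules moved on demand, lowest VRAM), or "none".
    """
    if mode == "model":
        pipe.enable_model_cpu_offload(gpu_id=gpu_id)
    elif mode == "sequential":
        pipe.enable_sequential_cpu_offload(gpu_id=gpu_id)
    elif mode != "none":
        raise ValueError(f"unknown offload mode: {mode!r}")
    return pipe
```

With either offload mode, Diffusers generally expects the pipeline to be left on the CPU instead of being moved with `.to(device)` first, so the `.to(device)` call in the usage code would be dropped in that case.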
Usage
pip install -U git+https://github.com/boogu-project/Boogu-Image.git sdnq transformers accelerate safetensors huggingface_hub
import sys

import torch
from diffusers.models import AutoencoderKL
from huggingface_hub import snapshot_download
from transformers import AutoModelForImageTextToText, AutoProcessor

from boogu.models.transformers.transformer_boogu import BooguImageTransformer2DModel
from boogu.pipelines.boogu.pipeline_boogu_turbo import BooguImageTurboPipeline
from boogu.schedulers.scheduling_flow_match_euler_discrete_time_shifting import FlowMatchEulerDiscreteScheduler
from sdnq.loader import load_sdnq_model

repo_id = "WaveCut/Boogu-Image-0.1-Edit-Turbo-SDNQ-uint4-static"
device = "cuda:0"
repo_dir = snapshot_download(repo_id)

transformer_code_dir = f"{repo_dir}/transformer"
if transformer_code_dir not in sys.path:
    sys.path.insert(0, transformer_code_dir)

transformer = load_sdnq_model(
    f"{repo_dir}/transformer",
    model_cls=BooguImageTransformer2DModel,
    dtype=torch.bfloat16,
    device="cpu",
    dequantize_fp32=False,
    use_quantized_matmul=True,
)

pipe = BooguImageTurboPipeline(
    transformer=transformer,
    vae=AutoencoderKL.from_pretrained(f"{repo_dir}/vae", torch_dtype=torch.bfloat16),
    scheduler=FlowMatchEulerDiscreteScheduler.from_pretrained(f"{repo_dir}/scheduler"),
    mllm=AutoModelForImageTextToText.from_pretrained(
        f"{repo_dir}/mllm",
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
    ),
    processor=AutoProcessor.from_pretrained(f"{repo_dir}/processor", trust_remote_code=True),
).to(device)

# Text-to-image models use:
image = pipe(
    instruction=["A precise studio photograph of a glass lamp on a dark table"],
    negative_instruction="",
    empty_instruction="",
    height=1024,
    width=1024,
    num_inference_steps=4,
    text_guidance_scale=1.0,
    image_guidance_scale=1.0,
    empty_instruction_guidance_scale=0.0,
    use_dmd_student_inference=True,
    dmd_conditioning_sigma=0.0,
    generator=torch.Generator(device).manual_seed(42),
).images[0]
For image-editing models, also pass input_image_paths, input_images, align_res, and the same DMD settings used by upstream Boogu Edit Turbo.
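A sketch of how the arguments for an edit call might be assembled follows. Only the parameter names are taken from this card; the value types (for example, a list of PIL images for `input_images`, and whatever `align_res` expects) are assumptions that should be checked against the upstream Boogu Edit Turbo examples. `build_edit_kwargs` is a hypothetical helper, and `height`/`width` are deliberately left out because this card does not say how they interact with `align_res`.

```python
def build_edit_kwargs(instruction, input_images, align_res, seed_generator=None):
    """Collect keyword arguments for an edit call (assumed value types).

    `align_res` is passed through unchanged: its expected type is not
    documented in this card, so the caller must supply it.
    """
    kwargs = {
        "instruction": [instruction],
        "negative_instruction": "",
        "empty_instruction": "",
        "input_images": input_images,   # assumption: list of loaded images
        "align_res": align_res,         # assumption: see upstream examples
        # Same turbo/DMD settings as the benchmark setup in this card.
        "num_inference_steps": 4,
        "text_guidance_scale": 1.0,
        "image_guidance_scale": 1.0,
        "empty_instruction_guidance_scale": 0.0,
        "use_dmd_student_inference": True,
        "dmd_conditioning_sigma": 0.0,
    }
    if seed_generator is not None:
        kwargs["generator"] = seed_generator
    return kwargs

# Hypothetical usage, once `pipe` is built as in the usage code:
#   image = pipe(**build_edit_kwargs("Turn this cat into a dog", [img], align_res=...)).images[0]
```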
You can also download explicitly with hf download WaveCut/Boogu-Image-0.1-Edit-Turbo-SDNQ-uint4-static --local-dir ./boogu-sdnq and set repo_dir = "./boogu-sdnq".
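The two download routes can be combined in a small helper that prefers an existing local copy and otherwise falls back to the Hub cache. This is a sketch only: `resolve_repo_dir` is a hypothetical name, and the check for a `transformer/` subfolder is an assumed heuristic for "the download looks complete".

```python
from pathlib import Path
from typing import Optional


def resolve_repo_dir(repo_id: str, local_dir: Optional[str] = None) -> str:
    """Return a directory containing the checkpoint files.

    Uses `local_dir` if it already holds a `transformer/` folder;
    otherwise downloads (or reuses) the Hugging Face cache snapshot.
    """
    if local_dir is not None and Path(local_dir, "transformer").is_dir():
        return str(Path(local_dir))
    # Imported lazily so the local-only path works without network access.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id)
```

For example, `repo_dir = resolve_repo_dir(repo_id, "./boogu-sdnq")` would reuse a directory created earlier with `hf download`.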
Quantization Recipe
{
"dynamic_loss_threshold": null,
"group_size": 0,
"modules": [
"transformer"
],
"name": "uint4-static-transformer-only",
"quant_conv": false,
"quant_embedding": false,
"svd_rank": 32,
"svd_steps": 32,
"use_dynamic_quantization": false,
"use_svd": false,
"weights_dtype": "uint4"
}
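To make `"weights_dtype": "uint4"` concrete, the sketch below shows generic asymmetric 4-bit quantization of a single weight row in plain Python. It illustrates the general idea only and is not SDNQ's implementation: SDNQ's actual grouping, rounding, and packing are not described in this card, and the function names are hypothetical.

```python
def quantize_row_uint4(row):
    """Map floats to integer codes 0..15 with a per-row scale and offset."""
    lo, hi = min(row), max(row)
    scale = (hi - lo) / 15 or 1.0  # avoid a zero scale for constant rows
    codes = [min(15, max(0, round((w - lo) / scale))) for w in row]
    return codes, scale, lo


def dequantize_row_uint4(codes, scale, zero):
    """Reconstruct approximate floats from the 4-bit codes."""
    return [c * scale + zero for c in codes]


row = [-0.30, -0.05, 0.00, 0.12, 0.45]
codes, scale, zero = quantize_row_uint4(row)
restored = dequantize_row_uint4(codes, scale, zero)
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(codes, f"scale={scale:.4f}", f"max_error={max_error:.4f}")
```

In this scheme each weight is stored in 4 bits plus a shared scale and offset, and the rounding error per weight is at most half of the scale.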
Release Contents
- transformer/: SDNQ UINT4 static transformer weights and quantization_config.json
- mllm/, processor/, scheduler/, vae/: copied from the upstream checkpoint
- benchmark/: original and SDNQ metrics, summaries, and prompt outputs metadata
- assets/original_vs_sdnq_edit.webp: native-resolution original-vs-SDNQ WebP comparison grid, quality 95
- prompts.json, quantization_manifest.json, SHA256SUMS
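A downloaded copy can be checked against SHA256SUMS with a short script. The sketch below assumes the file uses the common `sha256sum` layout (hex digest, whitespace, relative path per line); this card does not describe the layout, so that is an assumption, and `verify_sha256sums` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def verify_sha256sums(root, sums_file="SHA256SUMS"):
    """Return the list of files whose SHA-256 digest does not match.

    Assumes `sha256sum`-style lines: "<hex digest>  <relative path>".
    """
    root = Path(root)
    mismatches = []
    for line in (root / sums_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # "*" marks binary mode in sha256sum
        digest = hashlib.sha256()
        with open(root / name, "rb") as fh:
            # Hash in 1 MiB chunks so large weight files are not read at once.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            mismatches.append(name)
    return mismatches
```

An empty return value, for example from `verify_sha256sums("./boogu-sdnq")`, means every listed file matched its recorded digest.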
Limitations
- This is a quantized derivative and inherits upstream behavior and limitations.
- The comparison set is a deployment smoke benchmark, not a preference study.
- Text rendering, Cyrillic text, and small labels should still be inspected manually for production use.
- Benchmark numbers depend on GPU, driver, CUDA, PyTorch, Transformers, Diffusers, Boogu code, and SDNQ versions.