ERNIE Image Turbo — Nunchaku W4A4 Quantized Inference
Introduction
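This fork adds W4A4 (4-bit weights, 4-bit activations) quantized inference support for ERNIE Image Turbo to Nunchaku. On a single A800 GPU it runs 1.74x faster than the BF16 baseline (see Performance below) and lowers GPU memory use, with minimal loss in image quality.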
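To make "W4A4" concrete, here is a toy pure-Python sketch of fake quantization in which both the weights and the activations of a dot product are rounded to signed 4-bit integers with one scale per group. It rests on stated assumptions (symmetric per-group INT4 in [-8, 7], a made-up group size of 4). It is not Nunchaku's CUDA kernel, and it omits the SVDQuant low-rank branch that Nunchaku uses to absorb outliers.

```python
"""Toy illustration of W4A4: 4-bit weights AND 4-bit activations.

Not Nunchaku's kernel: a pure-Python fake-quantization sketch assuming
symmetric per-group INT4 quantization (integer codes in [-8, 7]).
"""

GROUP_SIZE = 4  # hypothetical group size; real kernels use much larger groups


def quantize_int4(values, group_size=GROUP_SIZE):
    """Return (int4 codes, per-group scales) for a flat list of floats."""
    codes, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        # Map the largest magnitude in the group to +/-7
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(v / scale))) for v in group)
    return codes, scales


def dequantize_int4(codes, scales, group_size=GROUP_SIZE):
    """Reconstruct approximate floats from int4 codes and group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def dot_w4a4(weights, activations, group_size=GROUP_SIZE):
    """Dot product where both operands are quantized to 4 bits per group.

    Integer products are accumulated within each group, then rescaled once
    by (weight scale * activation scale), as low-bit GEMMs typically do.
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    w_codes, w_scales = quantize_int4(weights, group_size)
    a_codes, a_scales = quantize_int4(activations, group_size)
    total = 0.0
    for g, start in enumerate(range(0, len(weights), group_size)):
        int_acc = sum(
            w * a
            for w, a in zip(w_codes[start:start + group_size],
                            a_codes[start:start + group_size])
        )
        total += int_acc * w_scales[g] * a_scales[g]
    return total


if __name__ == "__main__":
    w = [0.12, -0.53, 0.91, 0.07, -0.25, 0.33, -0.78, 0.40]
    x = [1.5, 0.2, -0.7, 2.1, -1.1, 0.9, 0.3, -0.4]
    exact = sum(a * b for a, b in zip(w, x))
    approx = dot_w4a4(w, x)
    print(f"exact={exact:.4f}  w4a4={approx:.4f}  abs error={abs(exact - approx):.4f}")
```

With groups this small, the error of plain 4-bit rounding is clearly visible, which is the kind of loss that outlier-handling schemes such as SVDQuant aim to reduce.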
Built on Nunchaku. We gratefully acknowledge their excellent work on efficient diffusion model inference.
Installation
```bash
# This fork adds ERNIE Image support to Nunchaku
git clone https://github.com/Hzj199/nunchaku.git
cd nunchaku
git submodule update --init --recursive
pip install build
# --no-isolation: build dependencies (e.g. PyTorch) must already be installed in this environment
NUNCHAKU_BUILD_WHEELS=1 python -m build --wheel --no-isolation
pip install dist/nunchaku-*.whl
```
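After installing the wheel, a quick way to confirm that it landed in the active environment is to query the package metadata. This is a minimal sketch: `installed_version` is a hypothetical helper, and the distribution name `nunchaku` is assumed from the wheel filename `nunchaku-*.whl`.

```python
"""Post-install sanity check (a sketch; not part of Nunchaku itself)."""
from importlib import metadata


def installed_version(package: str = "nunchaku"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("nunchaku is not installed; check that the wheel in dist/ was installed")
    else:
        print(f"nunchaku {version} is installed")
```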
Quick Start
```python
import torch
from diffusers.pipelines.ernie_image.pipeline_ernie_image import ErnieImagePipeline
from nunchaku import NunchakuErnieImageTransformer2DModel
from nunchaku.utils import get_precision

precision = get_precision()  # auto-detect: "int4" or "fp4", depending on the GPU
rank = 64  # rank of the SVDQuant low-rank branch; selects the checkpoint file

# Load the W4A4-quantized transformer directly onto the GPU
transformer = NunchakuErnieImageTransformer2DModel.from_pretrained(
    f"ZJMuYun97/ERNIE-Image-Nunchaku/svdq-{precision}_r{rank}-ernie-image.safetensors",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

# Load the remaining pipeline components from the base model
pipe = ErnieImagePipeline.from_pretrained(
    "baidu/ERNIE-Image-Turbo",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
    pe=None,  # do not load the pe / pe_tokenizer components
    pe_tokenizer=None,
).to("cuda")  # move the non-quantized components to the GPU as well

image = pipe(
    prompt="a cute orange cat sitting on a sunlit windowsill",
    height=1024,
    width=1024,
    num_inference_steps=8,  # matches the benchmark setting below
    guidance_scale=1.0,
    generator=torch.Generator().manual_seed(42),  # fixed seed for reproducibility
).images[0]
image.save("ernie-image.png")
```
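The checkpoint loaded above is selected by the `precision` and `rank` values. The sketch below reproduces that f-string in a hypothetical `checkpoint_path` helper with basic validation. It assumes `get_precision()` returns only `"int4"` or `"fp4"`, as the comment in the example states, and it does not check which ranks are actually published in the `ZJMuYun97/ERNIE-Image-Nunchaku` repository.

```python
"""Sketch of how `precision` and `rank` select a checkpoint file.

`checkpoint_path` is a hypothetical helper that only mirrors the Quick Start
f-string; it does not verify that the file exists on the Hub.
"""

REPO_ID = "ZJMuYun97/ERNIE-Image-Nunchaku"
SUPPORTED_PRECISIONS = ("int4", "fp4")  # values get_precision() is assumed to return


def checkpoint_path(precision: str, rank: int = 64) -> str:
    """Build the '<repo>/<file>.safetensors' path passed to from_pretrained."""
    if precision not in SUPPORTED_PRECISIONS:
        raise ValueError(
            f"precision must be one of {SUPPORTED_PRECISIONS}, got {precision!r}"
        )
    if rank <= 0:
        raise ValueError(f"rank must be positive, got {rank}")
    return f"{REPO_ID}/svdq-{precision}_r{rank}-ernie-image.safetensors"


if __name__ == "__main__":
    for p in SUPPORTED_PRECISIONS:
        print(checkpoint_path(p))
```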
Performance (Reference)
Tested on a single A800 GPU, 1024×1024 resolution, 8 inference steps:
| Model | Avg Latency | Speedup |
|---|---|---|
| Original BF16 | 4.89s | 1.0x |
| Nunchaku W4A4 | 2.81s | 1.74x |
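The speedup column follows directly from the two average latencies; the same numbers also give the relative latency reduction.

```python
# Latencies from the table above (seconds; single A800, 1024x1024, 8 steps)
BF16_LATENCY_S = 4.89
W4A4_LATENCY_S = 2.81

speedup = BF16_LATENCY_S / W4A4_LATENCY_S  # how many times faster W4A4 is
latency_reduction = 1 - W4A4_LATENCY_S / BF16_LATENCY_S  # fraction of time saved

print(f"speedup: {speedup:.2f}x, latency reduction: {latency_reduction:.1%}")
# prints "speedup: 1.74x, latency reduction: 42.5%"
```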
Notes
- Only `batch_size=1` is supported, which matches the typical single-image inference use case.
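Because only `batch_size=1` is supported, multiple images can be produced by calling the pipeline once per prompt. The sketch below assumes `pipe` is built as in Quick Start. `generate_sequentially` and its `make_generator` parameter are hypothetical names; passing the generator factory in keeps the helper free of a hard `torch` dependency.

```python
"""Generate several images one call at a time (effective batch size of 1).

`generate_sequentially` is a hypothetical helper. `pipe` is any callable with
the diffusers-style signature used in Quick Start, returning an object with
an `.images` list.
"""
from typing import Any, Callable, List, Sequence


def generate_sequentially(
    pipe: Callable[..., Any],
    prompts: Sequence[str],
    make_generator: Callable[[int], Any],
    base_seed: int = 42,
    **pipe_kwargs: Any,
) -> List[Any]:
    """Call `pipe` once per prompt and collect the first image of each result."""
    images = []
    for i, prompt in enumerate(prompts):
        result = pipe(
            prompt=prompt,  # a single prompt per call keeps batch_size at 1
            generator=make_generator(base_seed + i),  # distinct, reproducible seed per image
            **pipe_kwargs,
        )
        images.append(result.images[0])
    return images
```

For example: `generate_sequentially(pipe, ["a red fox in snow", "a lighthouse at dusk"], lambda s: torch.Generator().manual_seed(s), height=1024, width=1024, num_inference_steps=8, guidance_scale=1.0)`.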