Diffusers documentation

T-GATE

You are viewing main version, which requires installation from source. If you'd like regular pip install, checkout the latest stable version (v0.35.1).
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

T-GATE

T-GATE 通过跳过交叉注意力计算一旦收敛,加速了 Stable DiffusionPixArtLatency Consistency Model 管道的推理。此方法不需要任何额外训练,可以将推理速度提高 10-50%。T-GATE 还与 DeepCache 等其他优化方法兼容。

开始之前,请确保安装 T-GATE。

pip install tgate
pip install -U torch diffusers transformers accelerate DeepCache

要使用 T-GATE 与管道,您需要使用其对应的加载器。

管道 T-GATE 加载器
PixArt TgatePixArtLoader
Stable Diffusion XL TgateSDXLLoader
Stable Diffusion XL + DeepCache TgateSDXLDeepCacheLoader
Stable Diffusion TgateSDLoader
Stable Diffusion + DeepCache TgateSDDeepCacheLoader

接下来,创建一个 TgateLoader,包含管道、门限步骤(停止计算交叉注意力的时间步)和推理步骤数。然后在管道上调用 tgate 方法,提供提示、门限步骤和推理步骤数。

让我们看看如何为几个不同的管道启用此功能。

PixArt
Stable Diffusion XL
StableDiffusionXL with DeepCache
Latent Consistency Model

使用 T-GATE 加速 PixArtAlphaPipeline

import torch
from diffusers import PixArtAlphaPipeline
from tgate import TgatePixArtLoader

pipe = PixArtAlphaPipeline.from_pretrained("PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16)

gate_step = 8
inference_step = 25
pipe = TgatePixArtLoader(
       pipe,
       gate_step=gate_step,
       num_inference_steps=inference_step,
).to("cuda")

image = pipe.tgate(
       "An alpaca made of colorful building blocks, cyberpunk.",
       gate_step=gate_step,
       num_inference_steps=inference_step,
).images[0]

T-GATE 还支持 StableDiffusionPipelinePixArt-alpha/PixArt-LCM-XL-2-1024-MS

基准测试

模型 MACs 参数 延迟 零样本 10K-FID on MS-COCO
SD-1.5 16.938T 859.520M 7.032s 23.927
SD-1.5 w/ T-GATE 9.875T 815.557M 4.313s 20.789
SD-2.1 38.041T 865.785M 16.121s 22.609
SD-2.1 w/ T-GATE 22.208T 815.433 M 9.878s 19.940
SD-XL 149.438T 2.570B 53.187s 24.628
SD-XL w/ T-GATE 84.438T 2.024B 27.932s 22.738
Pixart-Alpha 107.031T 611.350M 61.502s 38.669
Pixart-Alpha w/ T-GATE 65.318T 462.585M 37.867s 35.825
DeepCache (SD-XL) 57.888T - 19.931s 23.755
DeepCache 配合 T-GATE 43.868T - 14.666秒 23.999
LCM (SD-XL) 11.955T 2.570B 3.805秒 25.044
LCM 配合 T-GATE 11.171T 2.024B 3.533秒 25.028
LCM (Pixart-Alpha) 8.563T 611.350M 4.733秒 36.086
LCM 配合 T-GATE 7.623T 462.585M 4.543秒 37.048

延迟测试基于 NVIDIA 1080TI,MACs 和 Params 使用 calflops 计算,FID 使用 PytorchFID 计算。

< > Update on GitHub