T-GATE

T-GATE accelerates inference for Stable Diffusion, PixArt, and Latent Consistency Model pipelines by skipping the cross-attention calculation once it converges. This method doesn’t require any additional training, and it can speed up inference by 10-50%. T-GATE is also compatible with other optimization methods like DeepCache.
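
To make the gating idea concrete, here is a toy sketch in plain Python. It is not T-GATE's implementation: `run_gated_denoising` and `toy_cross_attention` are hypothetical names, and reusing the last computed output after the gate step is an assumption about how the skipped calculation is replaced.

```python
from typing import Callable, List

def run_gated_denoising(
    num_inference_steps: int,
    gate_step: int,
    cross_attention: Callable[[int], float],
) -> List[float]:
    """Toy loop: compute cross-attention before `gate_step`, then reuse the cached value."""
    if not 1 <= gate_step <= num_inference_steps:
        raise ValueError("gate_step must be between 1 and num_inference_steps")
    cached = 0.0
    outputs = []
    for step in range(num_inference_steps):
        if step < gate_step:
            # Before the gate: run the (expensive) cross-attention and remember it.
            cached = cross_attention(step)
        # After the gate: the cached value is reused and no new call is made.
        outputs.append(cached)
    return outputs

calls = []  # records every step at which cross-attention was actually computed

def toy_cross_attention(step: int) -> float:
    # Hypothetical stand-in for a real cross-attention layer.
    calls.append(step)
    return 1.0 / (step + 1)

outputs = run_gated_denoising(num_inference_steps=25, gate_step=10, cross_attention=toy_cross_attention)
print(f"cross-attention computed {len(calls)} times over {len(outputs)} steps")
```

In this toy run, cross-attention is evaluated only for the first 10 of 25 steps, which is where the savings come from.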

Before you begin, make sure you install T-GATE.

pip install tgate
pip install -U torch diffusers transformers accelerate DeepCache

To use T-GATE with a pipeline, you need to use its corresponding loader.

Pipeline	T-GATE Loader
PixArt	TgatePixArtLoader
Stable Diffusion XL	TgateSDXLLoader
Stable Diffusion XL + DeepCache	TgateSDXLDeepCacheLoader
Stable Diffusion	TgateSDLoader
Stable Diffusion + DeepCache	TgateSDDeepCacheLoader
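
If you select the loader programmatically, the table above can be written as a small lookup. This is only a convenience sketch: `TGATE_LOADERS` and `loader_name_for` are hypothetical helpers, not part of the tgate package, and the keys are simply the pipeline labels from the table.

```python
# Pipeline label -> T-GATE loader class name, copied from the table above.
TGATE_LOADERS = {
    "PixArt": "TgatePixArtLoader",
    "Stable Diffusion XL": "TgateSDXLLoader",
    "Stable Diffusion XL + DeepCache": "TgateSDXLDeepCacheLoader",
    "Stable Diffusion": "TgateSDLoader",
    "Stable Diffusion + DeepCache": "TgateSDDeepCacheLoader",
}

def loader_name_for(pipeline: str) -> str:
    """Return the loader class name for a pipeline label, or raise a helpful error."""
    try:
        return TGATE_LOADERS[pipeline]
    except KeyError:
        supported = ", ".join(sorted(TGATE_LOADERS))
        raise ValueError(f"No T-GATE loader for {pipeline!r}; supported: {supported}") from None

print(loader_name_for("Stable Diffusion XL"))
```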

Next, create a TgateLoader with a pipeline, the gate step (the time step to stop calculating the cross attention), and the number of inference steps. Then call the tgate method on the pipeline with a prompt, gate step, and the number of inference steps.

Let’s see how to enable this for several different pipelines.

PixArt
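
A minimal sketch for PixArt, following the loader-then-`tgate` pattern described above. The checkpoint ID, the step values, and the wrapper function `generate_with_tgate_pixart` are illustrative assumptions; the keyword names follow the tgate README as best recalled, so check them against your installed version.

```python
GATE_STEP = 8          # step at which cross-attention stops being computed (assumed value)
INFERENCE_STEPS = 25   # total number of denoising steps (assumed value)

def generate_with_tgate_pixart(prompt: str, device: str = "cuda"):
    """Wrap a PixArt-alpha pipeline with T-GATE and return one generated image."""
    # Heavy dependencies are imported lazily, only when the function is called.
    import torch
    from diffusers import PixArtAlphaPipeline
    from tgate import TgatePixArtLoader

    pipe = PixArtAlphaPipeline.from_pretrained(
        "PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16
    )
    # The loader receives the pipeline, the gate step, and the number of inference steps.
    pipe = TgatePixArtLoader(
        pipe, gate_step=GATE_STEP, num_inference_steps=INFERENCE_STEPS
    ).to(device)
    # The same two values are passed again when generating with `tgate`.
    return pipe.tgate(
        prompt, gate_step=GATE_STEP, num_inference_steps=INFERENCE_STEPS
    ).images[0]
```

On a machine with a GPU, `generate_with_tgate_pixart("An alpaca made of colorful building blocks, cyberpunk.")` would download the checkpoint and return a PIL image.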

Stable Diffusion XL
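
A minimal sketch for Stable Diffusion XL. The checkpoint ID, the choice of `DPMSolverMultistepScheduler`, the step values, and the wrapper function `generate_with_tgate_sdxl` are illustrative assumptions; the loader keyword names follow the tgate README as best recalled.

```python
SDXL_GATE_STEP = 10         # step at which cross-attention stops being computed (assumed value)
SDXL_INFERENCE_STEPS = 25   # total number of denoising steps (assumed value)

def generate_with_tgate_sdxl(prompt: str, device: str = "cuda"):
    """Wrap an SDXL pipeline with T-GATE and return one generated image."""
    # Heavy dependencies are imported lazily, only when the function is called.
    import torch
    from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline
    from tgate import TgateSDXLLoader

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    )
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    pipe = TgateSDXLLoader(
        pipe, gate_step=SDXL_GATE_STEP, num_inference_steps=SDXL_INFERENCE_STEPS
    ).to(device)
    return pipe.tgate(
        prompt, gate_step=SDXL_GATE_STEP, num_inference_steps=SDXL_INFERENCE_STEPS
    ).images[0]
```

Calling `generate_with_tgate_sdxl("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k.")` requires a GPU and downloads the model weights on first use.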

Stable Diffusion XL with DeepCache
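
A minimal sketch for combining T-GATE with DeepCache on Stable Diffusion XL. The `cache_interval` and `cache_branch_id` arguments and their values are assumptions based on the tgate README as best recalled, so verify them against your installed version. The checkpoint ID, the step values, and the wrapper function `generate_with_tgate_sdxl_deepcache` are illustrative.

```python
DC_GATE_STEP = 10         # step at which cross-attention stops being computed (assumed value)
DC_INFERENCE_STEPS = 25   # total number of denoising steps (assumed value)

def generate_with_tgate_sdxl_deepcache(prompt: str, device: str = "cuda"):
    """Wrap an SDXL pipeline with T-GATE plus DeepCache and return one generated image."""
    # Heavy dependencies are imported lazily, only when the function is called.
    import torch
    from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline
    from tgate import TgateSDXLDeepCacheLoader

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    )
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    # Assumed DeepCache settings: how often features are recomputed and which branch is cached.
    pipe = TgateSDXLDeepCacheLoader(
        pipe, cache_interval=3, cache_branch_id=0
    ).to(device)
    return pipe.tgate(
        prompt, gate_step=DC_GATE_STEP, num_inference_steps=DC_INFERENCE_STEPS
    ).images[0]
```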

Latent Consistency Model
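
A minimal sketch for a Latent Consistency Model built on Stable Diffusion XL. Because LCMs sample in very few steps, the gate step is set early. The `lcm=True` flag is an assumption based on the tgate README as best recalled. The checkpoint IDs, the step values, and the wrapper function `generate_with_tgate_lcm` are illustrative.

```python
LCM_GATE_STEP = 1         # step at which cross-attention stops being computed (assumed value)
LCM_INFERENCE_STEPS = 4   # LCMs typically need only a handful of steps (assumed value)

def generate_with_tgate_lcm(prompt: str, device: str = "cuda"):
    """Wrap an LCM-distilled SDXL pipeline with T-GATE and return one generated image."""
    # Heavy dependencies are imported lazily, only when the function is called.
    import torch
    from diffusers import LCMScheduler, StableDiffusionXLPipeline, UNet2DConditionModel
    from tgate import TgateSDXLLoader

    # Swap the LCM-distilled UNet into the standard SDXL pipeline.
    unet = UNet2DConditionModel.from_pretrained(
        "latent-consistency/lcm-sdxl", torch_dtype=torch.float16, variant="fp16"
    )
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        unet=unet,
        torch_dtype=torch.float16,
        variant="fp16",
    )
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe = TgateSDXLLoader(
        pipe,
        gate_step=LCM_GATE_STEP,
        num_inference_steps=LCM_INFERENCE_STEPS,
        lcm=True,  # assumed flag telling the loader this is an LCM pipeline
    ).to(device)
    return pipe.tgate(
        prompt, gate_step=LCM_GATE_STEP, num_inference_steps=LCM_INFERENCE_STEPS
    ).images[0]
```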

T-GATE also supports StableDiffusionPipeline and PixArt-alpha/PixArt-LCM-XL-2-1024-MS.

Benchmarks

Model	MACs	Param	Latency	Zero-shot 10K-FID on MS-COCO
SD-1.5	16.938T	859.520M	7.032s	23.927
SD-1.5 w/ T-GATE	9.875T	815.557M	4.313s	20.789
SD-2.1	38.041T	865.785M	16.121s	22.609
SD-2.1 w/ T-GATE	22.208T	815.433M	9.878s	19.940
SD-XL	149.438T	2.570B	53.187s	24.628
SD-XL w/ T-GATE	84.438T	2.024B	27.932s	22.738
PixArt-Alpha	107.031T	611.350M	61.502s	38.669
PixArt-Alpha w/ T-GATE	65.318T	462.585M	37.867s	35.825
DeepCache (SD-XL)	57.888T	-	19.931s	23.755
DeepCache w/ T-GATE	43.868T	-	14.666s	23.999
LCM (SD-XL)	11.955T	2.570B	3.805s	25.044
LCM (SD-XL) w/ T-GATE	11.171T	2.024B	3.533s	25.028
LCM (PixArt-Alpha)	8.563T	611.350M	4.733s	36.086
LCM (PixArt-Alpha) w/ T-GATE	7.623T	462.585M	4.543s	37.048

Latency is measured on an NVIDIA 1080 Ti, MACs and parameter counts are calculated with calflops, and FID is calculated with PytorchFID.
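
To read the latency column as a relative speedup, the reduction can be computed directly from the table. The sketch below copies the latency values from the table; the helper `latency_reduction_percent` and the dictionary layout are illustrative choices.

```python
# (baseline latency, latency with T-GATE) in seconds, copied from the table above.
LATENCY_SECONDS = {
    "SD-1.5": (7.032, 4.313),
    "SD-2.1": (16.121, 9.878),
    "SD-XL": (53.187, 27.932),
    "PixArt-Alpha": (61.502, 37.867),
    "DeepCache (SD-XL)": (19.931, 14.666),
    "LCM (SD-XL)": (3.805, 3.533),
    "LCM (PixArt-Alpha)": (4.733, 4.543),
}

def latency_reduction_percent(baseline: float, gated: float) -> float:
    """Percentage by which latency drops when T-GATE is enabled."""
    return 100.0 * (baseline - gated) / baseline

reductions = {
    name: latency_reduction_percent(baseline, gated)
    for name, (baseline, gated) in LATENCY_SECONDS.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower latency with T-GATE")
```

The few-step LCM rows show the smallest reductions, since there are few steps left to skip after the gate.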
