Optimizations planned?

#5
by tintwotin - opened

Uh, this is a big one: 35 GB of VRAM. Generating a 1024x1024 image on an RTX 4090 takes almost 20 minutes. And it seems to be unhappy with non-square aspect ratios? (1024x576)

So are any optimizations planned, like fp16/fp8/fp4 or a pruned version, so it would be possible to run this on high-end consumer cards?
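
(For reference, the cheapest wins today are half-precision weights plus offload; a minimal sketch, assuming diffusers' AuraFlowPipeline and the fal/AuraFlow checkpoint on the Hub:)

```python
# Minimal sketch: AuraFlow in fp16 with model offload (assumes a diffusers
# version that ships AuraFlowPipeline and the fal/AuraFlow checkpoint).
import torch
from diffusers import AuraFlowPipeline

pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow",
    torch_dtype=torch.float16,   # fp16 weights: roughly half the VRAM of fp32
)
pipe.enable_model_cpu_offload()  # keep only the active submodule on the GPU

image = pipe(
    "a photograph of a lighthouse at dusk",
    width=1024,
    height=1024,
    num_inference_steps=25,
).images[0]
image.save("lighthouse.png")
```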

On my machine (default workflow, single image, 25 steps, RTX 4090), it uses 16.9 GB of VRAM in ComfyUI and takes 18 seconds: 25/25 [00:18<00:00, 1.34it/s]

Have you checked if it's using the GPU? 20 minutes looks like CPU inference.
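
A quick way to verify, assuming the diffusers pipeline sketched above (ComfyUI users can just watch nvidia-smi during a run):

```python
# Sanity check that the weights actually live on the GPU (hypothetical `pipe`
# from the diffusers sketch above; with offload enabled, devices move per step).
import torch

print(torch.cuda.is_available())                    # should print True
print(next(pipe.transformer.parameters()).device)   # cuda:0 on GPU, cpu otherwise
```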

I tried using GPT to describe the graphics, and the resulting images were great 😁 (unlike the tag-like prompts of SD 1.5/SDXL), very similar to Ideogram's, which was cool 👍
Will there be an accelerated version of this model (similar to SD-Turbo, TCL, LIGHT)? 😦

Can it run on mainstream(-ish) 16 GB VRAM cards, or is there no point in downloading this model without something like an RTX 4090 or A100?

I tried running it on 16 GB of VRAM, but the safetensors model alone is over 16 GB, so it runs very slowly on my machine, around 3 s/it.
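
If the weights alone overflow a 16 GB card, sequential offload plus VAE slicing/tiling is the usual fallback; a sketch under the same diffusers assumptions as above, trading more speed for a lower peak:

```python
# Sketch for squeezing under ~16 GB (same assumed diffusers AuraFlowPipeline);
# sequential offload is much slower than model offload but far lighter on VRAM.
import torch
from diffusers import AuraFlowPipeline

pipe = AuraFlowPipeline.from_pretrained("fal/AuraFlow", torch_dtype=torch.float16)
pipe.enable_sequential_cpu_offload()  # stream submodules to the GPU one at a time
pipe.vae.enable_slicing()             # decode latents one image at a time
pipe.vae.enable_tiling()              # decode in tiles to cap peak VRAM

image = pipe("a red fox in the snow", num_inference_steps=25).images[0]
```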

On an RTX 4090 I can keep 50/50 [00:35<00:00, 1.42it/s], but when I try to use a 15 GB VRAM GPU in the cloud, it always runs out of GPU memory. Even SD3 Medium can run on 8 GB of VRAM.

Maybe some TensorRT optimization will help?
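
Short of building a TensorRT engine, torch.compile on the denoiser is a cheaper experiment; a hedged sketch (PyTorch 2.x, same assumed pipeline as above; the first call pays the compilation cost):

```python
# Not TensorRT, but a related quick win: compile the transformer once and reuse it
# (PyTorch 2.x; hypothetical `pipe` from the diffusers sketches above).
import torch

pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead")
image = pipe("test prompt", num_inference_steps=25).images[0]  # slow first call
```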

We are working on smaller versions of the model. Stay tuned for updates! Thanks all for chiming in!

burkaygur changed discussion status to closed

For AuraFlow 0.3, are any TensorRT optimizations planned for ComfyUI?