RuntimeError: cutlassF: no kernel found to launch!

#11
by mayonaisu - opened

When running the example code on Google Colab with a T4 GPU, the prior inference step throws RuntimeError: cutlassF: no kernel found to launch!

Try adding:

torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)
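
In case it's not obvious where these go, here is a minimal sketch (assuming you otherwise keep the example code unchanged; only the two backend flags are new). They just need to run once, after importing torch and before any pipeline is loaded or called:

import torch

# Disable the SDPA backends that raise "cutlassF: no kernel found to launch!" on the T4.
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

# ...then load and run the prior and decoder pipelines exactly as in the example code.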

Adding those two lines fixed it for me.
I opened a PR a few minutes ago...

torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

That's working, but now I'm running out of RAM before the decoder step.

Try reducing num_images_per_prompt to 1.
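
For reference, this is roughly where that argument goes (a sketch only; prior, prompt, and the surrounding arguments are the ones from the example code and are assumed here):

prior_output = prior(
    prompt=prompt,
    num_images_per_prompt=1,  # the example uses a higher value; 1 keeps peak memory down
    # ...keep the remaining arguments from the example code unchanged
)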

Nope, still not enough RAM for this, I guess.

I'm actually able to run this in a Kaggle notebook with dual T4 GPUs and double the RAM of Colab, but the generated images come out undercooked.


It works in a T4 Colab for me when I install accelerate.
Example Colab here: https://colab.research.google.com/drive/1qV14_OzZDNx6G-Lx2NE2Imk_7dfDbwkm?usp=sharing
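
If you want to confirm accelerate actually made it into your runtime before re-running the example, here's a tiny sanity check (assumption: you installed it with pip in a notebook cell and the example code itself stays unchanged):

# Assumes accelerate was installed in this runtime, e.g. via pip in a notebook cell.
import importlib.util

assert importlib.util.find_spec("accelerate") is not None, "accelerate is not installed"
# With accelerate installed, pipeline loading can use less peak CPU RAM.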


The T4 doesn't work for me:
OutOfMemoryError: CUDA out of memory. Tried to allocate 40.00 MiB.

Where are these two lines supposed to go? :P

Where do I add these two lines?
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

Please have a look at the example Colab referenced above.
