CUDA out of memory

#2
by ironharvy - opened

The provided example throws a 'CUDA out of memory' error if the image to upscale is larger than 128x128 (256x256, for example). I'm running this on a 4090 with 24 GB of VRAM:
==code==
import requests
from PIL import Image
from io import BytesIO
from diffusers import StableDiffusionUpscalePipeline
import torch

# load model and scheduler

model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16)
pipeline = pipeline.to("cuda")

# let's download an image

url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
response = requests.get(url)
low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
low_res_img = low_res_img.resize((128, 128)) # <==========================================================

prompt = "a white cat"

upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
upscaled_image.save("upsampled_cat.png")
==end of code==

$ python scripts/gradio/sr_cli.py /mnt/c/Users/User/Downloads/samples/00008.png "a professional photograph of an astronaut riding a horse"
Fetching 14 files: 100%|████████████████████████████████████████| 14/14 [00:00<00:00, 55136.39it/s]
100%|████████████████████████████████████████| 75/75 [05:15<00:00, 4.20s/it]
Traceback (most recent call last):
File "/home/lameradze/stablediffusion/scripts/gradio/sr_cli.py", line 29, in
upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
File "/home/lameradze/miniconda3/envs/sd/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/lameradze/miniconda3/envs/sd/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_upscale.py", line 494, in call
image = self.decode_latents(latents.float())
File "/home/lameradze/miniconda3/envs/sd/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_upscale.py", line 258, in decode_latents
image = self.vae.decode(latents).sample
File "/home/lameradze/miniconda3/envs/sd/lib/python3.9/site-packages/diffusers/models/vae.py", line 605, in decode
decoded = self._decode(z).sample
File "/home/lameradze/miniconda3/envs/sd/lib/python3.9/site-packages/diffusers/models/vae.py", line 577, in _decode
dec = self.decoder(z)
File "/home/lameradze/miniconda3/envs/sd/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lameradze/miniconda3/envs/sd/lib/python3.9/site-packages/diffusers/models/vae.py", line 213, in forward
sample = self.mid_block(sample)
File "/home/lameradze/miniconda3/envs/sd/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lameradze/miniconda3/envs/sd/lib/python3.9/site-packages/diffusers/models/unet_2d_blocks.py", line 312, in forward
hidden_states = attn(hidden_states)
File "/home/lameradze/miniconda3/envs/sd/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lameradze/miniconda3/envs/sd/lib/python3.9/site-packages/diffusers/models/attention.py", line 350, in forward
attention_scores = torch.baddbmm(
RuntimeError: CUDA out of memory. Tried to allocate 16.00 GiB (GPU 0; 23.99 GiB total capacity; 18.49 GiB already allocated; 1.06 GiB free; 20.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
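For what it's worth, the last line of the error points at the allocator setting it mentions. A minimal sketch of trying that, assuming the environment variable is set before anything is moved to the GPU (the 512 value is just an example, and on its own this is unlikely to rescue a 16 GiB allocation):
==code==
# Sketch: reduce allocator fragmentation as the error message suggests.
# Set this before the first CUDA allocation, or export it in the shell:
#   export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"  # example value, tune as needed
==end of code==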

I am getting the same issue; I can't upscale a 256x256 image (which should come out as 1024x1024) on a g4dn.xlarge instance.
Is there a documented hardware requirement for this model?

Pretty useless right now; nobody has 1 TiB of VRAM.

Updated the error output above to the 256x256 case (the previous one was actually from a 768x768 run).

Try installing xformers from https://github.com/facebookresearch/xformers. Once you confirm that it works, create the upscale pipeline, move it to the GPU, and then add the call pipeline.set_use_memory_efficient_attention_xformers(True). That should let you upscale a 512x512 image on a 24 GB GPU.
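For reference, a minimal sketch of that suggestion applied to the example above (assuming xformers is installed and importable):
==code==
# Sketch: same upscale pipeline as in the original example, with xformers
# memory-efficient attention switched on before running inference.
import torch
from diffusers import StableDiffusionUpscalePipeline

pipeline = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", revision="fp16", torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")
pipeline.set_use_memory_efficient_attention_xformers(True)  # the suggested call
==end of code==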

Could you show the xformers output? This is mine

A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
xFormers 0.0.15.dev+3ea7307.d20221209
memory_efficient_attention.flshatt:      available - requires GPU with compute capability 7.5+
memory_efficient_attention.cutlass:      available
memory_efficient_attention.small_k:      available
memory_efficient_attention.tritonflashatt: not built
memory_efficient_attention.ftriton_bflsh: not built
swiglu.fused.p.cpp:                      available
is_triton_available:                     False
is_functorch_available:                  False
pytorch.version:                         1.12.1
pytorch.cuda:                            available
gpu.compute_capability:                  7.5
gpu.name:                                Tesla T4

Besides, according to the docs, the correct syntax should be pipeline.enable_xformers_memory_efficient_attention(), but it doesn't help at all.
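For comparison, a sketch of the documented variant; it is a method call on the pipeline and toggles the same xformers attention path:
==code==
# Sketch: the method name from the diffusers docs, called after moving the pipeline to the GPU.
pipeline.enable_xformers_memory_efficient_attention()
==end of code==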

Looks like this did the trick, thank you @mgobb

$ python -m xformers.info
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
xFormers 0.0.15.dev+4c06c79.d20221206
memory_efficient_attention.flshatt:      available - requires GPU with compute capability 7.5+
memory_efficient_attention.cutlass:      available
memory_efficient_attention.small_k:      available
swiglu.fused.p.cpp:                      available
is_triton_available:                     False
is_functorch_available:                  False
pytorch.version:                         1.12.1
pytorch.cuda:                            available
gpu.compute_capability:                  8.9
gpu.name:                                NVIDIA GeForce RTX 4090

So I installed 'triton'

A matching Triton is not available, some optimizations will not be enabled.
Error caught was: module 'triton.language' has no attribute 'constexpr'
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: module 'triton.language' has no attribute 'constexpr'
xFormers 0.0.15.dev+4c06c79.d20221206
memory_efficient_attention.flshatt:      available - requires GPU with compute capability 7.5+
memory_efficient_attention.cutlass:      available
memory_efficient_attention.small_k:      available
swiglu.fused.p.cpp:                      available
is_triton_available:                     False
is_functorch_available:                  False
pytorch.version:                         1.12.1
pytorch.cuda:                            available
gpu.compute_capability:                  8.9
gpu.name:                                NVIDIA GeForce RTX 4090

After checking online, it looks like the solution for the triton issue is to install a newer version, as described in https://github.com/openai/triton/issues/625

$ pip install triton==2.0.0.dev20221120
...
$ python -m xformers.info
xFormers 0.0.15.dev+4c06c79.d20221206
memory_efficient_attention.flshatt:      available - requires GPU with compute capability 7.5+
memory_efficient_attention.cutlass:      available
memory_efficient_attention.small_k:      available
swiglu.fused.p.cpp:                      available
is_triton_available:                     True
is_functorch_available:                  False
pytorch.version:                         1.12.1
pytorch.cuda:                            available
gpu.compute_capability:                  8.9
gpu.name:                                NVIDIA GeForce RTX 4090

I also added pipeline.set_use_memory_efficient_attention_xformers(True) to the script, ran it with a 512x512 input, and got a 2048x2048 image.
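Putting it together, a sketch of the full setup described in this thread (assuming xformers is installed with a compatible triton build and a 24 GB GPU):
==code==
# Sketch of the working script: the original example plus the xformers switch,
# with the input resized to 512x512 (the upscaler outputs 4x, i.e. 2048x2048).
import requests
import torch
from io import BytesIO
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(
    model_id, revision="fp16", torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")
pipeline.set_use_memory_efficient_attention_xformers(True)  # enables xformers attention

url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
low_res_img = Image.open(BytesIO(requests.get(url).content)).convert("RGB")
low_res_img = low_res_img.resize((512, 512))

upscaled_image = pipeline(prompt="a white cat", image=low_res_img).images[0]
upscaled_image.save("upsampled_cat.png")
==end of code==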

ironharvy changed discussion status to closed

Thanks. I installed triton, einops, and pyre-extensions to resolve the triton error:
pip install triton==2.0.0.dev20221120
pip install pyre-extensions==0.0.23
pip install einops

I am getting the same output as @ironharvy's above.

xFormers 0.0.15.dev395+git.7e05e2c
memory_efficient_attention.cutlassF:               available
memory_efficient_attention.cutlassB:               available
memory_efficient_attention.flshattF:               available
memory_efficient_attention.flshattB:               available
memory_efficient_attention.smallkF:                available
memory_efficient_attention.smallkB:                available
memory_efficient_attention.tritonflashattF:        available
memory_efficient_attention.tritonflashattB:        available
swiglu.fused.p.cpp:                                available
is_triton_available:                               True
is_functorch_available:                            False
pytorch.version:                                   1.13.0
pytorch.cuda:                                      available
gpu.compute_capability:                            7.5
gpu.name:                                          NVIDIA GeForce RTX 2080 Ti

However, even in my case, is_functorch_available is set to False. Is it possible to resolve this as well?

Unable to install triton

C:\Users\User>pip install triton==2.0.0.dev20221120
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
ERROR: Could not find a version that satisfies the requirement triton==2.0.0.dev20221120 (from versions: none)
ERROR: No matching distribution found for triton==2.0.0.dev20221120

Which Python version are you using?

python -V
Python 3.9.15

pip --version
pip 22.3.1

Python 3.10.8
pip 22.2.2
