12 GB GPU memory enough?

#150
by rok - opened

Does anyone else run on a 12 GB GPU? I have an RTX 3080 Ti with 12 GB, and I run out of VRAM (on Windows) on a 512x512 image. I would be particularly interested to hear whether anyone has more success on Linux (before I set up a dual boot).

The message I get is

RuntimeError: CUDA out of memory. Tried to allocate 3.00 GiB (GPU 0; 12.00 GiB total capacity; 5.64 GiB already allocated; 0 bytes free; 8.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

This seems to indicate that not all 12 GB are actually used. Task Manager goes from ~300 MB up to full just before the crash, though.
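For what it's worth, the error message itself points at one knob to try: capping the allocator's split size via PYTORCH_CUDA_ALLOC_CONF to reduce fragmentation. A minimal sketch (the 128 MB value is just an example, and the variable has to be set before CUDA is initialised):

```python
# Set the allocator option before torch touches the GPU, e.g. at the very top of txt2img.py.
# max_split_size_mb limits how large a cached block the allocator will split, which can
# reduce the fragmentation the error message warns about. 128 is an example value.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # imported only after the environment variable is in place
```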

I am getting the same error. Did all the setup, got it running, and then ran into that. Really hoping there is something that can be done to get it to work.

Here are some things I tried that worked:

  • Reduce the resolution. On my 12 GB card I was able to do 512x256; use --H 256 --W 512 as arguments for txt2img.py.
  • Use --precision full. This saved maybe 10-15% VRAM for me.
  • Use --n_samples 1. This saves a small amount of VRAM.

Using all of the above, I can just barely do 512x512 with my 12 GB of VRAM, as the sketch below illustrates.
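To make it concrete why those flags help, here is a rough sketch of how they feed into the sampling shape in txt2img.py (the argument names follow the CompVis script; the defaults shown here are assumptions):

```python
# Approximate excerpt of the relevant txt2img.py arguments and how they set the latent shape.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--H", type=int, default=512, help="image height in pixels")
parser.add_argument("--W", type=int, default=512, help="image width in pixels")
parser.add_argument("--n_samples", type=int, default=1, help="batch size per sampling call")
parser.add_argument("--C", type=int, default=4, help="latent channels")
parser.add_argument("--f", type=int, default=8, help="downsampling factor")
opt = parser.parse_args()

# The sampler works on a latent of shape [n_samples, C, H/f, W/f], so halving H
# (--H 256 --W 512) or dropping n_samples to 1 directly shrinks the activations
# that have to fit in VRAM alongside the model weights.
shape = [opt.n_samples, opt.C, opt.H // opt.f, opt.W // opt.f]
print("latent shape:", shape)
```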

I'm on a Linux desktop with an 8 GB GPU and can generate 512x512 pics with VRAM usage around 6 GB, while still keeping my browser open alongside it, watching some tutorials on YouTube.

Some of my experiences:

  • CUDA GPUs support loading models in half precision. You can add a model.half() line before the model.cuda() line in the txt2img.py script (a sketch follows this list). This saves a huge amount of VRAM and usually doesn't impact image quality at all;
  • Set n_samples to 1. This parameter is misleading: it is the batch size, i.e. how many pics are generated at ONE time. If you want to generate multiple pics for one prompt, use n_iter instead;
  • Turn off the safety check. This saves a bit of VRAM by not loading the safety model, but the terms of use may be violated by doing this. You can find guides for it elsewhere.
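A minimal sketch of the half-precision change, assuming the usual load_model_from_config helper in the CompVis txt2img.py (only the model.half() line is new):

```python
# Load the checkpoint on the CPU, cast the weights to fp16, then move the model to the GPU.
import torch
from ldm.util import instantiate_from_config

def load_model_from_config(config, ckpt):
    pl_sd = torch.load(ckpt, map_location="cpu")   # keep the initial load off the GPU
    sd = pl_sd["state_dict"]
    model = instantiate_from_config(config.model)
    model.load_state_dict(sd, strict=False)
    model.half()   # cast weights to fp16 -- roughly halves the VRAM taken by the weights
    model.cuda()   # move the already-halved model onto the GPU
    model.eval()
    return model
```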

I run a 3080 Ti (12 GB) on an SSD-based Win10 Pro machine with 96 GB RAM and an 8-core Xeon. It runs great with the following settings: --plms --n_iter 5 --n_samples 2 --precision full --ddim_steps 250. VRAM usage is 11.8-12 GB during the run, up from about 900 MB-2 GB before the script starts.
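If you want to see how much of that headroom PyTorch itself is holding (as opposed to the browser or the desktop), here is a quick check you can print at the end of a run, using standard torch.cuda calls:

```python
import torch

# How much memory PyTorch has allocated vs. reserved, compared to the card's total.
# The gap between "reserved" and what Task Manager shows is held by other processes.
allocated = torch.cuda.memory_allocated() / 2**30
peak_reserved = torch.cuda.max_memory_reserved() / 2**30
total = torch.cuda.get_device_properties(0).total_memory / 2**30
print(f"allocated {allocated:.2f} GiB, peak reserved {peak_reserved:.2f} GiB, total {total:.2f} GiB")
```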

Let me tell you guys something, since this is the first page that came up when I searched this error: I solved it by simply finding some aspect ratios that Stable Diffusion can digest. It doesn't really matter how high your resolution is, as long as it's a digestible aspect ratio. I ended up searching for this again because I had been running high-resolution images just fine, then it updated and I couldn't run anything; it kept giving me the CUDA error. When I specified exactly the resolution of the images I had been generating before, 1200x1920, it ran smoothly again.

In my case this was in img2img. I generate a low-res image with txt2img, then transfer that image with the same prompt to img2img and fiddle a little with the CFG and denoise values until it improves my image at a better resolution with basically the same composition. If you take an image you like into img2img at 1200x1920 it may take 20 minutes, but the model has more pixels to play with and a guide image, so it gives a good result. Below you can see the images and the configs I used in img2img. Remember, it only says you are out of memory if you use a weird aspect ratio; if you go even one digit away from something acceptable it will give the CUDA error. For now I am using 1200x1920; I don't know any other values that work.

[attached images: 00020-2216704537.png, 00286-1795622451.png, perfeição no upscale.png]
