RuntimeError... not enough memory???

#47
by Keyloggeduser - opened

Hey,

Just went through the guide on installing this, and when running the first example I got a memory error. Has anyone else run into this?

Posting entire terminal log --->

(base) C:\Users\User>cd C:\stable-diffusion\stable-diffusion-main

(base) C:\stable-diffusion\stable-diffusion-main>conda activate ldm

(ldm) C:\stable-diffusion\stable-diffusion-main>python scripts/txt2img.py --prompt "a close-up portrait of a cat by pablo picasso, vivid, abstract art, colorful, vibrant" --plms --n_iter 5 --n_samples 1
Global seed set to 42
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Traceback (most recent call last):
File "scripts/txt2img.py", line 344, in
main()
File "scripts/txt2img.py", line 240, in main
model = load_model_from_config(config, f"{opt.ckpt}")
File "scripts/txt2img.py", line 50, in load_model_from_config
pl_sd = torch.load(ckpt, map_location="cpu")
File "C:\Users\User.conda\envs\ldm\lib\site-packages\torch\serialization.py", line 712, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "C:\Users\User.conda\envs\ldm\lib\site-packages\torch\serialization.py", line 1046, in _load
result = unpickler.load()
File "C:\Users\User.conda\envs\ldm\lib\site-packages\torch\serialization.py", line 1016, in persistent_load
load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
File "C:\Users\User.conda\envs\ldm\lib\site-packages\torch\serialization.py", line 997, in load_tensor
storage = zip_file.get_storage_from_record(name, numel, torch._UntypedStorage).storage()._untyped()
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 2359296 bytes.

Any help would be greatly appreciated,
Thanks,
-KLU

  1. Maybe you don't have enough memory? Try enabling attention slicing to run the model with less memory.
  2. Enable torch memory splitting: PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:100 python myscript.py

If your graphics card doesn't have enough memory you can run the model on CPU. It will be much slower, but at least it may fit into your CPU RAM.

I wish someone would just say in plain English where exactly, and how, to set the environment variable from the above: "PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:100"

A lot of noise on every web search about it, but nobody actually says it.

Two options:

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:100
python myscript.py

Or just for that one script invocation (you write it before the command):

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:100 python myscript.py

I think you can go down to 21 MB. I don't know exactly what the trade-off is, but you may get performance issues when memory is fragmented instead of allocated in one block.

That's Bash syntax. If you use PowerShell or similar it will look different; search for how it's done there. Or install Bash, maybe from Cygwin or UnxUtils.
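For completeness, here is a sketch of the per-shell equivalents (the Windows forms are my best guess at the right incantations; note the "Anaconda Prompt" is a cmd.exe console, which is why export fails there):

```shell
# Bash (Linux/macOS/WSL): set the variable for the current session
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:100

# Windows cmd.exe (what "Anaconda Prompt" is) -- use `set`, not `export`:
#   set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:100
# PowerShell:
#   $env:PYTORCH_CUDA_ALLOC_CONF = "max_split_size_mb:100"

# Verify it took effect, then run the script in the same session:
echo "PYTORCH_CUDA_ALLOC_CONF=$PYTORCH_CUDA_ALLOC_CONF"
# python scripts/txt2img.py --prompt "..." --plms --n_iter 5 --n_samples 1
```

The variable only lives as long as that console session, so it has to be set in the same window you run the script from.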

As an alternative you can try https://github.com/basujindal/stable-diffusion, a fork of Stable Diffusion for GPUs with less than 10 GB of memory. The nice thing about this fork is that all the edits are in the optimizedSD folder, so you can just copy that folder into your main repo.

The current diffusers release has some of the new optimizations as well. You only need to call enable_attention_slicing() after loading the model and (optionally) load the fp16 weights.
You will probably still need about 6 GB of VRAM, but that is much less than before. And it doesn't seem to scale linearly: I can fit 512^2 into 6 GB and 1024^2 into 12 GB.

I had already tried using export on the "Anaconda Prompt (Miniconda3)" console I was told to use to run the python script

python scripts/txt2img.py

But I just get:

'export' is not recognized as an internal or external command,
operable program or batch file.

Anyway, I'm not sure if that's just a bad hack or a workaround that slows things down massively (4x slower?)

I thought my problem was that I was using the big 32-bit weights (the 7 GB sd-v1-4-full-ema.ckpt file)... so I tried the 16-bit weights in the 4 GB sd-v1-4.ckpt instead, which I read somewhere is what you should do for memory issues... but when I use that I get the SAME memory problem???

RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 8.00 GiB total capacity; 6.13 GiB already allocated; 0 bytes free; 6.73 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

ps.. for anyone looking... https://pytorch.org/docs/stable/notes/cuda.html#memory-management

FYI - using the original fork, this is the txt2img GPU memory usage on my 8 GB 3070 when it fails with out of memory...

python scripts/txt2img.py --prompt "a close-up portrait of a cat by pablo picasso, vivid, abstract art, colorful, vibrant" --plms --n_iter 5 --n_samples 1

[screenshot: GPU memory usage at the point of failure]

Using the fork above for the same task, it finishes without error on the 3070:

python optimizedSD/optimized_txt2img.py --prompt "Cyberpunk style image of a Tesla car reflection in rain" --H 512 --W 512 --seed 27 --n_iter 2 --n_samples 5 --ddim_steps 50

[screenshot: GPU memory usage with the optimized fork]

I don't know why the arguments are different though... I can't use the original command line; it says "--plms" is an invalid argument.
