torch.cuda.OutOfMemoryError: CUDA out of memory

#1
by asterix51

Yep, apparently an RTX 4080 16 GB and 32 GB of DRAM won't cut it.

"torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 40.00 MiB. GPU 0 has a total capacty of 15.99 GiB of which 0 bytes is free. Of the allocated memory 30.20 GiB is allocated by PyTorch, and 56.35 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"

DreamGen org

Hi there, can you share more information so that I can assist you?

  1. Which version of the model are you trying to run (this one, or some quant?)
  2. How are you running the model (what program and command are you using)?

This is a 70B parameter model, so it's not possible to run the full precision model on any single graphics card (even the H100 :)). The AWQ quant dreamgen/opus-v0-70b-awq takes at least 35 GB of VRAM (I am running it on a 48 GB card).
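As rough, weights-only arithmetic (assuming ~2 bytes per parameter at fp16 and ~0.5 bytes per parameter at 4-bit, ignoring KV cache and activations):

```python
# Rough weights-only VRAM estimate for a 70B model (ignores KV cache and activations).
params = 70e9

fp16_gib = params * 2 / 2**30    # ~130 GiB at 16-bit -> no single GPU fits it
awq4_gib = params * 0.5 / 2**30  # ~33 GiB at 4-bit, plus overhead -> ~35 GB in practice

print(f"fp16 weights: ~{fp16_gib:.0f} GiB, 4-bit AWQ weights: ~{awq4_gib:.0f} GiB")
```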

You could try a few things:

  1. Run the 7B model, dreamgen/opus-v0-7b, instead -- you can also try its GGUF Q8 and Q6 quants, which are not bad (the smaller ones are quite bad); see the sketch after this list
  2. Run the 70B quantized model on CPU -- this is going to be very slow, and I have not tried this myself
  3. Rent a GPU in the cloud
  4. Try it on dreamgen.com -- you can try both the 7B and the 70B AWQ versions
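
For option 1, a minimal sketch with the Hugging Face transformers library could look like this (illustrative settings, not a tested command; it assumes a recent transformers plus accelerate for device_map):

```python
# Sketch: running dreamgen/opus-v0-7b with transformers; settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dreamgen/opus-v0-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~14 GB of weights at fp16, tight but plausible on 16 GB
    device_map="auto",          # spills layers to CPU RAM if VRAM runs short (needs accelerate)
)

inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```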

I tried the v0-70b-awq version with text-generation-webui. I can't use the dreamgen.com service either, since I can't comply with their data collection policy, as my work is protected.

DreamGen org

Thanks for the details. The 70B AWQ requires at least 35 GB of VRAM, so you can't run it on your GPU. You could try the smallest quantized version from TheBloke, https://huggingface.co/TheBloke/opus-v0-70B-GGUF/blob/main/opus-v0-70b.Q2_K.gguf, and run it with the llama.cpp backend on your CPU (or offload some of the work to the GPU as well), but expect it to be slow.
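
With the llama-cpp-python bindings, that could look roughly like this (the file path, offload layer count, and thread count are just placeholders to adjust for your machine):

```python
# Sketch: running the Q2_K GGUF mostly on CPU, offloading a few layers to the GPU.
from llama_cpp import Llama

llm = Llama(
    model_path="opus-v0-70b.Q2_K.gguf",  # downloaded from TheBloke's repo above
    n_ctx=4096,        # context window
    n_gpu_layers=20,   # how many layers to push onto the RTX 4080; 0 = pure CPU
    n_threads=8,       # CPU threads for the remaining layers
)

result = llm("Once upon a time", max_tokens=64)
print(result["choices"][0]["text"])
```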

Otherwise, you could try running it on a GPU cloud; I use RunPod. To run the 70B AWQ model, I use an A6000 with the vLLM backend.
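
For reference, a minimal vLLM sketch for the AWQ model might look like this (illustrative; assumes a card with enough VRAM, such as the 48 GB A6000):

```python
# Sketch: loading dreamgen/opus-v0-70b-awq with vLLM; parameters are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="dreamgen/opus-v0-70b-awq", quantization="awq", dtype="half")
sampling = SamplingParams(temperature=0.8, max_tokens=128)

outputs = llm.generate(["Once upon a time"], sampling)
print(outputs[0].outputs[0].text)
```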

Thanks for the tips. I'll give it a go.
