Out of memory error, but both system and GPU have plenty of memory

#37
by mstachow - opened

I'm trying to run this model using auto_gptq and huggingface. Other GPTQ models, for instance 13B models, load just fine. However, whenever I try to load this model I get a weird error. Specifically:

RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 117440512 bytes.
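
The load itself is just the standard auto_gptq call, roughly like the sketch below (the repo id here is a placeholder, not my exact config):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Placeholder repo id; substitute the actual 70B GPTQ model being loaded.
repo = "TheBloke/Llama-2-70B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=True)
# from_quantized stages the checkpoint through CPU RAM before the
# 4-bit weights land on the GPU, which is where the allocator fails.
model = AutoGPTQForCausalLM.from_quantized(
    repo,
    device="cuda:0",
    use_safetensors=True,
)
```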

There are several issues here:

  1. I have 128GB of RAM, and when this error happened only about 58GB were in use. The failed allocation is 117440512 bytes, which is only about 117MB, so I definitely have enough RAM.
  2. I have an RTX A6000 with 48GB of VRAM. Using 4-bit quantization with this model I should need only 35GB or so of VRAM (rough math in the sketch after this list).
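
The 35GB figure is just parameter math, a back-of-the-envelope sketch; real usage adds the KV cache and CUDA overhead on top:

```python
params = 70e9        # 70B parameters
bits_per_weight = 4  # GPTQ 4-bit quantization
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"quantized weights alone: ~{weights_gb:.0f} GB")  # ~35 GB
```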

Does anyone know what's going on?

Hmm, that is odd. That error does mean you don't have enough RAM, but you clearly have plenty. And yes, you have enough VRAM too.

What OS are you on? All I can think is that you're running Windows. If so, this is an annoyance with Windows: regardless of how much RAM you have, you also need a large pagefile, because Windows refuses any allocation it can't back with RAM plus pagefile (the commit limit), and loading a model can briefly commit far more memory than it ends up using. I'd set the pagefile to 100GB minimum. This often bites people trying to load 33B models, and applies even more to 70B.
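
If you want to sanity-check the commit limit, psutil exposes the numbers; a quick diagnostic sketch, assuming psutil is installed (on Windows, swap_memory() reflects the pagefile):

```python
import psutil

vm = psutil.virtual_memory()
sm = psutil.swap_memory()  # on Windows this reflects the pagefile
print(f"physical RAM : {vm.total / 1e9:.0f} GB ({vm.available / 1e9:.0f} GB free)")
print(f"pagefile     : {sm.total / 1e9:.0f} GB ({sm.free / 1e9:.0f} GB free)")
# Windows rejects any allocation it cannot back with RAM + pagefile
# (the commit limit), even when plenty of physical RAM is still free.
```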

If you're on Linux then I don't know; it should work.

You were entirely correct, thank you! I am on Windows, and I had previously set my page files to 16GB, but I didn't realize they had to be THAT big. Also I have two drives, so I set the page files on both to be 100GB min, 200GB max. Whether I needed to do that for both drives I'm not sure, but it works now and is happily running inference.

Same thing here: I'm trying to run synthia-70b-v1.1.Q8_0.gguf in the Oobabooga web UI, and it fails while loading the model with a MemoryError in the console after memory usage hits 77GB. (In native llama.cpp the model loads without problems and consumes 75GB.) I have 128GB, and it looks like I need just 1-3GB more to cache this model: it only consumed 77GB, but it wants to commit 78-80GB, and because of how this shitty Windows OS works I can't load the model with plenty of RAM still available.

Right now I'm trying to squeeze an additional 3GB out of the OS by disabling some services, Defender, etc. The Windows system is just ridiculous.

P.S. My pagefile is disabled (for better speed, etc.) and I don't want one on my current PC. A barebones W11 system consumes 3GB of RAM in Safe Mode with 128GB installed, but in normal mode (with the drivers and essential launchers for the GPU etc. installed) it consumes around 8GB. I've managed to get it down to 6GB so far, but I still need to shave off another 2-3GB somehow. Windows 11 is a memory-hungry system compared to Windows XP, which consumed something like 128-256MB out of the box.

Or I might just wait for Oobabooga web UI updates, because I can load the model just fine in llama.cpp.
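
One plausible reason llama.cpp succeeds where the web UI (before the update) failed is that llama.cpp mmaps the GGUF file by default, so the weights don't all count against Windows' commit charge. A minimal sketch with llama-cpp-python, assuming that binding is installed (the path and settings are placeholders):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="synthia-70b-v1.1.Q8_0.gguf",  # placeholder path
    n_ctx=4096,
    n_gpu_layers=0,   # CPU-only, as in the setup above
    use_mmap=True,    # map the file instead of copying it into RAM
)
print(llm("Hello", max_tokens=16)["choices"][0]["text"])
```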

Update: I updated the Oobabooga web UI to the latest version today, and now the model finally loads perfectly fine.

Thank you for this. I recently got an A6000, was trying to load a 70B, and ran into the same issue. Changing the paging file corrected it right away. It took me a while to find the answer; maybe mention the paging file on the model pages. This was the first mention I came across after a few days of searching.

Actually, if you have enough RAM you don't need a pagefile. I have 128GB of RAM right now with my pagefile completely disabled, and I run 70B models without RAM issues since updating my Oobabooga web UI.
