CUDA Out of memory

#3
by Afster - opened

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 8.00 GiB total capacity; 7.08 GiB already allocated; 0 bytes free; 7.31 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
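The error text itself suggests `max_split_size_mb` when reserved memory far exceeds allocated memory, which is the case here (7.31 GiB reserved vs. 7.08 GiB allocated). A minimal sketch of setting it via the environment variable the message names; the value 128 is illustrative, not a known fix for this model:

```shell
# Ask PyTorch's caching allocator to cap split block size at 128 MiB,
# which can reduce fragmentation when reserved >> allocated.
# (Illustrative value; tune or remove if it doesn't help.)
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
echo "$PYTORCH_CUDA_ALLOC_CONF"
```

On Windows (where the `call python server.py ...` commands below are run), the equivalent is `set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128` in the same terminal or batch file before launching the server.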

I'm getting this issue. I tried changing the pre_layer setting within oobabooga, but with no success.

I've been able to get responses on an RTX 2060 Super 8 GB card with the following flags in ooba:

call python server.py --auto-devices --extensions api --model notstoic_pygmalion-13b-4bit-128g --model_type LLaMA --wbits 4 --groupsize 128 --no-cache --pre_layer 30

Not working for me
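Since --pre_layer controls how many layers stay on the GPU (the rest are offloaded), one variation worth trying when 30 still runs out of memory is a lower value, trading speed for VRAM. The 20 below is an illustrative number, not a tested setting for this card:

```shell
call python server.py --auto-devices --extensions api --model notstoic_pygmalion-13b-4bit-128g --model_type LLaMA --wbits 4 --groupsize 128 --no-cache --pre_layer 20
```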

Afster changed discussion status to closed
Afster changed discussion status to open
