OutOfMemory

#3 opened by DemonMaike

Hi, and thanks for your work and the model.

Could you help me with one question about memory? I see the model works with a 32k context window, but when I use a context of ~23k tokens I get an out-of-memory error, even though I'm on an A100 80GB. Am I misunderstanding, and the usable context window is less than 32k tokens, or do I need special settings?

As for how I'm using the model: I pass my whole context through the basic encode/decode methods, and the model works well with small contexts.
Thanks.

The vanilla transformers backend is really inefficient with long contexts, so it may well be using the full 80GB.

You need to use a quant, or at least load it in 8 bits.
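For reference, here is a minimal sketch of 8-bit loading with transformers + bitsandbytes. The repo id below is a placeholder, not the actual model's id; substitute the one you're using.

```python
# Minimal sketch: load the model with 8-bit weights via bitsandbytes.
# "SciPhi/placeholder-32k-model" is a hypothetical repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "SciPhi/placeholder-32k-model"  # placeholder; use the real repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit weights
    device_map="auto",          # place layers on the available GPU(s)
    torch_dtype=torch.float16,  # half-precision activations
)
```

Swapping in `load_in_4bit=True` roughly halves the weight memory again, at some quality cost.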

Also, the SciPhi instruct model doesn't work well past 9-10k tokens (but works very well under that limit), so you may not want to use the full 32k with this iteration anyway.

Thanks for your answer and for describing the model's limits.
Of course, I'll try a 4- or 8-bit version.
Good luck with your work 💪

It's not my model, though I like the idea :P

```
OutOfMemoryError: CUDA out of memory. Tried to allocate 50.64 GiB (GPU 0; 79.35 GiB total capacity; 13.25 GiB already allocated; 23.54 GiB free; 55.07 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.
```

I have two GPUs with 80 GB of memory each. How do I solve this error?
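Not an official answer, but a sketch of the two usual fixes the error message hints at, assuming a transformers/accelerate stack (the repo id is again a placeholder): shard the model across both GPUs with `device_map="auto"`, and set `PYTORCH_CUDA_ALLOC_CONF` to curb allocator fragmentation.

```python
# Sketch: reduce CUDA allocator fragmentation and shard across both GPUs.
# The env var must be set before torch initializes CUDA.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
from transformers import AutoModelForCausalLM

model_id = "SciPhi/placeholder-32k-model"  # placeholder; use the real repo id
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # accelerate splits layers across GPU 0 and GPU 1
    torch_dtype=torch.float16,
)
```

Note that sharding splits the weights, not a single oversized attention tensor; if one allocation alone needs ~50 GiB, 8-bit/4-bit loading (as sketched above) or a shorter context is still needed.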
