CUDA Out of Memory Error
I'm currently trying to use orca-mini-3b and I'm getting a CUDA out-of-memory error. The error message says the following:
OutOfMemoryError: CUDA out of memory. Tried to allocate 392.00 MiB (GPU 0; 8.00 GiB total capacity; 6.65 GiB already allocated; 0 bytes free; 7.05 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
This is strange in my case as I was easily able to use the model a while ago.
This is the cell where I'm importing the model:
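(The original screenshot isn't shown; a minimal import cell for this model might look like the sketch below. The repo id `psmathur/orca_mini_3b` and the float16/`device_map` settings are assumptions based on the model card, not copied from the notebook.)

```python
# Hypothetical import cell -- repo id, dtype, and device_map are
# assumptions, not taken from the original screenshot.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "psmathur/orca_mini_3b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # roughly halves VRAM vs. default float32
    device_map="auto",          # requires the `accelerate` package
)
```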
This is the standard generate text function I'm using with the default prompt to write a letter:
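(That screenshot is also missing; a generate-text function in the style of the orca-mini model card is sketched below, assuming `model`, `tokenizer`, and `torch` from the import cell. The prompt template and generation parameters are assumptions, not copied from the notebook.)

```python
# Sketch of a generate-text function following the orca-mini prompt
# format; parameters are illustrative assumptions.
def generate_text(system, instruction, max_new_tokens=512):
    prompt = (
        f"### System:\n{system}\n\n"
        f"### User:\n{instruction}\n\n"
        f"### Response:\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# e.g. generate_text("You are a helpful assistant.", "Write a letter.")
```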
Instead of an output I get the error:
Usually when people get errors like this, a common solution is to reduce the batch size. I'm not exactly sure how to reduce the batch size here, as I don't see any parameter for it. I'm also new to huggingface/transformers, so if anyone has any suggestions, it would be much appreciated.
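(There is no batch-size knob in a single-prompt notebook like this; the usual levers are half-precision loading, a shorter `max_new_tokens`, and the allocator option the error message itself suggests. That last one is just an environment variable that must be set before PyTorch first allocates CUDA memory; the value 128 below is an arbitrary example.)

```python
import os

# Set before importing torch / loading the model so the CUDA caching
# allocator picks it up. 128 MiB is an example value, not a recommendation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```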
Thanks for the detailed post and screenshot. It looks like your current machine (the one shown in the screenshot) has only 8 GB of VRAM, so I'm not sure you can directly use this model with the code from the repo's model card. You should try a quantized version provided by "TheBloke" on his HF repo, and follow this detailed post on how to set up a local webui to play around with quantized orca-minis or any other quantized HF models:
https://www.reddit.com/r/LocalLLaMA/wiki/guide?utm_source=share&utm_medium=ios_app&utm_name=ioscss&utm_content=1&utm_term=1
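(If you'd rather stay in Python instead of a webui, a quantized GGML build can also be loaded with `llama-cpp-python`. The sketch below assumes you've downloaded a quantized file such as `orca-mini-3b.ggmlv3.q4_0.bin` from TheBloke's repo; the exact filename is an assumption.)

```python
from llama_cpp import Llama

# Path and filename are assumptions -- point this at whichever
# quantized file you downloaded from TheBloke's Hugging Face repo.
llm = Llama(model_path="./orca-mini-3b.ggmlv3.q4_0.bin", n_ctx=2048)

out = llm(
    "### System:\nYou are a helpful assistant.\n\n"
    "### User:\nWrite a letter.\n\n### Response:\n",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```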