Running Out Of GPU Memory When Encoding
#19
by jordanparker6 · opened
I am using the following:
1x RTX 4080
16 vCPU 29 GB RAM
batch_size = 2
I am just encoding documents with the model using llama-index's HuggingFaceEmbeddings class, and I am getting the following error.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 640.00 MiB (GPU 0; 15.69 GiB total capacity; 14.77 GiB already allocated; 440.56 MiB free; 14.98 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Why is this happening? My batch size is tiny.
Hi @jordanparker6, this can happen if your documents are very long. In our experiments, even with 24 GB of VRAM, sending 2 documents of 8192 tokens each runs out of memory. This is in the nature of encoding very long documents: with standard attention, activation memory grows roughly quadratically with sequence length, so a small batch of long documents costs far more than the batch size suggests.
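For a rough sense of scale (assuming a BERT-style encoder with 12 attention heads and standard fp32 attention; an illustration, not this model's exact internals): the attention score matrix for a single layer holds batch × heads × seq_len² values, i.e. 2 × 12 × 8192² × 4 bytes ≈ 6 GiB, before counting the softmax output, hidden states, or weights. That is why two 8192-token documents can exhaust a 16 GiB card while a large batch of short sentences would not.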
My suggestions are:
- Use a bigger GPU.
- Chunk your documents into shorter sequences, say 2k or 4k tokens, if you do not strictly need to encode such long documents.
- Cast the model to fp16 and reduce the batch size further, to 1 (see the sketch below).
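Here is a minimal sketch of the fp16 + shorter-sequence approach using plain transformers rather than the llama-index wrapper (the model name, max_length, and mean pooling are illustrative assumptions, not necessarily what your wrapper does internally):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed model for illustration; substitute whatever you load through llama-index.
MODEL_NAME = "jinaai/jina-embeddings-v2-base-en"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,  # fp16 roughly halves activation memory
    trust_remote_code=True,
).to("cuda").eval()

def embed(texts, max_length=4096, batch_size=1):
    """Encode texts in tiny batches, capped at a shorter sequence length."""
    out = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(
            texts[i : i + batch_size],
            padding=True,
            truncation=True,
            max_length=max_length,  # cap well below 8192 to bound memory
            return_tensors="pt",
        ).to("cuda")
        with torch.no_grad():
            hidden = model(**batch).last_hidden_state     # (B, T, D)
            mask = batch["attention_mask"].unsqueeze(-1)  # (B, T, 1)
            emb = (hidden * mask).sum(1) / mask.sum(1)    # mean pooling
        out.append(emb.float().cpu())
    return torch.cat(out)
```

If a single 8192-token document still OOMs at fp16 on 16 GiB, lower max_length further, or split the document into chunks on the llama-index side before embedding.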
bwang0911 changed discussion status to closed