Running Out Of GPU Memory When Encoding
#19
by jordanparker6 · opened
I am using the following:
1x RTX 4080
16 vCPU 29 GB RAM
batch_size = 2
I am just encoding documents with the model using llama-index's HuggingFaceEmbeddings class, and I am getting the following error.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 640.00 MiB (GPU 0; 15.69 GiB total capacity; 14.77 GiB already allocated; 440.56 MiB free; 14.98 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Why is this happening? My batch size is tiny.
Hi @jordanparker6, this can happen if your documents are very long. In our experiments, even with 24 GB of VRAM, sending 2 documents of 8192 tokens each runs out of memory. This is in the nature of encoding very long documents: with standard attention, activation memory grows roughly quadratically with sequence length, so a small batch of long documents costs far more than the batch size suggests.
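For a rough sense of scale (assuming a BERT-style encoder with 12 attention heads and standard fp32 attention; an illustration, not this model's exact internals): the attention score matrix for a single layer holds batch × heads × seq_len² values, i.e. 2 × 12 × 8192² × 4 bytes ≈ 6 GiB, before counting the softmax output, hidden states, or weights. That is why two 8192-token documents can exhaust a 16 GiB card while a large batch of short sentences would not.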
My suggestions are:
- Use a bigger GPU.
- Chunk your documents into shorter sequences, say 2k or 4k tokens, if you do not strictly need to encode such long documents.
- Cast the model to fp16 and reduce the batch size further, to 1 (see the sketch below).
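Here is a minimal sketch of the fp16 + shorter-sequence approach using plain transformers rather than the llama-index wrapper (the model name, max_length, and mean pooling are illustrative assumptions, not necessarily what your wrapper does internally):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed model for illustration; substitute whatever you load through llama-index.
MODEL_NAME = "jinaai/jina-embeddings-v2-base-en"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,  # fp16 roughly halves activation memory
    trust_remote_code=True,
).to("cuda").eval()

def embed(texts, max_length=4096, batch_size=1):
    """Encode texts in tiny batches, capped at a shorter sequence length."""
    out = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(
            texts[i : i + batch_size],
            padding=True,
            truncation=True,
            max_length=max_length,  # cap well below 8192 to bound memory
            return_tensors="pt",
        ).to("cuda")
        with torch.no_grad():
            hidden = model(**batch).last_hidden_state     # (B, T, D)
            mask = batch["attention_mask"].unsqueeze(-1)  # (B, T, 1)
            emb = (hidden * mask).sum(1) / mask.sum(1)    # mean pooling
        out.append(emb.float().cpu())
    return torch.cat(out)
```

If a single 8192-token document still OOMs at fp16 on 16 GiB, lower max_length further, or split the document into chunks on the llama-index side before embedding.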
bwang0911 changed discussion status to closed