Running Out Of GPU Memory When Encoding

#19
by jordanparker6 - opened

I am using the following:

1x RTX 4080
16 vCPU 29 GB RAM

batch_size = 2

I am just encoding documents with the model via llama-index's HuggingFaceEmbeddings class, and I am getting the following error.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 640.00 MiB (GPU 0; 15.69 GiB total capacity; 14.77 GiB already allocated; 440.56 MiB free; 14.98 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Why is this happening? My batch size is tiny.
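For reference, the encoding path looks roughly like this (a sketch only, not the poster's exact code; import paths and parameter names shift between llama-index/langchain versions, and the embedding model name is an assumption):

```python
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings import LangchainEmbedding

# Assumed checkpoint; the relevant part is that documents are embedded
# on the GPU in batches of 2.
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(
        model_name="jinaai/jina-embeddings-v2-base-en",
        model_kwargs={"device": "cuda"},
        encode_kwargs={"batch_size": 2},
    )
)

service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=None)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```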

Jina AI org

Hi @jordanparker6, this can happen if your documents are very long. Our experimental results show that with 24 GB of VRAM, sending just 2 documents of 8192 tokens each already runs out of memory. That is the nature of encoding very long documents: memory usage grows quickly with the sequence length.
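If you are not sure how long your documents actually are, counting tokens makes it obvious. A small sketch (the checkpoint name is an assumption about which model this thread concerns; swap in whatever you load):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v2-base-en")

docs = ["a short paragraph ...", "a much longer report pasted in as one string ..."]
for doc in docs:
    n_tokens = len(tokenizer(doc)["input_ids"])
    print(n_tokens)  # anything approaching 8192 tokens per document is very memory-hungry
```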

My suggestions are:

  1. Use a bigger GPU.
  2. Chunk your documents into, say, 2k/4k sequence lengths if you do not strictly need to encode such long documents.
  3. Convert the model to fp16 and further reduce the batch size to 1 (suggestions 2 and 3 are sketched after this list).
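A minimal sketch of suggestions 2 and 3 using the plain transformers API. The model name is an assumption (whichever Jina embedding checkpoint this thread is about), and a real pipeline would split long documents into chunks rather than simply truncating them:

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "jinaai/jina-embeddings-v2-base-en"  # assumption: adjust to your checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(
    MODEL,
    trust_remote_code=True,        # the Jina v2 checkpoints ship custom modeling code
    torch_dtype=torch.float16,     # suggestion 3: load the weights in fp16
).to("cuda").eval()

def embed(texts, max_length=2048):
    """Encode one text at a time (suggestion 3: batch size 1), capping the
    sequence length at 2k tokens (suggestion 2: chunk/shorten long documents)."""
    vectors = []
    with torch.inference_mode():
        for text in texts:
            inputs = tokenizer(
                text, truncation=True, max_length=max_length, return_tensors="pt"
            ).to("cuda")
            hidden = model(**inputs).last_hidden_state            # (1, seq_len, dim)
            mask = inputs["attention_mask"].unsqueeze(-1)          # (1, seq_len, 1)
            pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling
            vectors.append(pooled.squeeze(0).float().cpu())
    return torch.stack(vectors)

embeddings = embed(["first long document ...", "second long document ..."])
print(embeddings.shape)
```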
bwang0911 changed discussion status to closed
