OOM error: GPU memory requirements suddenly increase

#14
by Boman - opened

Hi, dear developer!

I'm using instructor-xl for embedding inference and have encountered some issues.

I have millions of data points with lengths ranging from 10 to 1000 tokens (measured with the instructor-large tokenizer). I'm running inference on two 3090 Ti GPUs (24GB each) with a batch size of 128, which just fits the model and data into GPU memory. When monitoring GPU memory usage with nvtop, it reaches around 23.7GB out of 23.98GB, which seems to work.
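For reference, I'm calling the model roughly via the standard INSTRUCTOR encode call (a simplified sketch, not my exact script; the instruction string and variable names are placeholders):

from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR("hkunlp/instructor-xl")  # placed on the GPU automatically when CUDA is available

texts = ["some document text", "another document text"]  # in practice, millions of rows
# Each input is an [instruction, text] pair; this instruction is only a placeholder.
pairs = [["Represent the document for retrieval:", t] for t in texts]

embeddings = model.encode(pairs, batch_size=128, show_progress_bar=True)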

However, after running for a while, I encounter out-of-memory (OOM) errors:

File "/home/be/miniconda3/envs/amar/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 600, in forward
    attention_output = self.SelfAttention(
  File "/home/be/miniconda3/envs/amar/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/be/miniconda3/envs/amar/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 560, in forward
    attn_weights = nn.functional.softmax(scores.float(), dim=-1).type_as(
  File "/home/be/miniconda3/envs/amar/lib/python3.10/site-packages/torch/nn/functional.py", line 1843, in softmax
    ret = input.softmax(dim)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 23.68 GiB total capacity; 16.30 GiB already allocated; 3.47 GiB free; 19.90 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
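As an aside, the PYTORCH_CUDA_ALLOC_CONF mentioned in the last line is an environment variable read by the CUDA caching allocator; a minimal way to set it (with an arbitrary example value) is:

import os

# Set this before any CUDA work, e.g. before importing torch,
# or export it in the shell before launching the script.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # example value only

import torch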

I'm not sure why the GPU memory requirements suddenly increase.

I suspect that it might be because inputs longer than the model's max_length (left at the default of 512) get split into multiple entries, resulting in effective batch sizes larger than 128 and leading to OOM errors.
Later, I strictly limited the input data to at most 512 tokens (sketch below), but I still encountered the same OOM errors.
I have no further ideas about the possible cause.
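Concretely, the clipping I apply is along these lines (a sketch; the helper name is just illustrative, and I'm assuming the tokenizer loads via AutoTokenizer):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hkunlp/instructor-large")

def clip_to_max_tokens(text, max_length=512):
    # Tokenize, keep at most max_length tokens, and decode back to text.
    ids = tokenizer(text, truncation=True, max_length=max_length,
                    add_special_tokens=False)["input_ids"]
    return tokenizer.decode(ids)

raw_texts = ["a very long document ..."]  # in practice, the full dataset
texts = [clip_to_max_tokens(t) for t in raw_texts]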

I would appreciate some tips or help regarding this matter.

Thank you !

NLP Group of The University of Hong Kong org

Hi, thanks a lot for your interest in the INSTRUCTOR model!

You might try reducing the batch size for inference. Some data sequences may be shorter, so a batch of them fits into the 24GB memory, while longer sequences may require more memory for a single batch.
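If a fixed smaller batch size is still not enough, one possible heuristic (just a sketch, not something built into the INSTRUCTOR API; the token budget is an illustrative number to tune per GPU) is to batch by a token budget instead of a fixed item count, so that batches made up of near-max-length sequences are automatically smaller:

def token_budget_batches(pairs, tokenizer, max_tokens_per_batch=16384, max_length=512):
    # Group [instruction, text] pairs so that batch_size * longest_sequence stays bounded.
    # Attention memory actually grows with length squared, so this is only a rough proxy.
    lengths = [min(len(tokenizer(text)["input_ids"]), max_length) for _, text in pairs]
    order = sorted(range(len(pairs)), key=lambda i: lengths[i])

    batch, longest = [], 0
    for i in order:
        candidate_longest = max(longest, lengths[i])
        if batch and (len(batch) + 1) * candidate_longest > max_tokens_per_batch:
            yield [pairs[j] for j in batch]
            batch, candidate_longest = [], lengths[i]
        batch.append(i)
        longest = candidate_longest
    if batch:
        yield [pairs[j] for j in batch]

# Usage sketch: encode each group as its own batch.
# for group in token_budget_batches(pairs, tokenizer):
#     all_embeddings.extend(model.encode(group, batch_size=len(group)))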

Feel free to add any further questions or comments!

This is exactly how I'm currently addressing the issue. I reduced the batch size by 20% so that memory usage would stay around 85%. However, I later noticed that memory usage on some individual GPUs still reached around 95%.

But I'm curious, because I strictly limit the length of all my sequences, and no single sequence can exceed the max length. So I shouldn't be hitting the situation you mentioned, where "longer sequences may require more memory for a single batch."

Thank you

NLP Group of The University of Hong Kong org

By longer sequences, I mean those whose lengths are close to the maximum length, while other sequences may be well below it, e.g., 200 or 300 tokens.
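Concretely, the self-attention scores that the softmax in your traceback materialises form a tensor of shape (batch, heads, seq_len, seq_len), upcast to fp32 by scores.float(). Assuming the T5-XL encoder configuration (32 attention heads), a quick back-of-the-envelope calculation shows why near-max-length batches are the ones that run out of memory:

batch, heads = 128, 32   # heads=32 assumes the T5-XL encoder config
bytes_fp32 = 4           # scores.float() upcasts the attention logits to fp32

for seq_len in (512, 300):
    scores_bytes = batch * heads * seq_len * seq_len * bytes_fp32
    print(seq_len, round(scores_bytes / 2**30, 2), "GiB")

# 512 tokens -> 4.0 GiB, matching the "Tried to allocate 4.00 GiB" in the traceback.
# 300 tokens -> ~1.37 GiB, which is why batches of shorter sequences fit comfortably.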
