Salesforce/instructblip-flan-t5-xxl · torch.cuda.OutOfMemoryError: CUDA out of memory

Sep 27, 2023

I am testing several image captioning models from a SageMaker image terminal running Python on g5 instances. For example, I was able to run Salesforce/instructblip-flan-t5-xl on an ml.g5.4xlarge using the PyTorch 2.0.0 Python 3.10 GPU Optimized image. However, when I run the Salesforce/instructblip-flan-t5-xxl model, I get the GPU out-of-memory error below regardless of the instance type used. Setting PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:1024 made no difference. Any insights on how to mitigate this error would be appreciated.

Traceback (most recent call last):
  File "/root/image-caption-main/sample_smartsheet_images.py", line 13, in <module>
    from model.salesforce.instructblip_flan_t5_xxl import caption_image
  File "/root/image-caption-main/model/salesforce/instructblip_flan_t5_xxl.py", line 12, in <module>
    model.to(device)
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2065, in to
    return super().to(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 5 more times]
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB (GPU 0; 22.20 GiB total capacity; 21.33 GiB already allocated; 73.12 MiB free; 21.43 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
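A rough back-of-the-envelope estimate may help explain why the traceback above fails inside model.to(device). The sketch below compares the memory needed for the weights alone against the 22.20 GiB total capacity reported in the error message. The parameter count of about 12 billion is an assumption for illustration (it is not stated in this thread), and activations and framework overhead are ignored, so real usage will be higher.

```python
# Rough weight-memory estimate. ASSUMED_PARAMS is an illustrative assumption,
# not a figure taken from this thread or measured on the model.
ASSUMED_PARAMS = 12e9   # approximate parameter count of the xxl model (assumed)
GPU_TOTAL_GIB = 22.20   # "total capacity" reported in the error message

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_gib(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


for name, nbytes in BYTES_PER_PARAM.items():
    need = weight_gib(ASSUMED_PARAMS, nbytes)
    verdict = "fits" if need < GPU_TOTAL_GIB else "does not fit"
    print(f"{name:>8}: {need:6.1f} GiB -> {verdict} in {GPU_TOTAL_GIB} GiB")
```

Under that assumption, default float32 weights need roughly twice the memory of a single GPU of this size, which would explain why max_split_size_mb has no effect: the problem is total size, not fragmentation.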

nielsr

Sep 28, 2023

Hi,

Did you use 8-bit or 4-bit inference?

richardbasile

Oct 2, 2023

I used the code found on the model card: https://huggingface.co/Salesforce/instructblip-flan-t5-xxl

The model card makes no reference to 8-bit or 4-bit inference.

From what we have observed, the model is too large to fit on a single GPU. When the model is loaded, one GPU fills up and raises the error above, while the remaining GPUs sit unused. I am currently looking at the accelerate library as a possible solution.
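For the accelerate route mentioned above, one possible approach is to let transformers place the layers across all visible GPUs with device_map="auto" instead of calling model.to(device). This is a minimal sketch, not the model card's code: the helper name load_sharded is hypothetical, float16 is an assumed choice to halve weight memory, and whether sharding works cleanly for this architecture may depend on the installed transformers and accelerate versions.

```python
def load_sharded(model_id: str = "Salesforce/instructblip-flan-t5-xxl"):
    """Load the model with its layers spread over all visible GPUs.

    Hypothetical helper for illustration; assumes torch, transformers and
    accelerate are installed on a multi-GPU instance.
    """
    # Imported lazily so the sketch can be read or imported without the libraries.
    import torch
    from transformers import (
        InstructBlipForConditionalGeneration,
        InstructBlipProcessor,
    )

    processor = InstructBlipProcessor.from_pretrained(model_id)
    model = InstructBlipForConditionalGeneration.from_pretrained(
        model_id,
        device_map="auto",          # accelerate decides which layer goes on which GPU
        torch_dtype=torch.float16,  # assumed: half precision to halve weight memory
    )
    # Do not call model.to(device) afterwards: the layers are already placed,
    # and moving everything to one GPU would reproduce the original error.
    return processor, model
```

With this kind of loading, the inputs returned by the processor would typically be moved to the first GPU in float16 before calling model.generate.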

nielsr

Oct 4, 2023

Hi,

Refer to the code snippets of BLIP-2 regarding 4- and 8-bit inference: https://huggingface.co/Salesforce/blip2-opt-2.7b#running-the-model-on-gpu.

These greatly reduce the memory needed for the weights: by roughly a factor of 4 (8-bit) to 8 (4-bit) compared with float32.
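Adapted to InstructBLIP, the quantized loading described in the BLIP-2 snippets might look like the sketch below. It is an assumption-laden illustration rather than the model card's code: the helper name load_quantized is hypothetical, it uses BitsAndBytesConfig from transformers, and it requires the bitsandbytes and accelerate packages plus a CUDA GPU.

```python
def load_quantized(
    model_id: str = "Salesforce/instructblip-flan-t5-xxl", bits: int = 8
):
    """Load the model with 8-bit or 4-bit quantized weights.

    Hypothetical helper for illustration; assumes transformers, accelerate
    and bitsandbytes are installed.
    """
    if bits not in (4, 8):
        raise ValueError("bits must be 4 or 8")

    # Imported lazily so the argument check above works without the libraries.
    from transformers import (
        BitsAndBytesConfig,
        InstructBlipForConditionalGeneration,
        InstructBlipProcessor,
    )

    # Exactly one of the two flags is enabled, depending on the requested width.
    quant_config = BitsAndBytesConfig(
        load_in_8bit=(bits == 8),
        load_in_4bit=(bits == 4),
    )
    processor = InstructBlipProcessor.from_pretrained(model_id)
    model = InstructBlipForConditionalGeneration.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # quantized weights are placed on the GPU(s) at load time
    )
    # No model.to(device) call here: the weights are already on the GPU.
    return processor, model
```

Quantization trades some output quality for memory, so it may be worth comparing a few captions against the xl model before settling on 4-bit.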