torch.cuda.OutOfMemoryError: CUDA out of memory
I am testing several image captioning models in a SageMaker image terminal running Python on g5 instances. For example, I was able to test Salesforce/instructblip-flan-t5-xl on an ml.g5.4xlarge using the PyTorch 2.0.0 Python 3.10 GPU Optimized image. However, when I run the Salesforce/instructblip-flan-t5-xxl model, I get the GPU memory error below regardless of the instance type used. I tried setting PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:1024, which made no difference. Any insights on how to mitigate this error would be appreciated.
Traceback (most recent call last):
  File "/root/image-caption-main/sample_smartsheet_images.py", line 13, in <module>
    from model.salesforce.instructblip_flan_t5_xxl import caption_image
  File "/root/image-caption-main/model/salesforce/instructblip_flan_t5_xxl.py", line 12, in <module>
    model.to(device)
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2065, in to
    return super().to(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 5 more times]
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB (GPU 0; 22.20 GiB total capacity; 21.33 GiB already allocated; 73.12 MiB free; 21.43 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
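The figures in this message hint at why max_split_size_mb made no difference: reserved memory (21.43 GiB) is barely above allocated memory (21.33 GiB), so fragmentation is unlikely to be the bottleneck and the weights themselves appear not to fit. A small sketch that extracts those numbers, assuming the message format printed above (other PyTorch versions may word it differently); the helper name `parse_oom` and the regular expression are illustrative, not part of PyTorch:

```python
import re

# The message text from the traceback above (PyTorch 2.0.0 wording).
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 160.00 MiB "
    "(GPU 0; 22.20 GiB total capacity; 21.33 GiB already allocated; "
    "73.12 MiB free; 21.43 GiB reserved in total by PyTorch)"
)


def parse_oom(message):
    """Return total, allocated and reserved memory in GiB (hypothetical helper)."""
    pattern = (
        r"([\d.]+) GiB total capacity; "
        r"([\d.]+) GiB already allocated; .*?"
        r"([\d.]+) GiB reserved"
    )
    match = re.search(pattern, message)
    if match is None:
        raise ValueError("unrecognized OOM message format")
    total, allocated, reserved = (float(g) for g in match.groups())
    return {"total": total, "allocated": allocated, "reserved": reserved}


stats = parse_oom(OOM_MESSAGE)
# Memory held by the caching allocator but not backing any live tensor.
slack_gib = stats["reserved"] - stats["allocated"]
print(f"reserved but unallocated: {slack_gib:.2f} GiB")
print(f"allocated share of the GPU: {stats['allocated'] / stats['total']:.0%}")
```

When the slack is this small, allocator tuning has little to reclaim; reducing the model's footprint or spreading it over more devices is the more promising direction.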
Hi,
Did you use 8-bit or 4-bit inference?
I used the code found on the model card: https://huggingface.co/Salesforce/instructblip-flan-t5-xxl
The model card makes no reference to 8-bit or 4-bit inference.
From what we have observed, the model is too large to fit on a single GPU. When the script runs, a single GPU is maxed out, which produces the error above, while the remaining GPUs sit unused. I am currently looking at the accelerate library as a possible solution.
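One way to use the idle GPUs is the `device_map="auto"` option of `from_pretrained`, which relies on accelerate to place layers across the available devices. A minimal sketch, assuming `transformers` and `accelerate` are installed on a multi-GPU instance; the helper names `sharded_load_kwargs` and `load_sharded` are hypothetical, and loading in bfloat16 is an assumption rather than something the model card prescribes:

```python
MODEL_ID = "Salesforce/instructblip-flan-t5-xxl"


def sharded_load_kwargs(dtype=None):
    """Build from_pretrained keyword arguments for multi-GPU placement."""
    kwargs = {"device_map": "auto"}  # let accelerate decide where each layer goes
    if dtype is not None:
        kwargs["torch_dtype"] = dtype  # optional reduced-precision weights
    return kwargs


def load_sharded(model_id=MODEL_ID):
    """Load processor and model with weights spread over all visible GPUs."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import torch
    from transformers import (
        InstructBlipForConditionalGeneration,
        InstructBlipProcessor,
    )

    processor = InstructBlipProcessor.from_pretrained(model_id)
    model = InstructBlipForConditionalGeneration.from_pretrained(
        model_id, **sharded_load_kwargs(torch.bfloat16)
    )
    # Do not call model.to(device) afterwards: the weights are already placed,
    # and moving everything to one GPU is what ran out of memory in the traceback.
    return processor, model
```

Calling `load_sharded()` would replace the `model.to(device)` step in the failing script; whether the result fits depends on how many GPUs the chosen instance exposes.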
Hi,
Refer to the code snippets of BLIP-2 regarding 4- and 8-bit inference: https://huggingface.co/Salesforce/blip2-opt-2.7b#running-the-model-on-gpu.
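These greatly reduce the memory needed for the weights: relative to float32, 8-bit loading cuts it by roughly a factor of 4 and 4-bit loading by roughly a factor of 8.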
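Applied to InstructBLIP, the same approach might look like the sketch below. It assumes `bitsandbytes` and `accelerate` are installed and uses the `load_in_8bit` / `load_in_4bit` keyword arguments shown in the BLIP-2 snippets (newer transformers releases prefer passing a `BitsAndBytesConfig` through `quantization_config`). The helper names are hypothetical, and the 12 billion parameter count is a rough assumption used only for the estimate, not an official figure:

```python
MODEL_ID = "Salesforce/instructblip-flan-t5-xxl"
ASSUMED_PARAMS = 12e9  # rough assumption for illustration, not an official count
GIB = 1024 ** 3


def weight_gib(num_params, bits_per_param):
    """Memory for the weights alone, in GiB, at a given precision."""
    return num_params * bits_per_param / 8 / GIB


def quantized_load_kwargs(bits):
    """Build from_pretrained keyword arguments for 8-bit or 4-bit loading."""
    if bits not in (4, 8):
        raise ValueError("bits must be 4 or 8")
    return {f"load_in_{bits}bit": True, "device_map": "auto"}


def load_quantized(bits=8, model_id=MODEL_ID):
    """Load the model with quantized weights (requires bitsandbytes)."""
    # Imported lazily so the estimate below runs without transformers.
    from transformers import InstructBlipForConditionalGeneration

    return InstructBlipForConditionalGeneration.from_pretrained(
        model_id, **quantized_load_kwargs(bits)
    )


# Weight-only estimate at each precision, under the assumed parameter count.
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_gib(ASSUMED_PARAMS, bits):.1f} GiB")
```

Under that assumption, even 16-bit weights would slightly exceed the 22.20 GiB reported for the GPU in the error message, while 8-bit weights would fit with room left over. The estimate covers weights only; activations and any layers kept in higher precision add to it.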