How could we load the model with low GPU memory?

My GPU has 24 GB of memory, which is not enough for the model. How could we load the model with less GPU memory?


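You can pass a quantization_config to the from_pretrained method so that the weights are loaded in lower precision, and therefore take fewer bytes per parameter (e.g. 4-bit with load_in_4bit=True or 8-bit with load_in_8bit=True):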

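```python
from transformers import BitsAndBytesConfig, InstructBlipForConditionalGeneration

# Load the weights in 4-bit precision (requires the bitsandbytes package)
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

# device_map="auto" lets accelerate decide where to place the model's layers
model = InstructBlipForConditionalGeneration.from_pretrained("Salesforce/instructblip-vicuna-13b", device_map="auto", quantization_config=quantization_config)
```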
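To see why quantization helps here, a rough back-of-the-envelope estimate of the memory needed just to hold the weights may be useful. This is a minimal sketch that assumes about 13 billion parameters (the Vicuna-13B language model alone); the vision encoder, Q-Former, activations and CUDA overhead add more on top, and bitsandbytes keeps some layers in higher precision, so real usage will be higher than these numbers.

```python
# Rough estimate of the memory needed only for the model weights at a given precision.
# Assumption: ~13e9 parameters (language model only); other components, activations
# and framework overhead are NOT included.


def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Return the approximate size of the weights in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1024**3


NUM_PARAMS = 13e9  # assumed parameter count, for illustration only

for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>10}: ~{weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

At 16 bits per parameter the weights alone come to roughly 24 GiB, about the size of the whole card, whereas 8-bit halves that and 4-bit quarters it, leaving room for the remaining components and activations.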

Refer to the blog post for details:

Thank you so much for your help!

