How could we load the model with low GPU memory?

by erjiaxiao

My GPU has 24 GB of memory, which is not enough for this model. How could we load the model with low GPU memory?

Hi,

You can pass a quantization_config to the from_pretrained method so that the weights are loaded in lower precision (4-bit or 8-bit). At 4-bit, the 13B model's weights take roughly 7 GB, which fits comfortably in 24 GB:

from transformers import BitsAndBytesConfig, InstructBlipForConditionalGeneration

# Quantize the weights to 4-bit as they are loaded; device_map="auto"
# spreads the layers across the available GPU(s) and, if needed, the CPU.
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

model = InstructBlipForConditionalGeneration.from_pretrained(
    "Salesforce/instructblip-vicuna-13b",
    device_map="auto",
    quantization_config=quantization_config,
)
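
If 4-bit quantization degrades quality too much for your use case, the same pattern works with the 8-bit flag instead (a sketch, trading some memory savings for precision):

quantization_config = BitsAndBytesConfig(load_in_8bit=True)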

Refer to the blog post for details: https://huggingface.co/blog/4bit-transformers-bitsandbytes
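
For completeness, here's a minimal generation sketch with the quantized model loaded above (the processor class follows the standard transformers API; the image URL and prompt are just examples):

import requests
from PIL import Image
from transformers import InstructBlipProcessor

processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-13b")

# Any RGB image works; this URL is only an example.
url = "https://raw.githubusercontent.com/salesforce/LAVIS/main/docs/_static/Confusing-Pictures.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# The processor prepares both the image and the text prompt; the inputs
# must live on the same device as the model's first layers.
inputs = processor(images=image, text="What is unusual about this image?", return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0].strip())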

Thank you so much for your help!
