Will this work on an RTX 3080 10gb?

#8 opened by TheAIGuyz

I am new to running AI on local machines

I have the 10GB version of the RTX 3080

I see the .bin files add up to around 30GB

Will I still be able to use this model?

With CPU offloading in the text-generation-webui it should work fine. If you quantize it to 4-bit, you should be able to fit the whole thing on your GPU. The reason the model is so big is that it's saved in 32-bit; it will only be run in 16-bit at most for inference.
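As a rough sketch of that outside the webui, this is what fp16 loading with CPU offload looks like using plain transformers + accelerate. The repo id below is just a placeholder, not this model's actual name, so swap it for the real one:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-name"  # placeholder repo id, replace with this model's repo

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the 32-bit checkpoint as fp16 and let accelerate place layers automatically;
# whatever doesn't fit in the 10GB of VRAM gets offloaded to CPU RAM.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Offloaded layers run on the CPU, so expect generation to be noticeably slower than an all-GPU setup.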

I tested load_in_8bit=True, but it seems to output only nonsense. It would be great if we could figure out how to do int8 quantization on this; it would make things even faster.
But it will fit on a 10GB 3080 once you use that flag. The current memory consumption is around 14GB with fp16/bf16, and with int8 it would be cut roughly in half.
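For anyone who wants to try reproducing the int8 result, this is roughly the loading call being referred to (the repo id is again a placeholder). It needs the bitsandbytes package installed, and as noted above it currently produces nonsense with this model, so treat it as experimental:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-name"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)

# int8 quantization via bitsandbytes; roughly halves the ~14GB fp16 footprint,
# but output quality is currently broken for this model.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    load_in_8bit=True,
)
```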
