Performance reduction from using 8-bit or 4-bit quantized models

#58
by michaelomahony - opened

I am using the 8-bit quantized model loaded with bitsandbytes. Does anyone know how much of a performance (output-quality) drop to expect from these quantized versions of the model?
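For context, here is a minimal sketch of how the 8-bit and 4-bit variants are typically loaded through transformers + bitsandbytes; `model_id` is a hypothetical placeholder, not the specific checkpoint from this thread:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "your-org/your-model"  # hypothetical placeholder

# 8-bit quantization (LLM.int8())
model_8bit = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# 4-bit quantization (NF4 weights, compute in bfloat16)
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
```

One common way to quantify the gap is to run the same evaluation (e.g., perplexity on a held-out set) against each variant and the full-precision baseline.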

And how about float16 vs. bfloat16?
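For comparison, a sketch of how the two half-precision dtypes are usually selected (same hypothetical `model_id` as above). The trade-off: float16 has more mantissa bits but a narrower exponent range (activations can overflow), while bfloat16 keeps float32's exponent range at the cost of precision.

```python
import torch
from transformers import AutoModelForCausalLM

model_id = "your-org/your-model"  # hypothetical placeholder

# float16: finer precision, narrower dynamic range
model_fp16 = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# bfloat16: float32's exponent range, coarser precision
model_bf16 = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
```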
