Why is this exactly the same size as the 8-bit one?

#1
by dagelf - opened

I'm guessing there is a mistake...?

LLM-Quantization org

In our research, to quickly validate the effectiveness of various quantization methods, we only performed fake quantization (fake-quant) on SmoothQuant without storing the weights in a real 4-bit format. The checkpoints we saved and uploaded are therefore the same size as the fp16 model.πŸͺ„
We will continue to improve our work to make the quantization testing as realistic as possible, with software and hardware support. More work is on the way!πŸ€—
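For anyone else wondering what fake-quant means here: the weights are rounded to the low-bit grid and then immediately dequantized, so accuracy reflects 4-bit precision while storage stays in floating point. A minimal sketch (not the authors' actual code; symmetric per-tensor quantization is assumed for illustration):

```python
import numpy as np

def fake_quantize(w, num_bits=4):
    # Symmetric per-tensor fake quantization: snap values to the
    # signed int grid, then dequantize straight back to float.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return (q * scale).astype(w.dtype)

w = np.random.randn(8, 8).astype(np.float16)
w_fq = fake_quantize(w, num_bits=4)
# The tensor only takes at most 2**4 = 16 distinct values,
# but it is still stored as fp16 -- hence the unchanged file size.
print(w_fq.dtype)
```

So a "4-bit" checkpoint saved this way occupies exactly as many bytes on disk as the fp16 original.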

That makes sense, I guess I could've looked :-D Thank you for the clarification!

dagelf changed discussion status to closed
