How to convert 4bit model back to fp16 data format?

#52
by tremblingbrain - opened

May I ask how to convert this 4bit model back to fp16/fp32 data format?
I tried to load it via from_pretrained(torch_dtype=torch.float16), then save_pretrained(). However, the saved model is still in 4-bit.
Can someone kindly lend me a hand? Thanks!
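Roughly what I tried (model paths here are just placeholders):

```python
import torch
from transformers import AutoModelForCausalLM

# Load the 4-bit checkpoint and request fp16 weights (placeholder path).
model = AutoModelForCausalLM.from_pretrained(
    "path/to/4bit-model",
    torch_dtype=torch.float16,
    device_map="auto",
)

# The checkpoint written here still comes out quantized in 4-bit.
model.save_pretrained("path/to/fp16-output")
```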

@tremblingbrain
Why do you want to convert the model back to fp16? A dequantized copy will likely be no better, and possibly slightly worse, in quality than this 4-bit one.

Use the original model if you want fp16 precision, since it's going to be higher quality than the 4-bit one.
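If all you need is fp16 weights, loading the original checkpoint directly is the simplest route. A minimal sketch, assuming the Meta chat repo id (use whichever fp16 checkpoint you actually have access to):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-chat-hf"  # assumed original fp16 repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # load weights directly in fp16
    device_map="auto",
)
```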

Here is an unquantization script, but I'm not sure whether it works with both GPTQ and bitsandbytes or just bitsandbytes.

Script
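For reference, the core of such a script is just swapping every bitsandbytes Linear4bit layer for a plain fp16 nn.Linear. A rough sketch, assuming a bitsandbytes NF4/FP4 model (not GPTQ):

```python
import torch
import torch.nn as nn
import bitsandbytes as bnb
from bitsandbytes.functional import dequantize_4bit

def dequantize_4bit_model(module):
    """Recursively replace bitsandbytes Linear4bit layers with fp16 nn.Linear."""
    for name, child in module.named_children():
        if isinstance(child, bnb.nn.Linear4bit):
            # Unpack the packed 4-bit weights back into a dense fp16 tensor.
            w = dequantize_4bit(child.weight.data, child.weight.quant_state).to(torch.float16)
            fp16_linear = nn.Linear(child.in_features, child.out_features,
                                    bias=child.bias is not None)
            fp16_linear.weight = nn.Parameter(w, requires_grad=False)
            if child.bias is not None:
                fp16_linear.bias = nn.Parameter(child.bias.data.to(torch.float16),
                                                requires_grad=False)
            setattr(module, name, fp16_linear)
        else:
            dequantize_4bit_model(child)
    return module

# model = dequantize_4bit_model(model)
# model.save_pretrained("path/to/fp16-output")  # placeholder path
```

You may also need to remove the quantization_config entry from the model config before save_pretrained, otherwise the saved checkpoint may still be treated as quantized on reload.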

@YaTharThShaRma999 Thanks a lot for the conversion script.
I actually have some pre-developed code to do computation and analysis, but it only accepts fp16/fp32 models...
So I'm thinking about dequantizing this 4-bit model to fp16 and running some tests, basically comparing it to the original fp16 model.

This is a quantized version of Llama-2-13b-chat. You can simply download the original model instead of this quantized version.
