The fp16 model has the same size as the original model

#1
by adnan-ahmad-tub - opened

CodeLlama-34B-Instruct-fp16 has 7 checkpoint shards of ~9 GB each, the same as CodeLlama-34B-Instruct. Shouldn't fp16 quantization reduce the model size?

fp16 isn't a quantisation; it's the original model. You can use CodeLlama/CodeLlama-34B-Instruct-HF now. I put this repo up because originally there was no official source for these models on HF, but now there is.
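
As a rough sanity check (assuming the nominal ~34B parameter count), the shard sizes line up with what fp16 storage implies, since the original weights are already stored at 16 bits per parameter:

```python
# Back-of-the-envelope estimate; 34B is the nominal parameter count, not exact.
params = 34e9          # ~34 billion parameters
bytes_per_param = 2    # fp16 = 16 bits = 2 bytes
total_gb = params * bytes_per_param / 1e9
print(f"~{total_gb:.0f} GB of weights")  # ~68 GB, roughly 7 shards of ~9-10 GB
```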

Is it the same as loading the model using model = AutoModelForCausalLM.from_pretrained(model_directory, device_map='auto', torch_dtype=torch.float16)?
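
For reference, a minimal loading sketch along those lines; model_directory here is a placeholder for either the repo id or a local path containing the fp16 weights:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_directory = "TheBloke/CodeLlama-34B-Instruct-fp16"  # or a local path

tokenizer = AutoTokenizer.from_pretrained(model_directory)
model = AutoModelForCausalLM.from_pretrained(
    model_directory,
    device_map="auto",          # let accelerate spread layers across available GPUs
    torch_dtype=torch.float16,  # load weights in fp16, matching how they are stored
)
```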

Hi
Does anyone have GPU requirements / suggestions for using this model?
I'm planning on buying an A6000 or something similar to run it locally.
Thanks!
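
As a rough guide rather than an official requirement: ~34B parameters at 2 bytes each is about 68 GB of weights in fp16, so a single 48 GB A6000 won't fit the full-precision model, but two can hold it split across cards. A minimal sketch, assuming two 48 GB GPUs and leaving headroom for activations and the KV cache:

```python
import torch
from transformers import AutoModelForCausalLM

# Assumption: two 48 GB GPUs (e.g. A6000s); cap each below 48 GB so there is
# room left for activations and the KV cache during generation.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/CodeLlama-34B-Instruct-fp16",
    device_map="auto",
    torch_dtype=torch.float16,
    max_memory={0: "44GiB", 1: "44GiB"},
)
```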
