The fp16 model has the same size as the original model
#1 by adnan-ahmad-tub - opened
CodeLlama-34B-Instruct-fp16 has 7 x 9 GB checkpoint shards, the same as
CodeLlama-34B-Instruct. Shouldn't fp16 quantization reduce the model size?
fp16 isn't a quantisation, it's the original model. You can use CodeLlama/CodeLlama-34B-Instruct-HF now. I put this repo up because originally there was no official source for these models on HF, but now there is.
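For a rough sanity check (a back-of-the-envelope sketch, using the nominal 34B parameter count): the original weights are already stored in 16-bit precision, so saving them as fp16 doesn't compress anything.

```python
# Rough check of why the fp16 checkpoints match the original size:
# the original CodeLlama weights are already 16-bit, so fp16 is not a compression step.
params = 34e9          # nominal parameter count for the 34B model
bytes_per_param = 2    # fp16/bf16 = 2 bytes per parameter

total_gb = params * bytes_per_param / 1e9
print(f"~{total_gb:.0f} GB of weights")  # ~68 GB, i.e. roughly 7 shards of ~9-10 GB each
```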
Is it the same as loading the model using `model = AutoModelForCausalLM.from_pretrained(model_directory, device_map='auto', torch_dtype=torch.float16)`?
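For what it's worth, here's a minimal runnable version of that snippet; `model_directory` is just a placeholder for wherever you have the checkpoint (a local path or the Hub repo id). Since this repo is the original weights in fp16, loading it with `torch_dtype=torch.float16` should give you the same thing as loading the official repo the same way:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: point this at a local checkout of the checkpoint or the Hub repo id.
model_directory = "CodeLlama-34B-Instruct-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_directory)
model = AutoModelForCausalLM.from_pretrained(
    model_directory,
    device_map="auto",           # spread layers across available GPUs / CPU
    torch_dtype=torch.float16,   # keep weights in half precision, matching the checkpoint
)
```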
Hi
Does anyone have GPU requirements / suggestions for running this model?
I'm planning on buying some A6000s or something to run this locally.
Thanks!
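Not an official answer, just a rough sizing sketch based on the ~68 GB weight figure above: a single 48 GB A6000 can't hold the full fp16 model, so you'd be looking at two A6000s sharded with `device_map='auto'`, or a quantized build (GPTQ/GGUF) to fit on one card.

```python
import math

# Rough estimate only: fp16 weights plus a ballpark cushion for KV cache and activations.
weights_gb = 34e9 * 2 / 1e9          # ~68 GB of fp16 weights
overhead_gb = 10                     # ballpark for KV cache, activations, CUDA buffers
a6000_vram_gb = 48                   # an RTX A6000 has 48 GB of VRAM

cards_needed = math.ceil((weights_gb + overhead_gb) / a6000_vram_gb)
print(f"~{cards_needed} x A6000 for fp16 inference")  # ~2 cards
```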