Converting Llama 2 bin files to safetensors changes the output

#19
by milad-a - opened

I fine-tuned Llama 2 to test its query-classification quality. After saving the final model, I converted the PyTorch .bin files to safetensors using this file and served the result with TGI. However, I get completely different results compared to loading the model with AutoModelForCausalLM.from_pretrained() and calling model.generate(). All other generation parameters (top_p, top_k, etc.) are identical, and temperature is set to a small positive value of 0.01. After extensive testing, I am confident that the safetensors conversion is the only variable between the two setups.
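
To isolate the conversion step itself, here is roughly how I would diff the two checkpoints tensor by tensor (a minimal sketch assuming a single-file, non-sharded checkpoint; the paths are placeholders):

```python
# Minimal sketch: compare the original .bin weights against the converted
# safetensors weights tensor by tensor. Paths are placeholders and this
# assumes a single-file (non-sharded) checkpoint.
import torch
from safetensors.torch import load_file

bin_state = torch.load("model/pytorch_model.bin", map_location="cpu")
st_state = load_file("model/model.safetensors")

for name, tensor in bin_state.items():
    if name not in st_state:
        # Tied/shared tensors (e.g. an lm_head tied to the embeddings) can be
        # legitimately dropped during conversion, so a missing key alone is
        # not proof of corruption.
        print(f"missing from safetensors: {name}")
    elif not torch.equal(tensor, st_state[name]):
        print(f"mismatch: {name}")
```

If every tensor matches bit for bit, the difference would have to come from the serving stack (e.g. TGI's sampling or prompt handling) rather than the file format; running model.generate() with do_sample=False on both sides would also remove sampling noise from the comparison.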

Is this a known issue or a bug in the conversion?
