How did you make these quants?

#1 by rombodawg - opened

Did you use llama.cpp's convert.py to generate these GGUF model files? The community and I have been struggling to figure out why some of these Gemma models, fine-tuned or merged, simply are not working during inference or loading after converting to GGUF. Can you share the code you used to convert from HF tensors to GGUF, if it wasn't llama.cpp?

Or, if it was llama.cpp, was it a new branch or the main branch? A custom method? Please share.

@rombodawg What do you mean? Are my GGUF files working? I'm not doing anything special, but if you are having issues with this model in particular you have to make sure the repeat penalty is disabled (i.e. set it to 1.0), otherwise the model will produce incoherent output.
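
For example, when running inference with llama.cpp's CLI, something along these lines should work (the model path and prompt are placeholders):

./main -m gemma.gguf -p "Why is the sky blue?" --repeat-penalty 1.0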

@dranger003 I'm not having issues with your GGUF model files. I'm having issues with every other GGUF model file made from Gemma that exists; yours seem to be the only ones that are working. The main thing is I'm trying to make new Gemma models with MergeKit, and those resulting models aren't working after quantization. You can see the issue I have opened below, with multiple threads linked at the bottom of that issue.

https://github.com/ggerganov/llama.cpp/issues/5706#issuecomment-1963015755

@rombodawg You are not using llama.cpp directly, that's why. Just use the file from the repo.
https://github.com/ggerganov/llama.cpp/blob/cbbd1efa06f8c09f9dff58ff9d9af509cc4c152b/convert-hf-to-gguf.py#L221
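
For example, from the llama.cpp repo root, an invocation roughly like this (the output name and dtype are placeholders; check the script's --help for the exact options in your checkout):

python convert-hf-to-gguf.py /path/to/gemma-hf-model --outfile gemma-f16.gguf --outtype f16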

@dranger003 I just downloaded that file. I got this error when using it:

NotImplementedError: Architecture "GemmaForCausalLM" not supported!

I don't know what's wrong with that script; I even tried converting a Llama 2 model and got:

NotImplementedError: Architecture "LlamaForCausalLM" not supported!

Only convert.py works for me, which is what the official documentation says to use when converting to GGUF.
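
That is, an invocation roughly like this works for me (paths are placeholders):

python convert.py /path/to/hf-model --outfile model-f16.gguf --outtype f16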
