Converting an HF-format CodeLlama-34B model to GGUF


I fine-tuned CodeLlama-34B-Instruct and now I want to convert it to GGUF, e.g. codellama-34b-instruct.Q6_K.gguf (HF to GGUF).
Could you please tell me how I can do this? What script or method can I use?

I used to do this with: https://github.com/ggerganov/llama.cpp/blob/master/examples/make-ggml.py
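Previously I ran it roughly like this. The flag names here are from memory of the script's docstring, so treat them as assumptions and double-check against the script itself:

```python
# Hypothetical invocation of make-ggml.py; flags and paths are
# assumptions from memory, not verified against the current script.
import subprocess

subprocess.run(
    [
        "python3", "examples/make-ggml.py",
        "/path/to/CodeLlama-34b-Instruct-hf",   # fine-tuned HF checkpoint (illustrative path)
        "--outname", "codellama-34b-instruct",
        "--outdir", "models",
        "--quants", "Q6_K",
    ],
    check=True,
)
```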

@goodromka I believe it's in https://github.com/ggerganov/llama.cpp/tree/master/gguf-py. I was poking around in the llama.cpp repo last night and that stuck in my mind. I didn't look through the scripts in that dir, so that might not be it, but it's somewhere to start.

make-ggml.py should still work fine. Just edit this line first: https://github.com/ggerganov/llama.cpp/blob/master/examples/make-ggml.py#L76 and change it to:

`outfile = f"{outdir}/{outname}.{type}.gguf"`

You could also change line 66 to:

`fp16 = f"{outdir}/{outname}.fp16.gguf"`
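Put together, the two edited templates sit around the script's convert and quantize steps. A rough sketch of that part of make-ggml.py after both edits, with the surrounding structure paraphrased rather than quoted, and illustrative values so it runs standalone:

```python
# The two filename templates after the edits, in approximate context.
# Structure is paraphrased from make-ggml.py, not quoted verbatim.
outdir = "models"                     # illustrative values
outname = "codellama-34b-instruct"
quants = ["Q6_K"]

fp16 = f"{outdir}/{outname}.fp16.gguf"             # line 66 after the edit
# convert.py writes the fp16 GGUF to `fp16` at this point in the script.

for type in quants:
    outfile = f"{outdir}/{outname}.{type}.gguf"    # line 76 after the edit
    # quantize reads `fp16` and writes `outfile` at this point.
    print(outfile)  # models/codellama-34b-instruct.Q6_K.gguf
```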

TBH neither change should actually be needed: you could just rename the quant files afterwards. I don't think llama.cpp's quantize cares what the output file is called, nor does llama.cpp care what the input file is called. But to avoid confusion, edit the script so the files get a .gguf extension.
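If you'd rather not edit the script at all, the two steps it wraps can also be run by hand. A minimal sketch, assuming you're in the llama.cpp repo root with a built quantize binary; the model path and output names are illustrative:

```python
import subprocess

model_dir = "/path/to/CodeLlama-34b-Instruct-hf"   # fine-tuned HF checkpoint (illustrative)
fp16 = "codellama-34b-instruct.fp16.gguf"
quant = "Q6_K"
outfile = f"codellama-34b-instruct.{quant}.gguf"

# Step 1: HF checkpoint -> fp16 GGUF via convert.py.
subprocess.run(
    ["python3", "convert.py", model_dir, "--outtype", "f16", "--outfile", fp16],
    check=True,
)

# Step 2: fp16 GGUF -> Q6_K GGUF. quantize only reads the file contents,
# so the names themselves don't matter.
subprocess.run(["./quantize", fp16, outfile, quant], check=True)
```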

> make-ggml.py should still work fine. Just edit...

Thanks for the correction. TIL.
