Converting an HF-format CodeLlama-34B model to GGUF


I fine-tuned CodeLlama-34B-Instruct and now I want to convert it to GGUF, e.g. codellama-34b-instruct.Q6_K.gguf (HF to GGUF).
Could you please tell me how I can do this? What script or method can I use?

I used to do this with: https://github.com/ggerganov/llama.cpp/blob/master/examples/make-ggml.py
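Previously I ran it roughly like this. The flag names here are from memory of the script's docstring, so treat them as assumptions and double-check against the script itself:

```python
# Hypothetical invocation of make-ggml.py; flags and paths are
# assumptions from memory, not verified against the current script.
import subprocess

subprocess.run(
    [
        "python3", "examples/make-ggml.py",
        "/path/to/CodeLlama-34b-Instruct-hf",   # fine-tuned HF checkpoint (illustrative path)
        "--outname", "codellama-34b-instruct",
        "--outdir", "models",
        "--quants", "Q6_K",
    ],
    check=True,
)
```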

@goodromka I believe it's in https://github.com/ggerganov/llama.cpp/tree/master/gguf-py. I was poking around in the llama.cpp repo last night and that stuck in my mind. I didn't look through the scripts in that dir, so that might not be it, but it's somewhere to start.

make-ggml.py should still work fine. Just edit this line first: https://github.com/ggerganov/llama.cpp/blob/master/examples/make-ggml.py#L76 and change it to:

`outfile = f"{outdir}/{outname}.{type}.gguf"`

You could also change line 66 to:

`fp16 = f"{outdir}/{outname}.fp16.gguf"`
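Put together, the two edited templates sit around the script's convert and quantize steps. A rough sketch of that part of make-ggml.py after both edits, with the surrounding structure paraphrased rather than quoted, and illustrative values so it runs standalone:

```python
# The two filename templates after the edits, in approximate context.
# Structure is paraphrased from make-ggml.py, not quoted verbatim.
outdir = "models"                     # illustrative values
outname = "codellama-34b-instruct"
quants = ["Q6_K"]

fp16 = f"{outdir}/{outname}.fp16.gguf"             # line 66 after the edit
# convert.py writes the fp16 GGUF to `fp16` at this point in the script.

for type in quants:
    outfile = f"{outdir}/{outname}.{type}.gguf"    # line 76 after the edit
    # quantize reads `fp16` and writes `outfile` at this point.
    print(outfile)  # models/codellama-34b-instruct.Q6_K.gguf
```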

TBH neither change should actually be needed: you could just rename the quant files afterwards. I don't think llama.cpp's quantize cares what the output file is called, nor does llama.cpp care what the input file is called. But to avoid confusion, edit the script so the files get a .gguf extension.
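If you'd rather not edit the script at all, the two steps it wraps can also be run by hand. A minimal sketch, assuming you're in the llama.cpp repo root with a built quantize binary; the model path and output names are illustrative:

```python
import subprocess

model_dir = "/path/to/CodeLlama-34b-Instruct-hf"   # fine-tuned HF checkpoint (illustrative)
fp16 = "codellama-34b-instruct.fp16.gguf"
quant = "Q6_K"
outfile = f"codellama-34b-instruct.{quant}.gguf"

# Step 1: HF checkpoint -> fp16 GGUF via convert.py.
subprocess.run(
    ["python3", "convert.py", model_dir, "--outtype", "f16", "--outfile", fp16],
    check=True,
)

# Step 2: fp16 GGUF -> Q6_K GGUF. quantize only reads the file contents,
# so the names themselves don't matter.
subprocess.run(["./quantize", fp16, outfile, quant], check=True)
```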

> make-ggml.py should still work fine. Just edit...

Thanks for the correction. TIL.
