Can't reproduce

#3
by ayyylol - opened

How were the GGUF versions made, given that Phi3ForCausalLM is not yet supported by llama.cpp?

Architecture 'Phi3ForCausalLM' not supported

You can use convert-hf-to-gguf.py from llama.cpp and then just quantize it the way you want.

I am able to create a custom fine-tune and convert it to a GGUF file via convert-hf-to-gguf.py,
but I am not able to quantize it. llama.cpp returns: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'phi3'
I am on the latest llama.cpp commit, which should include the phi3 architecture.
Can you please point me in the right direction on how to solve it?

How about:

Save the safetensors and config files in the models subdirectory, then run:

./convert-hf-to-gguf.py models/Phi-3
./quantize models/Phi-3/ggml-model-f16.gguf models/Phi-3/Phi-3-model-Q4_K_M.gguf Q4_K_M
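
If the quantization succeeds, a quick sanity check is to run the quantized file with the main binary from the same build (a minimal sketch; the prompt and token count are arbitrary, and the model path just reuses the example above):

./main -m models/Phi-3/Phi-3-model-Q4_K_M.gguf -p "Hello, my name is" -n 64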

It works, but the issue was somewhere else: I was not using the right quantize binary.
I rebuilt llama.cpp from source via make and it works now! (A rough sketch of the rebuild steps follows the log below.)

llama_model_quantize_internal: model size  =  7288.51 MB
llama_model_quantize_internal: quant size  =  2281.66 MB
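
For anyone hitting the same "unknown model architecture: 'phi3'" error, the rebuild that fixed it here is roughly the following (a minimal sketch, assuming a plain CPU make build run inside the llama.cpp checkout; the paths reuse the example above):

git pull
make clean
make
./quantize models/Phi-3/ggml-model-f16.gguf models/Phi-3/Phi-3-model-Q4_K_M.gguf Q4_K_M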

Please ensure that you are using a llama.cpp build later than 2717, which has support for Phi-3.
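
If you are not sure which build you are on, one way to check from a git checkout (assuming the bNNNN release tags track the build numbers) is:

git fetch --tags
git describe --tags

Anything later than b2717 should include the phi3 architecture.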

gugarosa changed discussion status to closed
