llama.cpp quantize

#3 by LiuCi - opened

Hi, author, I'm trying to quantize the model with llama.cpp. But when I run python convert-hf-to-gguf.py to generate the GGUF file, I get the error "NotImplementedError: Architecture 'LlamaForCausalLM' not supported!".
I then tried convert.py, but that fails with an error at the next step, as described on GitHub. I have read your comments on Reddit; you are a great coder.
Could you please tell me how to quantize the model with llama.cpp, or point me to a tutorial? I am a student trying to run the model on a small platform. Thanks a lot for reading!
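For reference, the usual llama.cpp pipeline at the time looked roughly like the sketch below: convert the Hugging Face checkpoint to an unquantized GGUF, then quantize it with the quantize tool. This is a hedged sketch, not the repo author's answer: the paths are placeholders, and script and binary names have shifted across llama.cpp versions (convert.py originally handled LLaMA-family models before convert-hf-to-gguf.py gained that support, and the quantize binary was later renamed llama-quantize), so the "Architecture 'LlamaForCausalLM' not supported" error often just means the checkout is too old.

```bash
# Build llama.cpp and install the conversion script's Python deps
# (assumes git, make, and pip are available; adjust to your setup)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
pip install -r requirements.txt

# Step 1: convert the HF checkpoint to an unquantized f16 GGUF.
# /path/to/hf-model is a placeholder for your local model directory.
python convert-hf-to-gguf.py /path/to/hf-model \
    --outtype f16 --outfile model-f16.gguf

# Step 2: quantize the f16 GGUF down to 4-bit. Q4_K_M is a common
# balanced choice; newer builds name this binary llama-quantize.
./quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```

The resulting model-Q4_K_M.gguf can then be loaded directly by llama.cpp's inference tools, which is what makes this route practical for small platforms.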
