Add GPTQ, AWQ, and GGUF formats

#11
by nonetrix - opened

It would be nice if these formats could be added so the model is easier to run.

Thanks, a GPTQ model would be nice too for the time being. Unfortunately, getting AWQ to work on AMD GPUs is still a bit of a hassle. I use AMD because NVIDIA is prohibitively expensive for me in terms of VRAM. GPTQ likely won't be needed for long, though: AWQ support on AMD seems to be progressing slowly, but I don't think it's in the major GUIs yet.

AWQ: https://huggingface.co/OrionStarAI/Orion-14B-Chat-Int4
GGUF: https://huggingface.co/OrionStarAI/Orion-14B-Chat/blob/main/Orion-14B-Chat.gguf

Do you have a Python conversion script for this, like .\convert.py in the llama.cpp repo?
When I try to load the GGUF file, I get:
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 444, got 363

@DachengZhang

OrionStarAI org

Please check:
https://github.com/ggerganov/llama.cpp/blob/master/convert-hf-to-gguf.py
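
For anyone debugging the "wrong number of tensors" error quoted above, it can help to see how many tensors the .gguf file itself declares. As far as I understand the llama.cpp loader, the "expected" number comes from the file header and the "got" number is how many tensors the loader actually created for the architecture, so a mismatch usually points at the converter and the loader disagreeing. Below is a minimal sketch, not an official tool: it assumes the GGUF v2/v3 header layout (4-byte magic `GGUF`, uint32 version, uint64 tensor count, uint64 metadata key/value count, all little-endian), and the function name `read_gguf_header` is my own.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_header(path):
    """Return (version, tensor_count, kv_count) from a GGUF file header.

    Assumes the v2/v3 layout, where both counts are 64-bit little-endian.
    """
    with open(path, "rb") as f:
        header = f.read(24)  # 4 (magic) + 4 (version) + 8 + 8 (counts)
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    if version < 2:
        # GGUF v1 stored the counts as 32-bit values; not handled here.
        raise ValueError("this sketch only handles GGUF v2 and later")
    return version, tensor_count, kv_count
```

Calling `read_gguf_header("Orion-14B-Chat.gguf")` on the downloaded file returns the declared tensor count as the second value, which can then be compared with the numbers in the error message.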
