Add GPTQ, AWQ, and GGUF formats

#11

by nonetrix - opened Jan 27

Discussion

nonetrix

Jan 27

It would be nice if these formats could be added so the model is easier to run.

DachengZhang

OrionStarAI org Jan 29

AWQ: https://huggingface.co/OrionStarAI/Orion-14B-Chat-Int4
GGUF: https://huggingface.co/OrionStarAI/Orion-14B-Chat/blob/main/Orion-14B-Chat.gguf
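The GGUF link points at the file's web page (a `/blob/` URL). A minimal sketch, assuming the usual Hugging Face layout in which raw downloads are served from `/resolve/<revision>/` instead of `/blob/<revision>/`, of turning such a link into a direct-download URL. The helper name `blob_to_resolve` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_resolve(url: str) -> str:
    """Rewrite a Hugging Face '/blob/<rev>/' page URL into a '/resolve/<rev>/' download URL.

    Assumes the layout https://huggingface.co/<org>/<repo>/blob/<rev>/<path>.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Expected segments: ['', org, repo, 'blob', rev, *file_path]
    if len(segments) < 6 or segments[3] != "blob":
        raise ValueError(f"not a /blob/ URL: {url}")
    segments[3] = "resolve"
    return urlunsplit(parts._replace(path="/".join(segments)))


page_url = "https://huggingface.co/OrionStarAI/Orion-14B-Chat/blob/main/Orion-14B-Chat.gguf"
print(blob_to_resolve(page_url))
```

Alternatively, `huggingface_hub.hf_hub_download(repo_id="OrionStarAI/Orion-14B-Chat", filename="Orion-14B-Chat.gguf")` downloads the file into the local cache and returns its path.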

nonetrix

Jan 29


edited Jan 29

Thanks! A GPTQ model would be nice too, for the time being. Unfortunately, getting AWQ to work on AMD GPUs is currently a bit of a hassle. I use AMD because NVIDIA cards are prohibitively expensive for me in terms of VRAM. GPTQ likely won't be needed for long, though: I think AMD support for AWQ is slowly progressing, but as far as I know it isn't in the major GUIs yet.

shing3232

Feb 16


edited Feb 19

Do you have a Python conversion script for llama.cpp (like .\convert.py in the llama.cpp repo)? When I try to load this file, I get:
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 444, got 363
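The error reports a mismatch between two tensor counts, so one quick diagnostic is to check how many tensors the file itself declares. A minimal sketch, assuming the GGUF v2/v3 header layout (4-byte magic `GGUF`, uint32 version, uint64 tensor count, uint64 metadata key/value count) stored little-endian; `read_gguf_header` and `GGUFHeader` are hypothetical helpers, not part of llama.cpp. For fuller inspection, the `gguf` Python package that ships with llama.cpp provides a `GGUFReader` class.

```python
import io
import struct
from typing import BinaryIO, NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(f: BinaryIO) -> GGUFHeader:
    """Read the fixed-size GGUF header (assumes little-endian, GGUF v2 or later)."""
    magic = f.read(4)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    (version,) = struct.unpack("<I", f.read(4))
    if version < 2:
        # GGUF v1 stored the counts as 32-bit values; not handled in this sketch
        raise ValueError(f"unsupported GGUF version {version}")
    tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    return GGUFHeader(version, tensor_count, kv_count)


# Synthetic in-memory header for demonstration only (arbitrary counts)
demo = io.BytesIO(b"GGUF" + struct.pack("<IQQ", 3, 291, 19))
print(read_gguf_header(demo))
```

On a real file this would be used as `with open("Orion-14B-Chat.gguf", "rb") as f: print(read_gguf_header(f).tensor_count)`, and the result compared with the two numbers in the error message.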

@DachengZhang

sharp

OrionStarAI org Feb 20

Please check:
https://github.com/ggerganov/llama.cpp/blob/master/convert-hf-to-gguf.py
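A minimal sketch of assembling a command line for that script, assuming it takes the Hugging Face model directory as a positional argument plus `--outfile` and `--outtype` options. The paths and the helper name `build_convert_command` are hypothetical; accepted `--outtype` values have changed across llama.cpp versions (and later versions renamed the script to `convert_hf_to_gguf.py`), so check the script's `--help` first.

```python
import shlex
import sys
from pathlib import Path


def build_convert_command(script: Path, model_dir: Path, outfile: Path, outtype: str = "f16") -> list[str]:
    """Assemble an argv list for llama.cpp's convert-hf-to-gguf.py (options assumed, see lead-in)."""
    return [
        sys.executable,     # run the script with the current Python interpreter
        str(script),
        str(model_dir),     # local directory holding the HF checkpoint
        "--outfile", str(outfile),
        "--outtype", outtype,
    ]


# Hypothetical local paths
cmd = build_convert_command(
    Path("llama.cpp/convert-hf-to-gguf.py"),
    Path("Orion-14B-Chat"),
    Path("orion-14b-chat-f16.gguf"),
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building an argument list rather than a shell string avoids quoting problems with paths that contain spaces.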
