Possible GGUF
Would you please convert this into GGUF format? Thank you.
And could you please make GGUF quants of your existing Platypus and Deacon models?
To my knowledge, ggml already supports the original Qwen architecture, so there are already GGUFs:
https://huggingface.co/zly/Qwen-1_8B-Chat-Int4-GGUF
https://huggingface.co/Qwen/Qwen-1_8B-Chat
It's easy to convert the original model to GGUF yourself, too; a rough sketch of the process is below.
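A minimal sketch of that conversion, assuming a local llama.cpp checkout (the converter script name, flags, and paths vary between llama.cpp versions, so treat them as placeholders):

```python
# Hypothetical sketch: convert the original HF checkpoint to GGUF
# using llama.cpp's converter script. Script name and paths are
# assumptions; adjust them to your llama.cpp version and layout.
import subprocess
from pathlib import Path

LLAMA_CPP = Path("~/llama.cpp").expanduser()  # assumed checkout location
MODEL_DIR = Path("./Qwen-1_8B-Chat")          # local HF model snapshot

# Produce an f16 GGUF file from the Hugging Face checkpoint.
subprocess.run(
    [
        "python",
        str(LLAMA_CPP / "convert-hf-to-gguf.py"),
        str(MODEL_DIR),
        "--outtype", "f16",
        "--outfile", "qwen-1_8b-chat-f16.gguf",
    ],
    check=True,
)
```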
This llamafication is mostly for easier fine-tuning and broader Llama-ecosystem compatibility.
Nonetheless, I wanted to GGUF these, too. I consider 6-bit quantization pretty much lossless, and there are indications that 4-bit works with very little quality loss for this particular model.
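For anyone quantizing locally, the step looks roughly like this, assuming llama.cpp's `quantize` tool has been built (the binary name and location are assumptions; newer llama.cpp versions call it `llama-quantize`):

```python
# Hypothetical sketch: produce Q6_K and Q4_K_M quants from the f16 GGUF
# with llama.cpp's quantize tool. Binary name/path are assumptions.
import subprocess
from pathlib import Path

LLAMA_CPP = Path("~/llama.cpp").expanduser()  # assumed checkout location
SRC = "qwen-1_8b-chat-f16.gguf"               # f16 GGUF from the previous step

# Q6_K is close to lossless; Q4_K_M trades a little quality for size.
for quant in ("Q6_K", "Q4_K_M"):
    subprocess.run(
        [str(LLAMA_CPP / "quantize"), SRC, f"qwen-1_8b-{quant}.gguf", quant],
        check=True,
    )
```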
I'm uploading the 6-bit quants now; give it a minute:
https://huggingface.co/KnutJaegersberg/Qwen-1_8B-gguf
Thank you!