Will there be an IQ1_S version?
I noticed Qwen2-57B-A14B-Instruct.IQ1_S and a bunch of others have been stuck processing for a while. Will they finish processing at some point or is there something preventing that quant with this model?
Hey @phly95
Unfortunately, we don't have a dataset large enough to activate all of the model's experts.
Currently, 3 experts never get activated, so the imatrix calculation always fails.
Have a look at https://huggingface.co/legraphista/Qwen2-57B-A14B-Instruct-GGUF — it goes as low as Q2_K. I hope that helps.
I see. I was curious about running it on a 16GB M1 Mac, but I guess that isn't an option for now. Thank you.
That being said, the q2_k might make it possible to run on a computer I have with 8GB of VRAM and 16GB of system RAM.
Technically, only 14B parameters are active per token, so even if the model won't all fit in VRAM/RAM, llama.cpp will stream the weights from storage (it memory-maps the model file by default). With a MoE, you may get lucky with your prompt and not need to swap experts that many times.
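A rough back-of-envelope sketch of why that can work, assuming Q2_K averages about 2.6 bits per weight (an approximation; actual GGUF file sizes vary by tensor mix and metadata):

```python
# Hypothetical estimate: Q2_K quantization averages roughly 2.6 bits/weight.
# Actual sizes depend on which tensors stay at higher precision.

def gguf_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in GB for a quantized model."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

full_model = gguf_size_gb(57, 2.6)    # all 57B params on disk
active_set = gguf_size_gb(14, 2.6)    # ~14B params active per token

print(f"full model  ≈ {full_model:.1f} GB")   # too big for 16 GB RAM alone
print(f"active set  ≈ {active_set:.1f} GB")   # fits in 8 GB of VRAM
```

So while the whole file exceeds 16 GB of system RAM, the working set of active experts is only a few GB, which is why mmap-based paging can keep it usable.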
Give it a try.