Will there be an IQ1_S version?
I noticed Qwen2-57B-A14B-Instruct.IQ1_S and a bunch of others have been stuck processing for a while. Will they finish processing at some point or is there something preventing that quant with this model?
Hey @phly95
Unfortunately, we don't have a dataset large enough to activate all of the model's experts.
Currently, 3 experts never get activated, so the imatrix calculation always fails.
Have a look at https://huggingface.co/legraphista/Qwen2-57B-A14B-Instruct-GGUF — it goes as low as Q2_K. I hope that helps.
I see. I was curious about running it on a 16GB M1 Mac, but I guess that isn't an option for now. Thank you.
That being said, the q2_k might make it possible to run on a computer I have with 8GB of VRAM and 16GB of system RAM.
Technically, only 14B parameters are active per token, so even if the model won't all fit in VRAM/RAM, llama.cpp will stream the weights from storage (it memory-maps the model file by default). With a MoE, you may get lucky with your prompt and not need to swap experts that many times.
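A rough back-of-envelope sketch of why that can work, assuming Q2_K averages about 2.6 bits per weight (an approximation; actual GGUF file sizes vary by tensor mix and metadata):

```python
# Hypothetical estimate: Q2_K quantization averages roughly 2.6 bits/weight.
# Actual sizes depend on which tensors stay at higher precision.

def gguf_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in GB for a quantized model."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

full_model = gguf_size_gb(57, 2.6)    # all 57B params on disk
active_set = gguf_size_gb(14, 2.6)    # ~14B params active per token

print(f"full model  ≈ {full_model:.1f} GB")   # too big for 16 GB RAM alone
print(f"active set  ≈ {active_set:.1f} GB")   # fits in 8 GB of VRAM
```

So while the whole file exceeds 16 GB of system RAM, the working set of active experts is only a few GB, which is why mmap-based paging can keep it usable.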
Give it a try.