Only Q8_0 is working. I think maybe the MoE merges have this flaw. Let me know if anyone has any insights regarding this.
I'm personally using the Q5_K_M without any issues, running inference through llama.cpp (TGWUI and Jan).
Which ones did you try and what inference engine did you use?
I just found that Q6_K is broken! It only outputs boxes!
e.g. "33A
//EMI
2I,IMA8E#?E?E,(Q IQ.22E3A"
But in the current state, at least these 3 are tested and working fine (unluckily, those were the only ones I tested and kept for myself, so I never noticed the issue):
- Q4_K_M
- Q5_K_S
- Q5_K_M
Maybe I mixed up two scripts or ran out of storage when converting this specific one. If needed, I'd be happy to requantize, as I kept the F16 model on my drive.
But it would be really helpful if you could let me know which broken ones you tried.
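For reference, here's a minimal sketch of how I'd re-run the conversion from the kept F16 GGUF, assuming a recent llama.cpp build where the quantization binary is named `llama-quantize` (the file paths here are hypothetical, adjust to your own layout):

```python
import subprocess
from pathlib import Path

# Hypothetical paths; adjust to your own setup.
F16_MODEL = Path("models/model-f16.gguf")
OUT_DIR = Path("models/quants")
QUANT_TYPE = "Q6_K"  # the quant that came out broken

OUT_DIR.mkdir(parents=True, exist_ok=True)
out_file = OUT_DIR / f"model-{QUANT_TYPE}.gguf"

# llama.cpp ships a quantization tool; recent builds name the
# binary `llama-quantize` (older builds called it `quantize`).
subprocess.run(
    ["./llama-quantize", str(F16_MODEL), str(out_file), QUANT_TYPE],
    check=True,
)
print(f"wrote {out_file}")
```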
OK, I retested every quant, so really no luck if you only tried the Q6_K before the Q8_0, as it was the only broken one!
I'm gonna requantize it and reupload if everything goes right this time. I'll let you know ;)
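In case it helps with retesting, this is the kind of rough smoke-test loop I used, assuming llama.cpp's `llama-cli` binary is on the PATH and the same hypothetical `models/quants/` layout as above; the garbled-output check is only a crude heuristic, not a proper eval:

```python
import subprocess
from pathlib import Path

PROMPT = "Write one short sentence about the sea."
QUANTS = ["Q4_K_M", "Q5_K_S", "Q5_K_M", "Q6_K", "Q8_0"]

def looks_garbled(text: str) -> bool:
    # Crude heuristic: the broken quant emitted mostly symbols and
    # digits, so flag output with very few alphabetic characters.
    alpha = sum(c.isalpha() for c in text)
    return bool(text) and alpha / len(text) < 0.4

for q in QUANTS:
    model = Path(f"models/quants/model-{q}.gguf")  # hypothetical layout
    result = subprocess.run(
        ["llama-cli", "-m", str(model), "-p", PROMPT, "-n", "32"],
        capture_output=True,
        text=True,
    )
    verdict = "GARBLED?" if looks_garbled(result.stdout) else "looks ok"
    print(f"{q}: {verdict}")
```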
Q6_K requantized and working as intended this time!
So far, the only one I didn't test myself is the Q8_0. But with your feedback on that one, we've now covered them all :)
Thanks for pointing out the issue!
I'm closing this.