Can you make a Q4_K_M?

#1
by TheYuriLover - opened

It's the only one I can run on my 12 GB VRAM card.

Q4_K_S fits in 12 GB of VRAM at 4K context.
Proof, using `koboldcpp.exe --usecublas normal mmq --gpulayers 43 --contextsize 4096 --threads 4`:


That's why I chose Q4_K_S over Q4_K_M. But if you need it, I'll make one; just reply to me.
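Whether a quant fits mostly comes down to the weights plus the KV cache for the chosen context. The sketch below is a rough, hypothetical estimate, not how koboldcpp measures memory. It assumes a Llama-2-13B shape (40 layers, hidden size 5120), an fp16 KV cache, and approximate file sizes for the two quants (check the actual repo listing). It ignores compute buffers and CUDA overhead, so treat the result as a lower bound.

```python
# Rough VRAM lower bound for a fully offloaded GGUF model (hypothetical helper).
# Assumptions (not from the thread): Llama-2-13B shape, fp16 KV cache,
# approximate quant file sizes; compute buffers and driver overhead are ignored.

def kv_cache_bytes(n_layers: int, hidden_size: int, context: int,
                   bytes_per_elem: int = 2) -> int:
    """Keys and values: one hidden_size vector each, per layer, per token."""
    return 2 * n_layers * hidden_size * context * bytes_per_elem


def estimate_vram_gib(weights_gib: float, n_layers: int, hidden_size: int,
                      context: int) -> float:
    """Weights plus KV cache, in GiB."""
    return weights_gib + kv_cache_bytes(n_layers, hidden_size, context) / 2**30


# Assumed approximate 13B file sizes in GiB; verify against the real files.
ASSUMED_WEIGHTS_GIB = {"Q4_K_S": 6.9, "Q4_K_M": 7.3}

for name, weights in ASSUMED_WEIGHTS_GIB.items():
    est = estimate_vram_gib(weights, n_layers=40, hidden_size=5120, context=4096)
    print(f"{name}: ~{est:.2f} GiB before compute buffers")
```

At 4096 tokens the fp16 KV cache alone is 3.125 GiB under these assumptions, which is why context size matters as much as the quant choice.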

Edit: On second thought, I'll do it anyway, haha. Sorry, I'm a little lazy today.
Working on it.

Well, TheBloke is making it right now!
https://huggingface.co/TheBloke/Xwin-LM-13B-v0.2-GGUF/tree/main
Enjoy!

Yeah, I know Q4_K_S works, but Q4_K_M also works and it's a better quant; that's why I wanted that one.
Anyway, thanks for the link, I'm gonna check it out. Good luck with your constant grind of meming the leaderboard with merges. I'm with you on that one: we need better metrics, and to get them we need to show how easily the old ones can be abused :^)
