Q4_0_4_4
Hey, I'm using the Q4S right now, it's a good model. Could you also make Q4_0_4_4 quants? They're supposed to be much faster on mobile. I couldn't find much info about these quants yet, but some models already use them.
Yes, I'm going to start including them going forward. I may go back and add them to others like this on demand.
Please, could you do q4_0_4_8 for:
https://huggingface.co/tannedbum/Ellaria-9B
https://huggingface.co/tannedbum/L3-Rhaenys-8B
https://huggingface.co/nothingiisreal/MN-12B-Celeste-V1.9
You make them with imatrix, right?
Sure, started!
The Q4_0_X_X don't use imatrix for a lot of their weights so I'm not positive how much it affects it, but I attempt to apply it haha
Thank you!
I noticed a substantial gap between q4_0_4_8 and q4km-imat, so I hope that i8mm-imat will somewhat reduce it.
If I'm being honest, I don't fully know the difference between Q4_0_4_4 and Q4_0_8_8... They get the same PPL apparently, but I assume they're different..?
Hey, as far as I understand, Q4_0_8_8 needs a very modern chip, which my device doesn't have, but it can run even faster. I don't remember where I read it, but apparently Q4_0_4_4 doesn't need that same hardware feature? Then there is also a third one, I think, but I don't know about that one.
https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html
@bartowski From what I can tell from my limited reading, Q4_0_8_8 requires "sve" and Q4_0_4_8 requires "i8mm"; if your phone doesn't support either of those, try Q4_0_4_4.
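To make that mapping concrete, here's a small sketch that checks your device's CPU feature flags and suggests a variant. It assumes a Linux/Android ARM device exposing a "Features" line in /proc/cpuinfo, and the feature-to-quant mapping is just my reading of the thread above, not anything authoritative:

```python
def pick_arm_quant(features):
    """Suggest a Q4_0_X_X variant from a list of ARM CPU feature flags.

    Mapping (as discussed in this thread, not authoritative):
      'sve'   -> Q4_0_8_8
      'i8mm'  -> Q4_0_4_8
      neither -> Q4_0_4_4 (plain NEON fallback)
    """
    feats = set(features)
    if "sve" in feats:
        return "Q4_0_8_8"
    if "i8mm" in feats:
        return "Q4_0_4_8"
    return "Q4_0_4_4"


def cpu_features(cpuinfo_path="/proc/cpuinfo"):
    """Read the first 'Features' line from /proc/cpuinfo (Linux/Android)."""
    with open(cpuinfo_path) as f:
        for line in f:
            if line.lower().startswith("features"):
                # Line looks like: "Features : fp asimd i8mm ..."
                return line.split(":", 1)[1].split()
    return []


if __name__ == "__main__":
    print(pick_arm_quant(cpu_features()))
```

For example, a Snapdragon 8-class chip reports "i8mm", so this would suggest Q4_0_4_8, matching the advice below.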
Basically, if you own a smartphone with Snapdragon 8, go with Q4_0_4_8
Yeah, unfortunately I don't have a Snapdragon 8; my phone is older, but I have a lot of RAM, which is nice for running these models.
Hello @bartowski, maybe you forgot about this one? Anyway, since I realized that Gemma 2 9B is very slow compared to Llama, I wanted to add another model to the request in the hope that it can reach a useful speed with the new quants: bartowski/Gemma-2-Ataraxy-9B-GGUF
Hi there, l3utterfly already made the new quants for that model (it doesn't have bartowski's imatrix though): https://huggingface.co/l3utterfly/Gemma-2-Ataraxy-9B-gguf/tree/main
Thx I will check it out but yes imatrix is usually better.
Ah yes, I did forget. I'll try to get both of those done today; it's tricky while travelling, but I should have an opportunity in a few hours!