Q4_0_4_4

#2
by Yuma42 - opened

Hey, I'm using the Q4S right now and it's a good model. Could you also make Q4_0_4_4 quants? They're supposed to be much faster on mobile. I couldn't find much info about these quants myself yet, but there are some models using them.

Yes, I'm going to start including them going forward; I may go back and add them to others like this on demand.

Please, could you do q4_0_4_8 for:
https://huggingface.co/tannedbum/Ellaria-9B
https://huggingface.co/tannedbum/L3-Rhaenys-8B
https://huggingface.co/nothingiisreal/MN-12B-Celeste-V1.9

You make them with imatrix, right?

Sure, started!

The Q4_0_X_X quants don't use imatrix for a lot of their weights, so I'm not positive how much it affects them, but I attempt to apply it haha

Thank you!

I noticed a substantial gap between q4_0_4_8 and q4km-imat, so I hope that i8mm-imat will somewhat reduce it.

If I'm being honest, I don't fully know the difference between Q4_0_4_4 and Q4_0_8_8... They get the same PPL apparently, but I assume they're different..?

Hey, as far as I understand, Q4_0_8_8 needs a very modern chip, which my device doesn't have, but it can run even faster. I don't remember where I read it, but apparently Q4_0_4_4 doesn't need that same hardware feature? Then there is also a third one, I think, but I don't know about that one.

https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html

@bartowski From what I can tell from my limited reading, Q4_0_8_8 requires "sve" and Q4_0_4_8 requires "i8mm"; if your phone doesn't support either of those, try Q4_0_4_4.
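One way to check which variant your device can use: on Android/Linux, the CPU's feature flags show up in `/proc/cpuinfo`. A small sketch, assuming the feature-to-quant mapping described above (sve → Q4_0_8_8, i8mm → Q4_0_4_8, otherwise Q4_0_4_4):

```python
# Sketch: pick a Q4_0_X_X variant from a /proc/cpuinfo "Features" line.
# Mapping assumed from this thread: sve -> Q4_0_8_8, i8mm -> Q4_0_4_8, else Q4_0_4_4.
def pick_quant(features_line: str) -> str:
    feats = set(features_line.lower().split())
    if "sve" in feats:
        return "Q4_0_8_8"
    if "i8mm" in feats:
        return "Q4_0_4_8"
    return "Q4_0_4_4"

# On-device (e.g. in Termux) you would read the real flags:
# features_line = open("/proc/cpuinfo").read()
print(pick_quant("fp asimd aes sha1 sha2 crc32 atomics fphp asimdhp"))  # prints "Q4_0_4_4"
```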

Basically, if you own a smartphone with a Snapdragon 8, go with Q4_0_4_8.

Yeah, unfortunately I don't have a Snapdragon 8; my phone is older, but I have a lot of RAM, which is nice for running these models.

Ahh very good thank you @EloyOn ! I'm going to add that info to future READMEs :)

Hello @bartowski, maybe you forgot about this one? Anyway, since I realized that Gemma 2 9B is very slow compared to Llama, I wanted to add another model to the request in the hope that it can reach a useful speed with the new quants: bartowski/Gemma-2-Ataraxy-9B-GGUF

Hi there, l3utterfly already made the new quants for that model (it doesn't have bartowski's imatrix though): https://huggingface.co/l3utterfly/Gemma-2-Ataraxy-9B-gguf/tree/main

Thanks, I will check it out, but yes, imatrix is usually better.

Ah yes, I did forget. I'll try to get those both done today; it's just tricky while travelling, but I should have an opportunity in a few hours!