Q4_0_4_4
Hey, I'm using the Q4S right now, it's a good model. Could you also make Q4_0_4_4 quants? They're supposed to be much faster on mobile. I couldn't find much info about these quants yet, but some models already use them.
Yes, I'm going to start including them going forward. I may go back and add them to others like this on demand.
Please, could you do q4_0_4_8 for:
https://huggingface.co/tannedbum/Ellaria-9B
https://huggingface.co/tannedbum/L3-Rhaenys-8B
https://huggingface.co/nothingiisreal/MN-12B-Celeste-V1.9
You make them with imatrix, right?
Sure, started!
The Q4_0_X_X don't use imatrix for a lot of their weights so I'm not positive how much it affects it, but I attempt to apply it haha
Thank you!
I noticed a substantial gap between q4_0_4_8 and q4km-imat, so I hope that i8mm-imat will somewhat reduce it.
If I'm being honest, I don't fully know the difference between Q4_0_4_4 and Q4_0_8_8... They get the same PPL apparently, but I assume they're different..?
Hey, as far as I understand, Q4_0_8_8 needs a very modern chip, which my device doesn't have, but it can run even faster. I don't remember where I read it, but apparently Q4_0_4_4 doesn't need that same hardware feature? Then there is also a third one, I think, but I don't know about that one.
https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html
@bartowski From what I can tell from my limited reading, Q4_0_8_8 requires "sve" and Q4_0_4_8 requires "i8mm"; if your phone doesn't support either of those, try Q4_0_4_4.
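To make that mapping concrete, here's a small sketch that checks your device's CPU feature flags and suggests a variant. It assumes a Linux/Android ARM device exposing a "Features" line in /proc/cpuinfo, and the feature-to-quant mapping is just my reading of the thread above, not anything authoritative:

```python
def pick_arm_quant(features):
    """Suggest a Q4_0_X_X variant from a list of ARM CPU feature flags.

    Mapping (as discussed in this thread, not authoritative):
      'sve'   -> Q4_0_8_8
      'i8mm'  -> Q4_0_4_8
      neither -> Q4_0_4_4 (plain NEON fallback)
    """
    feats = set(features)
    if "sve" in feats:
        return "Q4_0_8_8"
    if "i8mm" in feats:
        return "Q4_0_4_8"
    return "Q4_0_4_4"


def cpu_features(cpuinfo_path="/proc/cpuinfo"):
    """Read the first 'Features' line from /proc/cpuinfo (Linux/Android)."""
    with open(cpuinfo_path) as f:
        for line in f:
            if line.lower().startswith("features"):
                # Line looks like: "Features : fp asimd i8mm ..."
                return line.split(":", 1)[1].split()
    return []


if __name__ == "__main__":
    print(pick_arm_quant(cpu_features()))
```

For example, a Snapdragon 8-class chip reports "i8mm", so this would suggest Q4_0_4_8, matching the advice below.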
Basically, if you own a smartphone with Snapdragon 8, go with Q4_0_4_8
Yeah, unfortunately I don't have a Snapdragon 8; my phone is older, but I have a lot of RAM, which is nice for running these models.
Hello @bartowski, maybe you forgot about this one? Anyway, since I realized that Gemma 2 9B is very slow compared to Llama, I wanted to add another model to the request in the hope that it can reach a useful speed with the new quants: bartowski/Gemma-2-Ataraxy-9B-GGUF
Hi there, l3utterfly already made the new quants for that model (it doesn't have bartowski's imatrix though): https://huggingface.co/l3utterfly/Gemma-2-Ataraxy-9B-gguf/tree/main
Thx I will check it out but yes imatrix is usually better.
Ah yes, I did forget. I'll try to get both of those done today; it's tricky while travelling, but I should have an opportunity in a few hours!