About quants

#1 by saishf

Have you tried quantizing down to the older Q4_0, Q4_1, and Q5_0 quants? I recall Mixtral 8x7B models having problems with the newer quant methods but working fine with the older types.
It's mentioned at the beginning of this article:
https://rentry.org/HowtoMixtral
And Undi mentions it in this model card:
https://huggingface.co/Undi95/Toppy-Mix-4x7B-GGUF
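
For reference, trying this would look something like the sketch below, which shells out to llama.cpp's quantize tool to produce the legacy formats instead of the K-quants. The binary name (quantize in early-2024 builds, llama-quantize in later ones), the input filename, and all paths are assumptions for illustration, not taken from this thread.

```python
# Minimal sketch: produce legacy-format quants (Q4_0 / Q4_1 / Q5_0)
# by invoking llama.cpp's quantize binary on an existing f16 GGUF.
# Binary path and input filename are hypothetical.
import subprocess
from pathlib import Path

QUANTIZE_BIN = "./quantize"       # assumed: built from the llama.cpp repo
SOURCE = Path("model-f16.gguf")   # assumed: f16 GGUF produced beforehand

for qtype in ("Q4_0", "Q4_1", "Q5_0"):
    target = SOURCE.with_name(f"model-{qtype}.gguf")
    # usage: quantize <input.gguf> <output.gguf> <type>
    subprocess.run([QUANTIZE_BIN, str(SOURCE), str(target), qtype],
                   check=True)
    print(f"wrote {target}")
```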

Owner

> Have you tried quantizing down to the older Q4_0, Q4_1, and Q5_0 quants? I recall Mixtral 8x7B models having problems with the newer quant methods but working fine with the older types.
> It's mentioned at the beginning of this article:
> https://rentry.org/HowtoMixtral
> And Undi mentions it in this model card:
> https://huggingface.co/Undi95/Toppy-Mix-4x7B-GGUF

No, I didn't try this. Does Q4_0 still exist in the new versions of llama.cpp?

I believe it's still supported; I can still make Q8_0 quants successfully, although I haven't updated since late January.

This header lists all the Q4_0–Q8_1, Q2_K–Q8_K, and IQ quants, so I'd guess they're still in the most recent releases:
https://github.com/ggerganov/llama.cpp/blob/master/ggml-quants.h
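
One way to check a local checkout is to look for the legacy quantize_row_* prototypes in that header. A hypothetical sketch, assuming the repo is cloned at llama.cpp/ relative to the script:

```python
# Minimal sketch: report whether a llama.cpp checkout still declares the
# legacy quant types by searching ggml-quants.h for their
# quantize_row_* prototypes. The checkout path is an assumption.
from pathlib import Path

header = Path("llama.cpp/ggml-quants.h").read_text()

for name in ("q4_0", "q4_1", "q5_0", "q5_1", "q8_0", "q8_1"):
    present = f"quantize_row_{name}(" in header
    print(f"{name}: {'declared' if present else 'missing'}")
```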

> This header lists all the Q4_0–Q8_1, Q2_K–Q8_K, and IQ quants, so I'd guess they're still in the most recent releases:
> https://github.com/ggerganov/llama.cpp/blob/master/ggml-quants.h

Alright, I just tried it and it didn't work. This was a great idea, though :)

I tried to quant it too; it's cursed. It crashes Kobold instantly and doesn't even give an error.

> I tried to quant it too; it's cursed. It crashes Kobold instantly and doesn't even give an error.

The Cursed Pygmalion-Xwin Mixture
