About quants

#1 by saishf

Have you tried quantizing down to the older Q4_0, Q4_1, and Q5_0 quants? I recall Mixtral 8x7B models having problems with the newer quant methods but working fine with the older types.
It's mentioned at the beginning of this article:
https://rentry.org/HowtoMixtral
And Undi mentions it in this model card:
https://huggingface.co/Undi95/Toppy-Mix-4x7B-GGUF
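
For reference, trying this would look something like the sketch below, which shells out to llama.cpp's quantize tool to produce the legacy formats instead of the K-quants. The binary name (quantize in early-2024 builds, llama-quantize in later ones), the input filename, and all paths are assumptions for illustration, not taken from this thread.

```python
# Minimal sketch: produce legacy-format quants (Q4_0 / Q4_1 / Q5_0)
# by invoking llama.cpp's quantize binary on an existing f16 GGUF.
# Binary path and input filename are hypothetical.
import subprocess
from pathlib import Path

QUANTIZE_BIN = "./quantize"       # assumed: built from the llama.cpp repo
SOURCE = Path("model-f16.gguf")   # assumed: f16 GGUF produced beforehand

for qtype in ("Q4_0", "Q4_1", "Q5_0"):
    target = SOURCE.with_name(f"model-{qtype}.gguf")
    # usage: quantize <input.gguf> <output.gguf> <type>
    subprocess.run([QUANTIZE_BIN, str(SOURCE), str(target), qtype],
                   check=True)
    print(f"wrote {target}")
```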

Owner

> Have you tried quantizing down to the older Q4_0, Q4_1, and Q5_0 quants? I recall Mixtral 8x7B models having problems with the newer quant methods but working fine with the older types.
> It's mentioned at the beginning of this article:
> https://rentry.org/HowtoMixtral
> And Undi mentions it in this model card:
> https://huggingface.co/Undi95/Toppy-Mix-4x7B-GGUF

No, I didn't try this. Does Q4_0 still exist in the new versions of llama.cpp?

I believe it's still supported; I can still make Q8_0 quants successfully, although I haven't updated since late January.

This header lists all the Q4_0–Q8_1, Q2_K–Q8_K, and IQ quants, so I'd guess they're still in the most recent releases:
https://github.com/ggerganov/llama.cpp/blob/master/ggml-quants.h
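
One way to check a local checkout is to look for the legacy quantize_row_* prototypes in that header. A hypothetical sketch, assuming the repo is cloned at llama.cpp/ relative to the script:

```python
# Minimal sketch: report whether a llama.cpp checkout still declares the
# legacy quant types by searching ggml-quants.h for their
# quantize_row_* prototypes. The checkout path is an assumption.
from pathlib import Path

header = Path("llama.cpp/ggml-quants.h").read_text()

for name in ("q4_0", "q4_1", "q5_0", "q5_1", "q8_0", "q8_1"):
    present = f"quantize_row_{name}(" in header
    print(f"{name}: {'declared' if present else 'missing'}")
```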

> This header lists all the Q4_0–Q8_1, Q2_K–Q8_K, and IQ quants, so I'd guess they're still in the most recent releases:
> https://github.com/ggerganov/llama.cpp/blob/master/ggml-quants.h

Alright, I just tried it and it didn't work. This was a great idea, though :)

I tried to quant it too; it's cursed. It crashes Kobold instantly and doesn't even give an error.

> I tried to quant it too; it's cursed. It crashes Kobold instantly and doesn't even give an error.

The Cursed Pygmalion-Xwin Mixture
