MaziyarPanahi/Venomia-1.1-m7-Mistral-7B-Instruct-v0.2-slerp-GGUF


quant request

by Jeximo - opened Mar 4

Discussion

Jeximo

Mar 4 • edited Mar 5

Hi, thank you for your effort. Is it possible to add an IQ4_XS GGUF version for download?

MaziyarPanahi

Owner Mar 5

Hi @Jeximo
You are welcome. Could you please show me how to do that? I use llama.cpp, and this is the list of available quants:
https://github.com/ggerganov/llama.cpp/tree/master/examples/quantize

Jeximo

Mar 5

Yeah, the llama.cpp README is outdated. See here: https://github.com/ggerganov/llama.cpp/blob/21b08674331e1ea1b599f17c5ca91f0ed173be31/examples/quantize/quantize.cpp#L40

The new quants (IQ3, IQ4) are made with the same tool, quantize. I'm uncertain of the exact parameters, but quantize --help shows:

30  or  IQ4_XS  :  4.25 bpw non-linear quantization
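
For a script that drives the tool, a minimal sketch of building the command might look like the following. It assumes the positional form `quantize <input.gguf> <output.gguf> <type> [nthreads]`, which should be confirmed against `quantize --help` for the build in use. The binary path `./quantize`, the file names, and the helper `build_quantize_cmd` are hypothetical.

```python
from pathlib import Path


def build_quantize_cmd(src, quant_type="IQ4_XS", quantize_bin="./quantize", threads=None):
    """Build an argument list for llama.cpp's quantize tool.

    Assumed argument order (verify with `quantize --help`):
        quantize <input.gguf> <output.gguf> <type> [nthreads]
    """
    src = Path(src)
    # Hypothetical naming scheme: insert the quant type before the .gguf suffix.
    dst = src.with_name(f"{src.stem}.{quant_type}.gguf")
    cmd = [quantize_bin, str(src), str(dst), quant_type]
    if threads is not None:
        cmd.append(str(threads))
    return cmd


# "model.fp16.gguf" is a placeholder for an existing full-precision GGUF file.
cmd = build_quantize_cmd("model.fp16.gguf")
print(" ".join(cmd))
```

The list can then be passed to `subprocess.run(cmd, check=True)`. According to the help line above, the type can be given either by name (`IQ4_XS`) or by number (`30`).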

MaziyarPanahi

Owner Mar 5

Fantastic! I will add the missing IQ4_XS here. Do you happen to know more about the differences among S, XS, and XXS? I'd like to update my script, and if the new types are useful, I could add one of them by default for 2, 3, and 4 bit as well.

Jeximo

Mar 5

Awesome, I'm glad. Yeah, IQ3_S is superior to Q3_K_S (https://github.com/ggerganov/llama.cpp/pull/5676), and I think IQ4_XS is an upgrade over Q4_K_S. There's a nice graph and discussion here: https://github.com/ggerganov/llama.cpp/pull/5747

My limited understanding is that the new IQ quants offer a better trade-off between size (bits per weight) and perplexity. I found this comment helpful for showing the differences for Mistral (click "KL-divergence data for Mistral-7B" to view the table): https://github.com/ggerganov/llama.cpp/pull/5747#issuecomment-1966370132
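
To get a feel for what a bits-per-weight figure means for download size, here is a rough back-of-the-envelope sketch. It assumes roughly 7.24 billion parameters for Mistral-7B and applies the 4.25 bpw from the help text uniformly. Real GGUF files mix tensor types and carry metadata, so actual sizes will differ somewhat. The helper `estimated_size_gb` is hypothetical.

```python
def estimated_size_gb(n_params, bits_per_weight):
    """Approximate file size in GB (10^9 bytes) from parameter count and bpw."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: about 7.24e9 parameters for Mistral-7B.
N_PARAMS = 7.24e9

# 4.25 bpw is the figure `quantize --help` reports for IQ4_XS.
print(round(estimated_size_gb(N_PARAMS, 4.25), 2))
```

The same function can be used to compare any two quant types once their bpw values are read from `quantize --help`.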

MaziyarPanahi

Owner Mar 5

Wow! This is great! Many thanks for all the detailed information. For the time being, I am going to include all the new I-quants with the _XS suffix without dropping any of the current ones. It will mean more uploads, but I think it's worth it. Later on, I can drop the _S ones to save upload time.

Thanks again @Jeximo for sharing the information

Jeximo changed discussion status to closed Mar 6
