quant request

#1
by Jeximo - opened

Hi, thank you for your effort. Is it possible to add an IQ4_XS GGUF version for download?

Hi @Jeximo
You are welcome. Could you please show me how to do that? I use llama.cpp, and this is the list of available quants:
https://github.com/ggerganov/llama.cpp/tree/master/examples/quantize

Yeah, the llama.cpp README.md is outdated. See here: https://github.com/ggerganov/llama.cpp/blob/21b08674331e1ea1b599f17c5ca91f0ed173be31/examples/quantize/quantize.cpp#L40

The new quants (IQ3, IQ4) are made with the same tool, quantize. I'm uncertain of the exact parameters, but quantize --help shows:

30  or  IQ4_XS  :  4.25 bpw non-linear quantization
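
For reference, a minimal sketch of how that type name could be passed to the tool from a script. It assumes the usual positional order for llama.cpp's quantize (input file, output file, type); the binary path `./quantize` and both file names are hypothetical placeholders, so check `quantize --help` on your build before relying on it.

```python
import shlex


def build_quantize_cmd(src, dst, qtype="IQ4_XS", binary="./quantize"):
    """Build the argument list for a llama.cpp quantize call.

    Assumed order: <binary> <input.gguf> <output.gguf> <type>.
    The paths are placeholders, not real files.
    """
    return [binary, src, dst, qtype]


# Hypothetical file names, for illustration only.
cmd = build_quantize_cmd("model-f16.gguf", "model-IQ4_XS.gguf")
print(shlex.join(cmd))

# To actually run it (requires a built quantize binary and a real input file):
# import subprocess
# subprocess.run(cmd, check=True)
```

Since an upload script already loops over quant types, adding IQ4_XS should only mean adding one more name to that list.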

Fantastic! I will add the missing IQ4_XS here. Do you happen to know more about the differences among S, XS, and XXS? That way I can update my script, and for 2, 3, and 4 bit I could add one of these new quants by default if they are useful.

Awesome, I'm glad. Yeah, IQ3_S is superior to Q3_K_S (https://github.com/ggerganov/llama.cpp/pull/5676), and I think IQ4_XS is an upgrade over Q4_K_S. There's a nice graph and discussion here: https://github.com/ggerganov/llama.cpp/pull/5747

My limited understanding is that the new IQ quants are more efficient: they reach similar perplexity at a smaller size (fewer bits per weight). I found this comment helpful for seeing the differences on Mistral (click "KL-divergence data for Mistral-7B" to view the table): https://github.com/ggerganov/llama.cpp/pull/5747#issuecomment-1966370132
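
As a rough illustration of what bits per weight means for file size, here is a back-of-the-envelope sketch. The 7e9 parameter count is a round illustrative number, not a measured one, and real GGUF files will differ somewhat because not every tensor is stored in the same quant type.

```python
def estimate_size_gb(n_params, bpw):
    """Approximate model size in decimal GB: parameters * bits per weight / 8 bits per byte."""
    return n_params * bpw / 8 / 1e9


n_params = 7e9  # assumed round number for a "7B" model

# 4.25 bpw is the figure quantize --help lists for IQ4_XS; 16 bpw is an unquantized f16 model.
for name, bpw in [("f16", 16.0), ("IQ4_XS", 4.25)]:
    print(f"{name}: ~{estimate_size_gb(n_params, bpw):.2f} GB")
```

This only captures the size side of the trade-off; the quality side is what the perplexity and KL-divergence tables in the linked PR measure.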

Wow! This is great! Many thanks for all the detailed information. For the time being, I am going to include all the new _XS I-quants without dropping any of the current ones. It will mean more uploads, but I think it's worth it. Later on I can drop the _S ones to save upload time.

Thanks again, @Jeximo, for sharing the information.

Jeximo changed discussion status to closed
