quant request
Hi, thank you for your effort. Is it possible to add an IQ4_XS GGUF version for download?
Hi @Jeximo,
You are welcome. Could you please show me how to do that? I use Llama.cpp and this is the list of available quants:
https://github.com/ggerganov/llama.cpp/tree/master/examples/quantize
Yeah, llama.cpp ReadMe.MD is outdated. See here: https://github.com/ggerganov/llama.cpp/blob/21b08674331e1ea1b599f17c5ca91f0ed173be31/examples/quantize/quantize.cpp#L40
It's the same tool, quantize, to make the new quants, IQ3, IQ4. I'm uncertain of the exact parameters, but quantize --help shows:
30 or IQ4_XS : 4.25 bpw non-linear quantization
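For anyone scripting this, here is a rough sketch of how a batch script might build the quantize calls. It assumes the positional order input file, output file, type name, which is worth confirming against quantize --help for your build. The function name, the ./quantize path, the file names, and the output naming scheme are all made up for illustration.

```python
from pathlib import Path

# Quant types mentioned in this thread; extend the list as needed.
QUANT_TYPES = ["IQ3_S", "IQ4_XS"]


def build_quantize_commands(src_gguf, quant_types, quantize_bin="./quantize"):
    """Return one argv list per quant type.

    Assumed argument order (check `quantize --help`): input, output, type.
    The output naming scheme here is hypothetical.
    """
    src = Path(src_gguf)
    commands = []
    for qtype in quant_types:
        # e.g. model-f16.gguf -> model-f16.IQ4_XS.gguf
        out = src.with_name(f"{src.stem}.{qtype}.gguf")
        commands.append([quantize_bin, str(src), str(out), qtype])
    return commands


if __name__ == "__main__":
    # Only prints the commands; pass each list to subprocess.run to execute.
    for cmd in build_quantize_commands("model-f16.gguf", QUANT_TYPES):
        print(" ".join(cmd))
```

Building argv lists rather than shell strings keeps file names with spaces safe if the lists are later handed to subprocess.run.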
Fantastic! I will add the missing IQ4_XS here. Do you happen to know more about differences among S, XS and XXS? Just so I can update my script and maybe for 2, 3, and 4 bit I can add one of these new ones as well by default if they are useful.
Awesome, I'm glad. Yeah, IQ3_S is superior to 3_K_S (https://github.com/ggerganov/llama.cpp/pull/5676), and I think IQ4_XS is an upgrade for 4_K_S. There's a nice graph and discussion here: https://github.com/ggerganov/llama.cpp/pull/5747
My limited understanding is that the new IQ quants are more efficient in terms of size, bits per weight, and perplexity. I found this comment helpful for seeing the differences on Mistral (click "KL-divergence data for Mistral-7B" to view the table): https://github.com/ggerganov/llama.cpp/pull/5747#issuecomment-1966370132
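As a back-of-the-envelope illustration of what bits per weight means for file size, here is a small sketch. It assumes every weight is stored at the same bpw and ignores metadata, so real GGUF files will differ somewhat. The 7e9 parameter count is just a round illustrative number, and the function name is made up.

```python
def estimate_size_bytes(n_params, bits_per_weight):
    """Rough size estimate: parameters * bits per weight / 8 bits per byte.

    Simplification: assumes a uniform bpw across all tensors and no
    file metadata, so treat the result as a ballpark only.
    """
    return n_params * bits_per_weight / 8


if __name__ == "__main__":
    n_params = 7e9  # illustrative round number, not a measured value
    f16 = estimate_size_bytes(n_params, 16)
    iq4_xs = estimate_size_bytes(n_params, 4.25)  # 4.25 bpw per quantize --help
    print(f"f16:    {f16 / 1e9:.2f} GB")
    print(f"IQ4_XS: {iq4_xs / 1e9:.2f} GB")
    print(f"ratio:  {f16 / iq4_xs:.2f}x smaller")
```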
Wow! This is great! Many thanks for all the detailed information. For the time being, I am going to include all the new I-Quants with _XS without dropping any of the current ones. It will be more uploads, but I think it's worth it. I can drop the _S ones later to save upload time.
Thanks again @Jeximo for sharing the information