Add IQ Quantization support with the help of imatrix and GPUs
It would allow us to create the imatrix data and the quants in one go!
Super useful when dealing with 100B+ models; 1-bit (IQ1_M) support would be really nice.
Would be awesome to see options for IQ 6 / 5 / 4 / 3 / 2 and the NL / XS variants.
Would be really awesome to have an option to upload a .txt calibration file for imatrix creation and then generate imatrix quants from it.
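For reference, the underlying llama.cpp workflow the Space would need to run is roughly the following sketch (binary names and flags as in recent llama.cpp builds; the model and file names are placeholders, not anything from this Space):

```shell
# 1. Compute the importance matrix from an uploaded calibration text file
#    (-m: full-precision model, -f: calibration text, -o: output matrix).
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize using that matrix, e.g. to IQ4_XS.
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-iq4_xs.gguf IQ4_XS
```

Running both steps back to back is what would make the "one go" experience possible.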
We just merged support for iMatrix! Do let us know if you have any feedback! 🤗
@reach-vb Just gave it a try. I have one suggestion: currently it is impossible to see the progress because the Gradio UI only shows a loading indicator. I think it would be better if the console logs were shown instead. That would let us track progress and inspect any errors encountered during calculation/conversion. :)
Thanks to you and everybody else involved. I should close this discussion now. :)
That's brilliant feedback!
@reach-vb
Hi! When does llama.cpp get updated here?
Sorry for bothering you, but currently Gemma (9B) conversion fails because of an assert, which has been fixed upstream.
We need at least b3389 for the fix.
I was not sure how to contact you, so commented here. 😃