iMatrix files

#1
by Nexesenex - opened

Hey!
Could you share your iMatrix files in your repos?

I always post the text file they're generated from. Are the .dat files themselves actually useful for anything after the model has been quantized?

I've uploaded the .dat for this model. I suppose I could upload the other ones that I still have if there's actually a need for them.

Well, the .dat is paramount: it's the file actually needed for imatrix-led quantization. Without it, the iMatrix.dat has to be regenerated from the .txt file.
I'm currently short on VRAM+RAM, and I can't even redo the iMatrix from an 8-bit GGUF for a 70B model. That's why I'm asking! ^^
Sharing the file is becoming the consensus practice, so everyone can test quant strategies or requant to their needs when new official quant strategies land in llama.cpp.
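For reference, remaking it from the .txt goes roughly like this (a sketch with placeholder file names, assuming current llama.cpp binary names):

```
# Sketch: rebuilding an imatrix .dat from the posted calibration .txt
# (placeholder file names; assumes current llama.cpp binaries)
./llama-imatrix \
  -m model-f16.gguf \
  -f calibration.txt \
  -o imatrix.dat \
  -ngl 99   # offload as many layers as your VRAM allows
```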
Oh, and I forgot... Thank you!

OK, I've gone back through the old quants and uploaded the imatrix .dats that I still had lying around.

Thanks, you made my day!
Your iMatrix allowed me to requant my L3 70B Abl FP16 optimally, with a drop of 1 point in perplexity and a bump of +2 on the ARC benches.
And I can now use IQ quants to quantize as well.
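(For the curious, the requant step with a shared imatrix is roughly this; the file names are placeholders and IQ4_XS is just one example of an IQ type:)

```
# Sketch: requantizing with a downloaded imatrix, including IQ types
# (placeholder file names; assumes current llama.cpp binaries)
./llama-quantize \
  --imatrix imatrix.dat \
  model-f16.gguf \
  model-IQ4_XS.gguf \
  IQ4_XS
```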

I also uploaded the measurement.json files for my exl2 quants where I still had them. Not sure if that's something you're interested in as well.

MarsupialAI changed discussion status to closed

Not personally at the moment, but the exl2 enthusiasts will be! Thanks!

FYI, this conversation got me thinking about whether the weight of the GGUF used to generate imatrices actually mattered. So I did a science about it. https://huggingface.co/MarsupialAI/Llama3_GGUF_Quant_Testing

Your experiment is right on point, and when one thinks about it, it's actually quite sensible and even obvious.
I've been using Q8_0 to make my iMats and quantize ever since iMatrix appeared.
There's a tiny loss with Q8_0 as a quant base, but none from mere iMatrixing.
I didn't think of using a Q4_0 quant (or even a Q6_K) to make the iMatrix, so that a 70B fits in my 36GB of VRAM.
Obviously, the obvious ain't that obvious...
Anyway, bravo! You should share this experiment of yours on the llama.cpp GitHub, and why not expand it to Q4_K_S and IQ4_XS to see if it works similarly (and it should, because the iMatrixing seems to be something "vectorial").
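Something like this would be the low-VRAM route, I suppose (a sketch with placeholder file names, assuming current llama.cpp binaries; Q6_K and IQ4_XS are just example types, and wiki.test.raw is the usual wikitext-2 perplexity file):

```
# 1. Quantize the FP16 down to a base that fits in VRAM, e.g. Q6_K
./llama-quantize model-f16.gguf model-Q6_K.gguf Q6_K

# 2. Generate the imatrix from that smaller base instead of the FP16
./llama-imatrix -m model-Q6_K.gguf -f calibration.txt -o imatrix.dat -ngl 99

# 3. Requant the FP16 with that imatrix, then sanity-check via perplexity
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-IQ4_XS.gguf IQ4_XS
./llama-perplexity -m model-IQ4_XS.gguf -f wiki.test.raw
```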
