Quantization update?
Any chance you or someone else could requantize with the latest exl2? I have no idea what the total VRAM requirements are when quantizing, and I have FOMO about the bpw quality improvements since then.
Hi there, sorry, life has been busy. I'll try to do it as soon as I can. It takes about 10 hours to do it fully, and then ~5 hours for each bpw size on an RTX 4090.
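If it helps explain the timing: exllamav2 runs a calibration/measurement pass first, and that measurement can be saved and reused for each bpw target, which is why the extra sizes go faster than the first full run. Here's a rough sketch of how that two-stage flow can be scripted; the convert.py flags are from memory of exllamav2's docs, and the paths and bpw list are placeholders, so double-check against the repo before using it:

```python
# Rough sketch of the two-stage exl2 flow: measure once, then convert each
# bpw target from the same measurement. Flags are from memory of exllamav2's
# convert.py; paths and bpw targets are placeholders.
import subprocess

MODEL_DIR = "/models/source-model-fp16"      # original fp16 weights (placeholder)
WORK_DIR = "/tmp/exl2-work"                  # scratch dir for the converter
MEASUREMENT = f"{WORK_DIR}/measurement.json"

# Measurement pass: done once per model; this is the slow part.
subprocess.run(
    ["python", "convert.py",
     "-i", MODEL_DIR,
     "-o", WORK_DIR,
     "-om", MEASUREMENT],
    check=True,
)

# Each bpw target reuses the measurement, so the per-size cost is much lower.
# (You may need to empty the work dir, or pass the no-resume flag, between runs.)
for bpw in ["2.4", "3.0", "4.0", "5.0", "6.0"]:
    subprocess.run(
        ["python", "convert.py",
         "-i", MODEL_DIR,
         "-o", WORK_DIR,
         "-m", MEASUREMENT,
         "-cf", f"/models/output-exl2-{bpw}bpw",
         "-b", bpw],
        check=True,
    )
```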
No worries, take your time!
P.S. I don't know if it's just me, but the newer quantized models seem to take slightly more space on the same system? A 2.4bpw with fp8 KV cache used to fit 16k context on a 1x3090 system. The system is unchanged, but now it only fits 8k with the fp8 cache.
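For context, the napkin math I'm using: the KV cache grows linearly with context, so whatever VRAM the weights and overhead don't take determines how many tokens fit, and a slightly bigger quant eats directly into that headroom. A quick sketch with made-up numbers (the layer count, KV heads, head_dim, and GiB figures below are illustrative placeholders, not measurements of this model):

```python
# Back-of-the-envelope: max context = (VRAM left after weights + overhead)
# divided by KV-cache bytes per token. All figures below are illustrative
# placeholders, not measurements of this particular quant.

GiB = 1024 ** 3

def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem):
    """K and V entries, one per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

def max_context_tokens(total_gib, weights_gib, overhead_gib, per_token_bytes):
    """Tokens of KV cache that fit in what's left after weights + overhead."""
    free_bytes = (total_gib - weights_gib - overhead_gib) * GiB
    return max(int(free_bytes // per_token_bytes), 0)

# Hypothetical 70B-class config: 80 layers, 8 KV heads (GQA), head_dim 128,
# fp8 cache = 1 byte per element -> 160 KiB of cache per token.
per_tok = kv_bytes_per_token(80, 8, 128, 1.0)

# On a single 24 GB 3090, ~1.25 GiB more taken by weights/overhead is enough
# to drop the fittable context from 16k to 8k with these numbers.
print(max_context_tokens(24, 20.5, 1.0, per_tok))   # -> 16384
print(max_context_tokens(24, 21.75, 1.0, per_tok))  # -> 8192
```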
I have updated the quants, so I suggest backing up just in case.
As for the newer quantized sizes, I haven't tested them enough yet, but they seem fairly similar.