Quantization update?

#5
by imi2 - opened

Any chance you or someone could requantize to the latest exl2? No idea the total vram requirements when quantizing and have FOMO on the bpw quality improvements since.

Owner
•
edited Jan 4

Hi there, sorry, life has been busy. I will try to do it as soon as I can. It takes about 10 hours to do the full pass and then ~5 hours per bpw size on an RTX 4090.

imi2 changed discussion status to closed

No worries, take your time!

P.S. I don't know if it's just me, but the newer quantized models seem to take slightly more space on the same system. With the 2.4bpw quant and an fp8 kv-cache, I used to fit 16k context on a single 3090. The system is unchanged, but now it only fits 8k with the fp8 cache.
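For a rough sanity check on numbers like these, KV-cache memory scales linearly with context length, so if the weights of a newer quant are even slightly larger, the leftover VRAM for cache (and thus the max context) shrinks. A minimal sketch of the arithmetic, where all model dimensions are hypothetical since the model config isn't stated in this thread:

```python
# Back-of-envelope KV-cache size estimate. All dimensions below are
# assumptions for illustration; the actual model isn't named here.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem):
    # Factor of 2 accounts for the separate K and V tensors per layer.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Hypothetical 70B-class config with grouped-query attention:
# 80 layers, 8 KV heads of dim 128, fp8 cache (1 byte per element).
at_16k = kv_cache_bytes(80, 8, 128, 16384, 1)
at_8k = kv_cache_bytes(80, 8, 128, 8192, 1)
print(at_16k / 2**30, "GiB at 16k")  # halving context halves the cache
print(at_8k / 2**30, "GiB at 8k")
```

With these assumed dimensions the fp8 cache at 16k is about 2.5 GiB, so going from 16k to 8k only frees roughly 1.25 GiB on a 24 GB card. That suggests the newer quant's weights grew by about that much at the same nominal bpw.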

I have updated the quants, so I suggest backing up in any case.

As for the newer quantized sizes, I haven't tested enough yet, but they seem fairly similar.
