Is a higher quant possible?

#1
by jackboot - opened

Could we get something between 4 and 5 bits? Or is the FP16 completely janky because it was dequantized?

I'd think that if we had the measurement.json, we could just quantize it to a different size. Exl2 needs a monkeypatch to skip a sanity check before the quant will run, though.
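
For a rough idea of what an in-between size would cost, here is a back-of-envelope sketch. It only does the bits-per-weight arithmetic; the parameter count is a made-up placeholder (I'm not assuming anything about this particular model), and the helper name `quant_size_gib` is just for illustration. It ignores per-layer overhead and any tensors kept at higher precision, so treat the numbers as approximate.

```python
def quant_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given average bits per weight.

    Pure arithmetic: params * bits / 8 bytes, converted to GiB.
    Ignores quantization metadata and any layers stored at other precisions.
    """
    return n_params * bits_per_weight / 8 / 1024**3


if __name__ == "__main__":
    # Hypothetical parameter count, NOT taken from this repo; swap in the real one.
    n_params = 70e9
    for bpw in (4.0, 4.5, 5.0, 16.0):
        print(f"{bpw:>4} bpw ~ {quant_size_gib(n_params, bpw):.1f} GiB")
```

The point is only that size scales linearly with bpw, so a 4.5 bpw quant would land halfway between the 4 and 5 bit sizes.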
