Compilade
compilade's activity
KLD measures the difference between two probability distributions, typically between a "ground truth" distribution and a model's predictions.
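As a minimal sketch of that definition (hypothetical probability values, not taken from any real model):

```python
import math

def kl_divergence(p, q):
    """KL divergence D(p || q) between two discrete distributions,
    e.g. p = "ground truth" (full-precision) token probabilities,
    q = quantized-model token probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]  # hypothetical full-precision probabilities
q = [0.6, 0.3, 0.1]  # hypothetical quantized probabilities
print(kl_divergence(p, q))  # small positive value; 0 only when p == q
```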
Yes, and ln(PPL(Q)/PPL(base)), from my understanding, measures the difference between the probabilities of the "correct" tokens according to the test dataset (at least for the second half of each chunk, same as for KLD). This means it would be possible to keep perplexity the same or even better while also increasing KLD, by changing the probabilities of the non-"correct" tokens.
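A toy demonstration of that point (made-up distributions, with the "correct" token at index 0): perplexity only depends on the probability assigned to the correct token, so shuffling the rest of the mass leaves PPL unchanged while KLD grows.

```python
import math

def nll(dist, correct):
    """Per-token negative log-likelihood; PPL = exp(mean NLL over tokens)."""
    return -math.log(dist[correct])

def kld(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

correct = 0
base  = [0.5, 0.3, 0.2]  # hypothetical full-precision distribution
quant = [0.5, 0.2, 0.3]  # same mass on the correct token, shuffled elsewhere

# Perplexity contribution is identical...
assert nll(base, correct) == nll(quant, correct)
# ...but KLD is nonzero because the other tokens' probabilities moved.
assert kld(base, quant) > 0
```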
This makes me wonder: do all of the token probabilities have to match closely for a quantized model to still be good?
I guess it depends on whether the goal is to make a faithful quantization, or an equally good model through quantization-aware fine-tuning.
The way imatrix works, it can't really "fine-tune" a model towards a lower perplexity; it can only prioritize reducing quantization error in the weight columns that have more impact on the activations. So I would say that faithfulness to the full-precision model is the goal of the quantization in this case, and thus KLD feels more appropriate.
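To illustrate the idea (this is a toy sketch of importance-weighted quantization, not llama.cpp's actual imatrix code): each weight's quantization error is weighted by a per-column activation statistic, so columns that influence the activations more are quantized more faithfully.

```python
def weighted_quant_error(weights, importance, scale):
    """Importance-weighted squared error of round-to-nearest quantization.
    `importance` is a hypothetical per-column activation statistic."""
    return sum(imp * (w - round(w / scale) * scale) ** 2
               for w, imp in zip(weights, importance))

weights = [0.13, -0.27, 0.81]
importance = [4.0, 0.5, 1.0]  # made-up activation statistics

# Pick the candidate scale that minimizes the weighted error:
best_scale = min((0.05 * k for k in range(1, 21)),
                 key=lambda s: weighted_quant_error(weights, importance, s))
```

Note that this objective only reduces error relative to the full-precision weights; nothing in it pushes the model towards lower perplexity on any dataset.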
Of course, I might be wrong; I don't really have a full understanding of the statistics going on in perplexity and KL-divergence calculations.
However, for quantization-aware fine-tuning, ln(PPL(Q)/PPL(base)) is likely a better indicator of quantization quality than KLD, unless the goal of the fine-tuning was actually to minimize KLD.