This discussion has been hidden

#516 by deleted - opened

Can't say I am thrilled to download and convert these big models just for one quant, but you definitely earned it :) They should be there already.

mradermacher changed discussion status to closed

Usually Q4_1 has lower perplexity than Q4_0 on every measurement.

Well, we've been telling you: Q4_1 is well known to be very unstable. That was one of the reasons it was abandoned: it is often larger and yet worse than Q4_0. (Q4_1 stores both a scale and a minimum per 32-weight block, 20 bytes against Q4_0's 18, so it is always about 11% larger, while being better is not guaranteed.) The reason I added it was that you convinced me of its usefulness in certain situations (speed with Metal). This is just a data point that the problems were not fixed in recent versions.

The Q4_1 quant was done with the same imatrix, but using the current version of llama.cpp.
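
For anyone wanting to reproduce the comparison, a minimal sketch with the stock llama.cpp tools looks like this (model and file names are placeholders, not the actual pipeline used here):

```
# Compute the importance matrix once from a calibration text;
# it can then be reused for every quant type of this model.
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# Quantize Q4_0 and Q4_1 from the same imatrix, so any quality
# difference comes from the quant format, not the calibration.
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-Q4_0.gguf Q4_0
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-Q4_1.gguf Q4_1
```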


We will have to accept that Q4_1 quants have the potential to turn out worse than Q4_0 quants. For static quants, my measurements so far even indicate that this is usually the case. While Q4_1 is usually much better than Q4_0 for weighted/imatrix quants, you cannot rely on imatrix training working that well for every model. Some models/architectures see less improvement from weighted/imatrix quants, in which case Q4_0 might still beat Q4_1. I would expect such outliers to be quite rare, though, and to occur mainly for non-English models, since our imatrix dataset is English-focused.

In this specific case I wouldn't put too much weight on perplexity measurements. Perplexity is one of the least reliable metrics llama.cpp gives you, especially when comparing quants of almost the same quality. Instead, use KL-divergence and token-probability measurements and see whether they lead you to the same conclusion.
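
As a minimal sketch of how to do that with llama.cpp's perplexity tool (file names are placeholders, and exact flags may differ between versions):

```
# Step 1: evaluate the unquantized model on a test text and
# save its logits as the reference.
./llama-perplexity -m model-f16.gguf -f test.txt --kl-divergence-base logits.kld

# Step 2: evaluate the quant against the saved logits; besides
# perplexity this reports mean KL-divergence and top-token
# agreement, which separate close quants much more reliably.
./llama-perplexity -m model-Q4_1.gguf --kl-divergence-base logits.kld --kl-divergence
```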

deleted changed discussion title from Q4_1 requests to This discussion has been hidden
