HF community survey: What is an acceptable Perplexity (PPL) degradation?

An area of personal research is finding ways to shrink the size of LLMs without incurring a noticeable loss of capability. All the models in my repo have been generated by quantizing different tensors at different levels, based on how much each tensor influences the inference process (see each model's card for more details). This approach produces, on average, a ~10% size reduction with a < 1% PPL penalty.
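To make the trade-off concrete, here is a minimal sketch of how those two figures relate; the sizes and PPL values below are hypothetical placeholders, not measurements from my repo:

```python
# Hypothetical numbers for illustration only -- substitute real measurements.
base_size_gb, quant_size_gb = 8.0, 7.2   # model file sizes
base_ppl, quant_ppl = 6.50, 6.55         # perplexity on the same eval set

size_reduction = (base_size_gb - quant_size_gb) / base_size_gb * 100
ppl_penalty = (quant_ppl - base_ppl) / base_ppl * 100

print(f"size reduction: {size_reduction:.1f}%")  # -> 10.0%
print(f"PPL penalty:    {ppl_penalty:.2f}%")     # -> ~0.77%, i.e. < 1%
```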

I'm now focusing on pruning (whole-layer removal) as a way to achieve better size reduction, but this comes at the cost of much higher PPL degradation.

So, the question for the HF community: what is the lowest (i.e. worst) PPL correlation coefficient (𝜌PPL) you'd consider acceptable for a quantized model? (e.g. 99%? 95%? 90%?)

To clarify, by 𝜌PPL I mean the Cor(ln(PPL(Q)), ln(PPL(base))) statistic generated by llama-perplexity.
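For anyone who wants to reproduce the number outside the tool, a minimal sketch of the statistic itself: the Pearson correlation between the two ln(PPL) series, assuming you have per-token negative log-likelihoods from both models on the same text (this is reported among llama-perplexity's KL-divergence statistics; the helper name and the synthetic data here are mine, for illustration only):

```python
import numpy as np

def ppl_correlation(nll_q: np.ndarray, nll_base: np.ndarray) -> float:
    """Pearson correlation between per-token ln(PPL) series.

    nll_q, nll_base: per-token negative log-likelihoods (the ln PPL
    contributions) from the quantized and base models on the same text.
    """
    assert nll_q.shape == nll_base.shape
    return float(np.corrcoef(nll_q, nll_base)[0, 1])

# Toy usage with synthetic data (placeholders, not real model output):
rng = np.random.default_rng(0)
nll_base = rng.gamma(shape=2.0, scale=1.0, size=10_000)
nll_q = nll_base + rng.normal(scale=0.05, size=10_000)  # mild quantization noise
print(f"rho_PPL ~= {ppl_correlation(nll_q, nll_base):.4f}")  # close to 1.0
```

A 𝜌PPL of 100% would mean the quantized model tracks the base model's per-token behaviour perfectly; the further it drops, the more the two models diverge, even if their mean PPLs look similar.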