Gratitude

#1
by Nexesenex - opened

Thanks for these quants, Knut.
2 of the 3 I wanted to do myself (Long Alpaca and Causal).
Btw, what parameters did you use (-ctx and chunks) to make your iMatrix?

I used the defaults I found, but one should experiment with that. What impact do they have? This is learning by doing. I just want those quants that enable long-context inference. I guess ctx could have an impact here. I added --chunks 100.
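Roughly, the call looks like this (model path, calibration file and the chunk count are just placeholders, assuming a recent llama.cpp build with the imatrix example):

```
# compute the importance matrix over a calibration text file
./imatrix -m ./models/model-f16.gguf \
          -f ./calibration/wiki.train.raw \
          -o ./model.imatrix \
          -c 512 --chunks 100
```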

I would like to learn more about this, if you have more information. I just grabbed what I found on GitHub.

I see there is also the possibility to compile the imatrix maker with GPU support, but I have not tried that yet. I kinda doubt that works with the larger models; it would probably exhaust my VRAM, but I don't know.
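If someone wants to try, I'd expect it to look roughly like this (untested on my side; the cuBLAS build flag and the partial-offload flag are what I've seen in the llama.cpp docs, and the layer count is a placeholder):

```
# build llama.cpp with CUDA (cuBLAS) support
make clean && make LLAMA_CUBLAS=1

# offload only part of the model to the GPU to avoid exhausting VRAM
./imatrix -m ./models/model-f16.gguf \
          -f ./calibration/wiki.train.raw \
          -o ./model.imatrix \
          -c 512 --chunks 100 -ngl 20
```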

The thing with the context setting is, I kinda expect it does not make a big difference. For normal exllama quantization, as far as I know, you don't have to change it to get working inference over long contexts. I guess it's similar here.

Important for LongAlpaca and YaRN: we gotta set the rope factor to 8, otherwise inference does not seem to work.
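In llama.cpp terms, that means passing the scaling factor at inference time, something like this (model path and context size are placeholders, assuming the --rope-scale flag of main):

```
# run inference with the rope scaling factor set to 8 for long context
./main -m ./models/model-q4_k_m.gguf -c 32768 --rope-scale 8 -p "your prompt here"
```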

Globally, -ctx 512 & --chunks 2000 are the current iMatrix baseline parameters I know of.
For the poor, -ctx 32 does give surprisingly good results, even with 25 chunks. But it's better to have at least 1000 chunks, whatever -ctx you choose.
Going beyond 512 ctx is not advised as far as my reading goes; I don't know the technical specifics, though.
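The resulting file then just gets passed to quantize, along these lines (file names are placeholders; the --imatrix flag is the one from current llama.cpp's quantize):

```
# quantize with the importance matrix applied
./quantize --imatrix ./model.imatrix ./model-f16.gguf ./model-Q4_K_M.gguf Q4_K_M
```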

Here are some tests I made: https://huggingface.co/Nexesenex/WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant.GGUF
And a conversation on GitHub with my initial tests as the last comment: https://github.com/ggerganov/llama.cpp/discussions/5006

As for YaRN, it indeed works best with rope 8, with an absolutely negligible loss of perplexity compared to rope 2 (unlike most models with linear rope, where the factor can be lowered to 2-2.5 for a big perplexity drop compared to rope 4 or 8, if you don't need the full context).
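That's easy to check with the perplexity tool, something like this (placeholder paths; -c and --rope-scale are the usual llama.cpp flags, and for YaRN models the scaling type from the GGUF metadata applies):

```
# compare perplexity at different rope factors on the same test file
./perplexity -m ./model-Q4_K_M.gguf -f ./wiki.test.raw -c 4096 --rope-scale 8
./perplexity -m ./model-Q4_K_M.gguf -f ./wiki.test.raw -c 4096 --rope-scale 2
```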

Thanks!

KnutJaegersberg changed discussion status to closed
