Gratitude

#1
by Nexesenex - opened

Thanks for these quants, Knut.
2 of the 3 I wanted to do myself (Long Alpaca and Causal).
Btw, what parameters did you use (-ctx and chunks) to make your iMatrix?

I used the defaults I found, but one should experiment with that. What impact do they have? This is learning by doing. I just want those quants that enable long-context inference. I guess ctx could have an impact here. I added --chunks 100.
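Roughly, the call looks like this (model path, calibration file and the chunk count are just placeholders, assuming a recent llama.cpp build with the imatrix example):

```
# compute the importance matrix over a calibration text file
./imatrix -m ./models/model-f16.gguf \
          -f ./calibration/wiki.train.raw \
          -o ./model.imatrix \
          -c 512 --chunks 100
```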

I would like to learn more about this, if you have more information. I just grabbed what I found on GitHub.

I see there is also the possibility to compile the imatrix maker with GPU support, but I have not tried that yet. I kinda doubt that works with the larger models; it would probably exhaust my VRAM, but I don't know.
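If someone wants to try, I'd expect it to look roughly like this (untested on my side; the cuBLAS build flag and the partial-offload flag are what I've seen in the llama.cpp docs, and the layer count is a placeholder):

```
# build llama.cpp with CUDA (cuBLAS) support
make clean && make LLAMA_CUBLAS=1

# offload only part of the model to the GPU to avoid exhausting VRAM
./imatrix -m ./models/model-f16.gguf \
          -f ./calibration/wiki.train.raw \
          -o ./model.imatrix \
          -c 512 --chunks 100 -ngl 20
```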

The thing with the context setting is, I kinda expect it does not make a big difference. For normal exllama quantization, as far as I know, you don't have to change it to get working inference over long contexts. I guess it's similar here.

Important for LongAlpaca and YaRN: we gotta set the rope factor to 8, otherwise inference does not seem to work.
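In llama.cpp terms, that means passing the scaling factor at inference time, something like this (model path and context size are placeholders, assuming the --rope-scale flag of main):

```
# run inference with the rope scaling factor set to 8 for long context
./main -m ./models/model-q4_k_m.gguf -c 32768 --rope-scale 8 -p "your prompt here"
```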

Globally, -ctx 512 & --chunks 2000 are the current iMatrix baseline parameters I know of.
For the poor, -ctx 32 does give surprisingly good results, even with 25 chunks. But it's better to have at least 1000 chunks, whatever -ctx you choose.
Going beyond 512 ctx is not advised as far as my reading goes; I don't know the technical specifics, though.
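The resulting file then just gets passed to quantize, along these lines (file names are placeholders; the --imatrix flag is the one from current llama.cpp's quantize):

```
# quantize with the importance matrix applied
./quantize --imatrix ./model.imatrix ./model-f16.gguf ./model-Q4_K_M.gguf Q4_K_M
```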

Here are some tests I made: https://huggingface.co/Nexesenex/WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant.GGUF
And a conversation on GitHub with my initial tests as the last comment: https://github.com/ggerganov/llama.cpp/discussions/5006

As for YaRN, it indeed works best with rope 8, with an absolutely negligible loss of perplexity compared to rope 2 (unlike most models with linear rope, where the factor can be lowered to 2-2.5 for a big perplexity drop compared to rope 4 or 8, if you don't need the full context).
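That's easy to check with the perplexity tool, something like this (placeholder paths; -c and --rope-scale are the usual llama.cpp flags, and for YaRN models the scaling type from the GGUF metadata applies):

```
# compare perplexity at different rope factors on the same test file
./perplexity -m ./model-Q4_K_M.gguf -f ./wiki.test.raw -c 4096 --rope-scale 8
./perplexity -m ./model-Q4_K_M.gguf -f ./wiki.test.raw -c 4096 --rope-scale 2
```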

Thanks!

KnutJaegersberg changed discussion status to closed
