using context above 4096 - need to set compress_pos_emb above 1?

#1
by Samvanity - opened

As the title says... is the default context for this model 4096, like Llama models?

Owner

You don't need to set compression.

Mistral and Mixtral are trained at 8192 and use a sliding window up to 32k;
I find 8k to work best, though.
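
If it helps, here is roughly what that looks like with the exllamav2 Python API (a minimal sketch; the class and field names follow the exllamav2 examples as I recall them, and the model path is a placeholder):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer

config = ExLlamaV2Config()
config.model_dir = "/models/Mixtral-8x7B-exl2"  # placeholder path
config.prepare()
config.max_seq_len = 8192      # native training length, no compression needed
config.scale_pos_emb = 1.0     # equivalent to compress_pos_emb = 1

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)
```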

Got it, thanks! And should I use the 8-bit cache if I run into OOM, or is the 8-bit cache a must? I just tested without it and it seems to run fine...

Owner

The 8-bit cache saves 10-20% VRAM at no real cost, though it might be slightly slower. If you have no problems there is no need for it, but it can let you run a model at a higher bpw.
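
For reference, switching to the 8-bit cache is a one-line change in the sketch above (assuming the ExLlamaV2Cache_8bit class from the exllamav2 examples):

```python
from exllamav2 import ExLlamaV2Cache_8bit

# 8-bit keys/values instead of FP16: per the note above, roughly 10-20% VRAM saved
cache = ExLlamaV2Cache_8bit(model, lazy=True)
model.load_autosplit(cache)
```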

You don't need to set compression

Mistral and Mixtral are trained at 8192 and use a sliding window up to 32k

Say I want to increase the ctx to 10k: can I set compression to 1.25 so it doesn't use the sliding window between 8k and 10k? Or will it still use the sliding window as soon as it goes past 8k, regardless of settings?

Or is compress_pos_emb useless unless I want to go beyond 32k?

I guess my question really is whether the following two settings are the same with Mixtral / Mistral:

  1. ctx = 10k, compress_pos_emb = 1
  2. ctx = 10k, compress_pos_emb = 1.25 (this is assuming base ctx = 8k, so 10k/8k = 1.25)

Thanks!

Owner

You should not have to set either compression or rope scaling with Mistral- and Mixtral-based models at anything below 32k.
Mixtral, this model, "gracefully handles a context of 32k tokens."
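
To make the two options from the question concrete (reusing the assumed config fields from the sketch above), option 1 is all you need below 32k:

```python
# Option 1 from the question: raise the context, leave compression at 1
config.max_seq_len = 10240      # "ctx = 10k"
config.scale_pos_emb = 1.0      # compress_pos_emb = 1

# Option 2 (compress_pos_emb = 10240 / 8192 = 1.25) is not needed at or below 32k
# config.scale_pos_emb = 1.25
```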
