CTX size.
Hi,
does this model support 32k context without NTK RoPE scaling?
From what I can see, merged models have 8k context and some 16k, so my guess is that this model is rather limited to 8k context.
This is a Mixtral ExLlama model. Refer to the config.json of the non-quantized model for the context size. Mixtral supports 32k context.
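For example, a quick way to check is to read the value straight from config.json (a minimal sketch, assuming the standard Hugging Face config layout; the file path is illustrative):

```python
import json

# Read the context length advertised by the base (non-quantized) model.
# Path is illustrative; point it at your local copy of config.json.
with open("config.json") as f:
    config = json.load(f)

# Mixtral-style configs expose the trained context window here.
print(config.get("max_position_embeddings"))  # 32768 for Mixtral-8x7B
```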
Yes, the config.json specifies a context length of 32k. However, all MoE merges I tried failed above 8k context. It appears that without further adjustments, they cannot reach the full context.
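For what it's worth, the usual adjustment when running long context through ExLlamaV2 is to raise the NTK RoPE alpha along with the sequence length. A minimal sketch, assuming the exllamav2 Python package, an illustrative model path, and an alpha of 2.6 as a starting point (not a verified setting for this merge):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer

# Illustrative path to the quantized model directory.
config = ExLlamaV2Config()
config.model_dir = "/models/mixtral-merge-exl2"
config.prepare()

config.max_seq_len = 32768      # target context window
config.scale_alpha_value = 2.6  # NTK RoPE alpha; tune per model/merge

model = ExLlamaV2(config)
model.load()
tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model, max_seq_len=config.max_seq_len)
```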
It depends on the available memory and how you are running the inference. Even 8k or 16k of context takes a large amount of memory to handle, because each batch or chunk has to carry the state for the whole inference pipeline. So as you increase the context window, the complexity and memory requirements scale at a much greater rate.
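As a rough illustration of the memory side, here is a back-of-the-envelope KV-cache estimate. The shape parameters are taken from the Mixtral-8x7B config (32 layers, 8 KV heads, head dim 128) and assume an FP16 cache at batch size 1; a quantized cache would shrink this, and attention compute on top of it grows roughly quadratically with context:

```python
def kv_cache_bytes(seq_len, num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_elem=2):
    # 2x for keys and values; one entry per layer, KV head, head dim and token.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

for ctx in (8192, 16384, 32768):
    gib = kv_cache_bytes(ctx) / 1024**3
    print(f"{ctx:>6} tokens -> ~{gib:.1f} GiB KV cache")
# ~1.0 GiB at 8k, ~2.0 GiB at 16k, ~4.0 GiB at 32k (FP16, batch size 1)
```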