CTX size.

#1
by altomek - opened

Hi,

does this model support 32k context without NTK RoPE scaling?
From what I can see, merged models have 8k context and some 16k, so my guess is that this model is likewise limited to 8k context.

altomek changed discussion status to closed
altomek changed discussion status to open

This is a mixtral exllama model. Refer to the config.json of the non-quantized model for context size. Mixtral supports 32k context.
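As a quick sanity check, here is a minimal sketch of reading the advertised context window straight from the config. It assumes the non-quantized base is mistralai/Mixtral-8x7B-Instruct-v0.1; the exact upstream repo behind this merge is not named in the thread, so substitute the real one.

```python
# Inspect the context window advertised in config.json.
# Assumes the base model is mistralai/Mixtral-8x7B-Instruct-v0.1;
# swap in the actual non-quantized repo for this merge.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
print(config.max_position_embeddings)        # 32768 for Mixtral
print(getattr(config, "rope_theta", None))   # RoPE base (1e6 for Mixtral)
```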

Yes, the config.json specifies a context length of 32k. However, all MoE merges I tried failed above 8k context. It appears that without further adjustments, they cannot reach the full context.

altomek changed discussion status to closed
Cognitive Computations org
edited Feb 22

Yes, the config.json specifies a context length of 32k. However, all MoE merges I tried failed above 8k context. It appears that without further adjustments, they cannot reach the full context.

It depends on the available memory and how you are running the inference. Getting to 8K or 16K context takes an enormous amount of memory just to handle the context. This relates to the inefficiencies of how each batch or chunk needs to set up the whole inference pipeline. So, as you increase the context window, the complexity and memory requirements scale at a much greater rate.
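As a rough illustration of how the context-handling memory grows, here is a back-of-the-envelope KV-cache estimate using Mixtral 8x7B's published config values (32 layers, 8 KV heads, head dimension 128). The fp16 assumption and the single-batch setup are simplifications, not a description of any particular inference stack.

```python
# Rough KV-cache size estimate for Mixtral 8x7B at fp16.
# Config values taken from the base model's config.json:
#   num_hidden_layers = 32, num_key_value_heads = 8, head_dim = 128
LAYERS, KV_HEADS, HEAD_DIM, BYTES = 32, 8, 128, 2  # fp16 = 2 bytes per value

def kv_cache_gib(context_tokens: int, batch: int = 1) -> float:
    """Memory for keys + values across all layers, in GiB."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES  # factor 2: K and V
    return batch * context_tokens * per_token / 1024**3

for ctx in (8192, 16384, 32768):
    print(f"{ctx:>6} tokens -> ~{kv_cache_gib(ctx):.1f} GiB KV cache")
# roughly 1 GiB at 8k, 2 GiB at 16k, 4 GiB at 32k, on top of the weights
```

This only counts the cache itself; attention scratch buffers and activations come on top of it, which is where the steeper-than-linear growth described above tends to show up in practice.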
