context window size

#4
by stephen-standd - opened

I can't find this in the blog post or readme, but what is the context window available for these models? I might have missed it!

An 8K sequence length is noted on their product page: https://mistral.ai/product/
For a more detailed specification, see the announcement post: https://mistral.ai/news/announcing-mistral-7b/

Mistral AI org

Thanks @ZeroXClem for the answer; it's also in the Transformers documentation: https://huggingface.co/docs/transformers/v4.34.0/en/model_doc/mistral#model-details
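
For anyone who wants to check this programmatically, here's a minimal sketch that reads the relevant fields from the model config via Transformers (assuming the public `mistralai/Mistral-7B-v0.1` checkpoint; the printed values are whatever that checkpoint's config ships with):

```python
from transformers import AutoConfig

# Load the config for the checkpoint (model id assumed to be the public release).
config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")

# max_position_embeddings: maximum number of positions the model is configured for.
# sliding_window: size of the local attention window.
print("max_position_embeddings:", config.max_position_embeddings)
print("sliding_window:", config.sliding_window)
```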

lerela changed discussion status to closed

Technically the context length is unlimited, since attention uses a 4k sliding window.

Under the hood, the stacked layers make it possible to attend indirectly to tokens more than 4k positions back, but that requires multiple attention hops across layers: an attention query cannot reach directly beyond the sliding window. In effect, the model can use information from outside the window only if it already attended to that information at some position less than 4k tokens earlier and carried it forward. So beyond the sliding window its capabilities look more like an LSTM's recurrent memory than full attention.
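
To illustrate what that restriction looks like, here's a small toy sketch (not Mistral's actual implementation, just a PyTorch mask) showing that each position can attend only to itself and the previous `window - 1` tokens, so anything further back has to be reached indirectly across layers:

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean causal mask restricted to a sliding window:
    query position i may attend to key positions j with i - window < j <= i."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions (column)
    j = torch.arange(seq_len).unsqueeze(0)   # key positions (row)
    causal = j <= i                          # never attend to future tokens
    in_window = (i - j) < window             # never look back further than the window
    return causal & in_window                # True = attention allowed

# Toy example: 10 tokens with a window of 4.
print(sliding_window_mask(10, 4).int())

# With N stacked layers, information can propagate roughly N * (window - 1)
# positions back, but only through intermediate hidden states at each layer,
# never via a direct attention query to those distant tokens.
```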
