What is the context length of this model?
(see title)
32k, as the GGUF shows.
hello @mirek190, where did you find this information about 32K? I was also looking for a Mistral model with a longer sequence length and found that it was actually trained on 8K tokens; this specific model card also mentions 7B-8K. However, Mistral is implemented with a sliding-window attention approach, so it can still take tokens outside the current window into account when predicting the next word. Even so, I was not able to find any evaluation of it at 16K or 32K tokens, or instructions on how to use it at those lengths.
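To make the sliding-window point above concrete: each token attends only to the previous W positions, yet information can still propagate further than W across stacked layers. A toy mask illustration (not Mistral's actual implementation; W and the sequence length here are made up for the demo):

```python
def sliding_window_mask(n, w):
    # mask[i][j] is True if position i may attend to position j:
    # causal (j <= i) and within the last w positions (i - j < w)
    return [[(j <= i) and (i - j < w) for j in range(n)] for i in range(n)]

# 5 positions, window of 3: each row shows what that position can see
for row in sliding_window_mask(5, 3):
    print("".join("x" if m else "." for m in row))
```

Position 4 cannot attend to position 1 directly, but a layer above it can read position 2's representation, which itself attended to position 1; stacking layers extends the effective receptive field well beyond the window.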
I'm using llama.cpp, where models now use the newest binary format, called GGUF.
Since GGUF has the model parameters baked in, you can check which parameters are loaded at startup.
After loading the model I see ctx 32k.
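The baked-in metadata can also be inspected without loading the model at all, by scanning the GGUF file's key/value header. A minimal sketch that looks for the `llama.context_length` key (field layout and value-type codes per the GGUF spec; arrays are skipped rather than decoded):

```python
import struct

# struct formats for GGUF scalar value-type codes (per the GGUF spec)
FMT = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
       6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}

def read_context_length(path, key=b"llama.context_length"):
    """Scan a GGUF file's metadata section and return the context length."""
    with open(path, "rb") as f:
        assert f.read(4) == b"GGUF", "not a GGUF file"
        version, = struct.unpack("<I", f.read(4))
        tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
        for _ in range(kv_count):
            klen, = struct.unpack("<Q", f.read(8))
            k = f.read(klen)
            vtype, = struct.unpack("<I", f.read(4))
            if vtype == 8:  # string: u64 length + bytes
                slen, = struct.unpack("<Q", f.read(8))
                val = f.read(slen).decode("utf-8", errors="replace")
            elif vtype == 9:  # array: element type + count, then payload (skipped)
                etype, count = struct.unpack("<IQ", f.read(12))
                if etype == 8:  # array of strings: skip each one
                    for _ in range(count):
                        slen, = struct.unpack("<Q", f.read(8))
                        f.seek(slen, 1)
                else:
                    f.seek(struct.calcsize(FMT[etype]) * count, 1)
                val = None
            else:  # fixed-size scalar
                val, = struct.unpack(FMT[vtype], f.read(struct.calcsize(FMT[vtype])))
            if k == key:
                return val
    return None
```

For a Mistral-7B GGUF this should report the same value llama.cpp prints at startup; note that this is the default inference context stored in the file, not necessarily the length the model was trained on.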
It was trained with an 8192-token context window: https://huggingface.co/mistralai/Mistral-7B-v0.1/discussions/4