What is the context length of this model?

#4
by denrykhlov - opened

32k, as the gguf shows

Hello @mirek190, where did you find this information about 32K? I was also looking for a Mistral model with a longer sequence length, and I found that it was actually trained on 8K tokens; this specific model card also mentions 7B-8K. However, Mistral is implemented with a sliding-window attention approach, so it can still take tokens outside the window into account when predicting the next word, but I was not able to find any evaluation of, or instructions for, using it with 16K or 32K tokens.
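
To illustrate what the sliding-window approach means, here is a minimal sketch (not Mistral's actual implementation) comparing a full causal attention mask with a sliding-window causal mask. The window size of 4 is only for readability; Mistral-7B uses a window of 4096. Each token attends directly only to the last W positions, and stacked layers let information propagate further back than W.

```python
import numpy as np

def causal_mask(n):
    # Standard causal mask: token i may attend to every token j <= i.
    return np.tril(np.ones((n, n), dtype=bool))

def sliding_window_mask(n, window):
    # Sliding-window causal mask: token i may attend only to tokens j
    # with i - window < j <= i.
    mask = causal_mask(n)
    for i in range(n):
        mask[i, : max(0, i - window + 1)] = False
    return mask

if __name__ == "__main__":
    n, w = 8, 4
    print("full causal:\n", causal_mask(n).astype(int))
    print(f"sliding window (W={w}):\n", sliding_window_mask(n, w).astype(int))
```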

I'm using llama.cpp, where models use the newest binary format, called gguf.
Since gguf has the model parameters baked in, you can check which parameters are loaded at startup.
After loading the model, I see ctx 32k.
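
For reference, one quick way to see where that 32k figure likely comes from is to inspect the original Mistral config. This is a small sketch assuming the transformers package and the field names used in the mistralai/Mistral-7B-v0.1 config; GGUF conversion typically records the model's maximum position embeddings as the context length, which is separate from the 4096-token attention window and the 8K training sequence length.

```python
from transformers import AutoConfig

# Field names below come from the Mistral config format.
cfg = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
print("max_position_embeddings:", cfg.max_position_embeddings)  # 32768 -> the "32k ctx" reported by llama.cpp
print("sliding_window:", cfg.sliding_window)                    # 4096 -> the attention window size
```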

OpenOrca org

It was trained with an 8192 token context window: https://huggingface.co/mistralai/Mistral-7B-v0.1/discussions/4

bleysg changed discussion status to closed
