What is the context length of this model?
(see title)
32k, as the GGUF shows.
hello @mirek190, where did you find this information about 32K? I was also looking for a Mistral model with a longer sequence length and found that it was actually trained on 8K tokens; this specific model card also mentions 7B-8K. However, Mistral is implemented with a sliding-window attention approach, so it can still take tokens outside the current window into account when predicting the next word. Even so, I was not able to find any evaluation of it at 16K or 32K tokens, or instructions on how to use it at those lengths.
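To make the sliding-window point above concrete: each token attends only to the previous W positions, yet information can still propagate further than W across stacked layers. A toy mask illustration (not Mistral's actual implementation; W and the sequence length here are made up for the demo):

```python
def sliding_window_mask(n, w):
    # mask[i][j] is True if position i may attend to position j:
    # causal (j <= i) and within the last w positions (i - j < w)
    return [[(j <= i) and (i - j < w) for j in range(n)] for i in range(n)]

# 5 positions, window of 3: each row shows what that position can see
for row in sliding_window_mask(5, 3):
    print("".join("x" if m else "." for m in row))
```

Position 4 cannot attend to position 1 directly, but a layer above it can read position 2's representation, which itself attended to position 1; stacking layers extends the effective receptive field well beyond the window.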
I'm using llama.cpp, where models now use the newest binary format, called GGUF.
Since GGUF has the model parameters baked in, you can check which parameters are loaded at startup.
After loading the model I see ctx 32k.
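The baked-in metadata can also be inspected without loading the model at all, by scanning the GGUF file's key/value header. A minimal sketch that looks for the `llama.context_length` key (field layout and value-type codes per the GGUF spec; arrays are skipped rather than decoded):

```python
import struct

# struct formats for GGUF scalar value-type codes (per the GGUF spec)
FMT = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
       6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}

def read_context_length(path, key=b"llama.context_length"):
    """Scan a GGUF file's metadata section and return the context length."""
    with open(path, "rb") as f:
        assert f.read(4) == b"GGUF", "not a GGUF file"
        version, = struct.unpack("<I", f.read(4))
        tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
        for _ in range(kv_count):
            klen, = struct.unpack("<Q", f.read(8))
            k = f.read(klen)
            vtype, = struct.unpack("<I", f.read(4))
            if vtype == 8:  # string: u64 length + bytes
                slen, = struct.unpack("<Q", f.read(8))
                val = f.read(slen).decode("utf-8", errors="replace")
            elif vtype == 9:  # array: element type + count, then payload (skipped)
                etype, count = struct.unpack("<IQ", f.read(12))
                if etype == 8:  # array of strings: skip each one
                    for _ in range(count):
                        slen, = struct.unpack("<Q", f.read(8))
                        f.seek(slen, 1)
                else:
                    f.seek(struct.calcsize(FMT[etype]) * count, 1)
                val = None
            else:  # fixed-size scalar
                val, = struct.unpack(FMT[vtype], f.read(struct.calcsize(FMT[vtype])))
            if k == key:
                return val
    return None
```

For a Mistral-7B GGUF this should report the same value llama.cpp prints at startup; note that this is the default inference context stored in the file, not necessarily the length the model was trained on.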
It was trained with an 8192-token context window: https://huggingface.co/mistralai/Mistral-7B-v0.1/discussions/4