What is the maximum context length of Mistral-7B-Instruct-v0.2?

#37 opened by xcjthu

According to the config.json files, the base model Mistral-7B-v0.1 and its instruction-tuned counterpart, Mistral-7B-Instruct-v0.1, both have a maximum length of 32k tokens (max_position_embeddings = 32768). However, the Mistral-7B report indicates that these models were trained within an 8k context window. So, what is the maximum length these models can actually handle?
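
For reference, both values can be read directly from the published configs — a minimal sketch, assuming the transformers library and access to the Hub:

```python
# Print the configured maximum length and the sliding-attention window
# for the models in question (repo names as published on the Hub).
from transformers import AutoConfig

for name in [
    "mistralai/Mistral-7B-v0.1",
    "mistralai/Mistral-7B-Instruct-v0.1",
    "mistralai/Mistral-7B-Instruct-v0.2",
]:
    cfg = AutoConfig.from_pretrained(name)
    # max_position_embeddings is the 32k figure from config.json;
    # sliding_window is the local-attention window (None if disabled).
    print(name, cfg.max_position_embeddings, getattr(cfg, "sliding_window", None))
```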

Additionally, the config.json file shows that the RoPE base (rope_theta) for Mistral-7B-Instruct-v0.2 has changed from 10000.0 to 1000000.0. Does this mean the model was fine-tuned after applying an NTK-aware positional-encoding transformation?
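
As a back-of-the-envelope check: if the base change were intended as NTK-aware scaling, the implied context-scaling factor can be solved from the commonly cited relation new_base = old_base * s^(d/(d-2)). A sketch — the formula and head_dim = 128 (hidden_size 4096 / 32 heads) are my assumptions, not anything stated in the model card:

```python
# Solve the NTK-aware relation new_base = old_base * s ** (d / (d - 2))
# for the scaling factor s implied by the rope_theta change.
head_dim = 128  # 4096 hidden size / 32 attention heads for Mistral-7B
old_base, new_base = 10_000.0, 1_000_000.0

s = (new_base / old_base) ** ((head_dim - 2) / head_dim)
print(f"implied context scaling factor s ~= {s:.1f}")  # about 93x
```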
