What is the context length of this model?

#7
by mediguru - opened

What is the context length of this model? Is it the same as BioMistral (2048)?

Yes and no. BioMistral is one of the models used to build this MoE, and the other models have a larger context window (32k, like Mistral).

That said, the BioMistral authors used a sequence length of 2048 for grouping during training, so staying within that length is the safest way to get the best accuracy. But unlike Llama-2, I don't believe anything will stop you from going higher, since Mistral-7B (the base model) has a larger context window. In this model especially, I think you can go higher easily.
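
If you want to check the configured window yourself, here is a minimal sketch using the transformers library; the model id is a placeholder, since this repo's id isn't named in the thread:

```python
from transformers import AutoConfig

# Placeholder id -- substitute this repo's actual model id.
model_id = "your-org/this-moe-model"

config = AutoConfig.from_pretrained(model_id)

# Mistral-derived configs report the maximum context window here;
# plain Mistral-7B reports 32768, while BioMistral's training
# grouped sequences at length 2048.
print(config.max_position_embeddings)
```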
