What is the context length of this model?

#7
by mediguru - opened

What is the context length of this model? Is it the same as BioMistral (2048)?

Yes and no. BioMistral is one of the models used to build this MoE, and the other models have a larger context window (32k, like Mistral).

That said, the BioMistral authors used a sequence length of 2048 for grouping during training, so staying within that length is the safest way to get the best accuracy. But unlike Llama-2, I don't believe anything will stop you from going higher, since Mistral-7B (the base model) has a larger context window. In this model especially, I think you can go higher easily.
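
If you want to check the configured window yourself, here is a minimal sketch using the transformers library; the model id is a placeholder, since this repo's id isn't named in the thread:

```python
from transformers import AutoConfig

# Placeholder id -- substitute this repo's actual model id.
model_id = "your-org/this-moe-model"

config = AutoConfig.from_pretrained(model_id)

# Mistral-derived configs report the maximum context window here;
# plain Mistral-7B reports 32768, while BioMistral's training
# grouped sequences at length 2048.
print(config.max_position_embeddings)
```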
