What's the context window for this model?

#73
by siddheshgunjal - opened

What's the context window for this model?

Hi @siddheshgunjal
The model uses classic attention, with GQA for the 2B model, so it should be able to attend to the entire sequence if I am not mistaken. Let me know @suryabhupa if this is correct :)

Google org

hello! Surya from the Gemma team here -- the 2B model actually uses MQA (just 1 KV head), whereas the 7B uses MHA. Both models have the same sequence/context length of 8192, as specified in the technical report.
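For anyone else looking for these numbers, they can also be read straight off each checkpoint's config with transformers. A minimal sketch (the repo IDs below and gated-access login are my assumptions, not something stated above):

```python
from transformers import AutoConfig

# Assumes you have accepted the Gemma license and are logged in
# (e.g. via `huggingface-cli login`), since the repos are gated.
for model_id in ["google/gemma-2b", "google/gemma-7b-it"]:
    config = AutoConfig.from_pretrained(model_id)
    print(
        model_id,
        "| context length:", config.max_position_embeddings,  # 8192 for both
        "| query heads:", config.num_attention_heads,
        "| KV heads:", config.num_key_value_heads,  # 1 -> MQA; equal to query heads -> MHA
    )
```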

Thanks @suryabhupa
I was also struggling to find this. I noticed that it isn't documented in the codegemma-7b technical documents; the number is only given in the base model report, here: https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf
It would be nice if this were also linked on this page.
By the way, thank you for making this model available; it feels like it will be very helpful for me :)

Google org

It's our pleasure! Do you mean adding the context length to Gemma's HF page docs?

Ideally both the context length and a link to the technical report, on the page for the model this thread is about, gemma-7b-it.

Thanks @suryabhupa & @ybelkada for the clarification!

siddheshgunjal changed discussion status to closed
