What's the context window for this model?

#73
by siddheshgunjal - opened

What's the context window for this model?

Hi @siddheshgunjal
The model uses classic attention, with GQA for the 2B model, so it should be able to attend to the entire sequence if I am not mistaken. Let me know @suryabhupa if this is correct :)

Google org

hello! Surya from the Gemma team here -- the 2B model actually uses MQA (just 1 KV head), whereas the 7B uses MHA. Both models have the same sequence/context length of 8192, as specified in the technical report.
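For anyone else looking for these numbers, they can also be read straight off each checkpoint's config with transformers. A minimal sketch (the repo IDs below and gated-access login are my assumptions, not something stated above):

```python
from transformers import AutoConfig

# Assumes you have accepted the Gemma license and are logged in
# (e.g. via `huggingface-cli login`), since the repos are gated.
for model_id in ["google/gemma-2b", "google/gemma-7b-it"]:
    config = AutoConfig.from_pretrained(model_id)
    print(
        model_id,
        "| context length:", config.max_position_embeddings,  # 8192 for both
        "| query heads:", config.num_attention_heads,
        "| KV heads:", config.num_key_value_heads,  # 1 -> MQA; equal to query heads -> MHA
    )
```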

Thanks @suryabhupa
I was also struggling to find this. I noticed that it isn't documented in the codegemma-7b technical documents; the number is only given in the base model report, here: https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf
It would be nice if this were also linked on this page.
By the way, thank you for making this model available; it feels like it will be very helpful for me :)

Google org

It's our pleasure! Do you mean adding the context length to Gemma's HF page docs?

Ideally both the context length and a link to the technical report, on the page for the model this thread is about, gemma-7b-it.

Thanks @suryabhupa & @ybelkada for the clarification!

siddheshgunjal changed discussion status to closed
