What are the global attention span and sliding window?
#23 · opened by ducknificient
The Gemma 2 paper mentions that the global attention span is 8192 tokens and the sliding window is 4096 tokens, but what about Gemma (first generation)?

Best regards,
Hi @ducknificient,

There is no specific mention of a global attention span or sliding window for the Gemma 1 model. The official Gemma paper only specifies a context length, which is set at 8192 tokens. You can find more details in the paper here.
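For anyone curious about the distinction the question raises: per the Gemma 2 paper, alternating layers use local sliding-window attention (4096 tokens) and global attention (8192 tokens), while Gemma 1 uses ordinary global causal attention over its full 8192-token context. Below is a minimal illustrative sketch of how the two masks differ; it is my own toy example, not Gemma's actual implementation, and the constants just mirror the paper's values.

```python
import numpy as np

# Values reported in the Gemma 2 paper (Gemma 1 only defines the 8192 context length).
GLOBAL_SPAN = 8192     # span of Gemma 2's global attention layers
SLIDING_WINDOW = 4096  # window size of Gemma 2's local attention layers

def causal_mask(seq_len: int) -> np.ndarray:
    """Global causal attention: each token attends to itself and all earlier tokens."""
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    return j <= i

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Sliding-window attention: each token attends to itself and at most
    `window - 1` preceding tokens."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

# Tiny demo with a window of 3 so the difference is visible:
print(causal_mask(6).astype(int))
print(sliding_window_mask(6, 3).astype(int))
```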
Thank you.
thanks
ducknificient changed discussion status to closed