
Context window is only 8k???

#1
by rombodawg - opened

Umm did you forget something?

[Attachment: Screenshot (696).png]

This is from the original Gemma config `*kwargs`; the `segment_size` is what scales the attention. It splits the attention into recurrent chunks of `segment_size`, per https://arxiv.org/abs/2404.07143 🙂
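
For anyone curious what "recurrent segment_size chunks" means in practice, here is a rough, illustrative sketch of the segment-wise recurrent attention described in that paper (Infini-attention). This is not the actual Gemma/Transformers implementation; the function name, the fixed 0.5 gate, and the memory update are all simplified assumptions.

```python
# Rough sketch of segment-wise recurrent attention per
# https://arxiv.org/abs/2404.07143 (Infini-attention).
# Illustrative only; not the actual Gemma/Transformers code.
import torch
import torch.nn.functional as F

def infini_attention_sketch(q, k, v, segment_size):
    # q, k, v: (batch, seq_len, dim); seq_len assumed divisible by segment_size
    batch, seq_len, dim = q.shape
    mem = torch.zeros(batch, dim, dim)   # compressive memory carried across segments
    norm = torch.zeros(batch, dim, 1)    # normalization term for memory retrieval
    outputs = []
    for start in range(0, seq_len, segment_size):
        qs = q[:, start:start + segment_size]
        ks = k[:, start:start + segment_size]
        vs = v[:, start:start + segment_size]
        # Standard (causal) attention within the current segment
        local = F.scaled_dot_product_attention(qs, ks, vs, is_causal=True)
        # Retrieve context accumulated from all previous segments
        sigma_q = F.elu(qs) + 1.0
        retrieved = (sigma_q @ mem) / (sigma_q @ norm + 1e-6)
        # Combine local and memory paths (a learned gate in the paper; fixed here)
        outputs.append(0.5 * local + 0.5 * retrieved)
        # Fold this segment's keys/values into the compressive memory
        sigma_k = F.elu(ks) + 1.0
        mem = mem + sigma_k.transpose(1, 2) @ vs
        norm = norm + sigma_k.sum(dim=1, keepdim=True).transpose(1, 2)
    return torch.cat(outputs, dim=1)
```

The point is that the per-segment attention cost stays bounded by `segment_size`, while the recurrent memory lets information flow across segments, which is how the effective context can extend well beyond the nominal 8k window.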
