`max_position_embeddings=32768` with "attention span of 131K tokens"

#57
by Nadav-Timor - opened

Hi,
Can you please clarify how you use the `max_position_embeddings` hyperparameter? The `config.json` file specifies `max_position_embeddings=32768`, while the paper claims an attention span of 131K tokens (see Section 2 on "Architectural details" → "Sliding Window Attention").
Thanks!

See this GitHub Issue by @ParadoxZW from a few days ago
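For reference, a minimal sketch of where the 131K figure comes from, assuming the published Mistral-7B config values (a per-layer sliding window of 4096 tokens across 32 layers); the model ID below is only illustrative:

```python
from transformers import AutoConfig

# Load the config containing the values discussed above.
# (Model ID is an assumption; any Mistral-style checkpoint with a
# `sliding_window` entry in its config.json would illustrate the same point.)
config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")

window = config.sliding_window            # tokens each layer can attend to (4096)
layers = config.num_hidden_layers         # number of transformer layers (32)
max_pos = config.max_position_embeddings  # positional-embedding limit (32768)

# With sliding-window attention, each layer looks back `window` tokens, so
# information can propagate roughly window * layers tokens through the stack,
# which is the "theoretical attention span" the paper cites.
print(f"max_position_embeddings : {max_pos}")
print(f"theoretical span        : {window * layers}")  # 4096 * 32 = 131072 ≈ 131K
```

On this reading, `max_position_embeddings=32768` bounds the positional embeddings for a single forward pass, while 131K is the multi-layer receptive field implied by stacking the sliding windows, not a separate config field.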
