context length

#5
by pateco - opened

What is the context length of this model, and what tokenizer does it use?

Arctic is trained using a 4K attention context window. We are developing an attention-sinks-based sliding window implementation to support unlimited sequence generation capability in the coming weeks. We look forward to working with the community to extend to a 32K attention window in the near future.

https://www.snowflake.com/blog/arctic-open-efficient-foundation-language-models-snowflake/

When counting tokens with cl100k_base, I can only get to ~3k when testing in the example app: https://arctic.streamlit.app/

The model is trained with 4K dense attention, and the tokenizer is adopted from LLaMa-2.
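
Because cl100k_base and the LLaMa-2-style tokenizer segment text differently, a ~3k cl100k_base count can correspond to roughly 4K tokens under Arctic's own tokenizer. A minimal sketch of how to compare the two counts, assuming the repo id `Snowflake/snowflake-arctic-instruct`, that its tokenizer loads via `AutoTokenizer`, and that `tiktoken` and `transformers` are installed:

```python
import tiktoken
from transformers import AutoTokenizer

text = "your long prompt here ..."

# Count with OpenAI's cl100k_base (what the question above used).
cl100k = tiktoken.get_encoding("cl100k_base")
print("cl100k_base tokens:", len(cl100k.encode(text)))

# Count with Arctic's LLaMa-2-based tokenizer; this is the count that
# matters for the 4K attention window. Repo id is an assumption here.
tok = AutoTokenizer.from_pretrained("Snowflake/snowflake-arctic-instruct")
print("Arctic tokens:", len(tok.encode(text)))
```

The two counts will generally differ, so fitting within the 4K window should be judged by the model's own tokenizer rather than cl100k_base.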
