context length

#5
by pateco - opened

What is the context length of this model, and what tokenizer does it use?

Arctic is trained using a 4K attention context window. We are developing an attention-sinks-based sliding window implementation to support unlimited sequence generation capability in the coming weeks. We look forward to working with the community to extend to a 32K attention window in the near future.

https://www.snowflake.com/blog/arctic-open-efficient-foundation-language-models-snowflake/

When counting tokens with cl100k_base, I can only get to ~3k when testing in the example app: https://arctic.streamlit.app/

The model is trained with 4K dense attention, and the tokenizer is adopted from LLaMa-2.
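
Because cl100k_base and the LLaMa-2-style tokenizer segment text differently, a ~3k cl100k_base count can correspond to roughly 4K tokens under Arctic's own tokenizer. A minimal sketch of how to compare the two counts, assuming the repo id `Snowflake/snowflake-arctic-instruct`, that its tokenizer loads via `AutoTokenizer`, and that `tiktoken` and `transformers` are installed:

```python
import tiktoken
from transformers import AutoTokenizer

text = "your long prompt here ..."

# Count with OpenAI's cl100k_base (what the question above used).
cl100k = tiktoken.get_encoding("cl100k_base")
print("cl100k_base tokens:", len(cl100k.encode(text)))

# Count with Arctic's LLaMa-2-based tokenizer; this is the count that
# matters for the 4K attention window. Repo id is an assumption here.
tok = AutoTokenizer.from_pretrained("Snowflake/snowflake-arctic-instruct")
print("Arctic tokens:", len(tok.encode(text)))
```

The two counts will generally differ, so fitting within the 4K window should be judged by the model's own tokenizer rather than cl100k_base.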
