
# Model Card: Yarn-Solar-10b-64k

Preprint (arXiv) · GitHub: yarn

## Model Description

Yarn-Solar-10b-64k is a state-of-the-art language model for long context, further pretrained on two billion long context tokens using the YaRN extension method. It is an extension of SOLAR-10.7B-v1.0 and supports a 64k token context window.

To use, pass `trust_remote_code=True` when loading the model, for example:

```python
import torch
from transformers import AutoModelForCausalLM

# flash_attention_2 requires the flash-attn package and a compatible GPU
model = AutoModelForCausalLM.from_pretrained("NousResearch/Yarn-Solar-10b-64k",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True)
```

In addition, you will need to use the latest version of `transformers`:

```sh
pip install git+https://github.com/huggingface/transformers
```
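
Once the model is loaded, generation works the same as for any other causal LM in `transformers`. The following is a minimal sketch; the prompt, the placeholder `long_document` variable, and the generation settings are illustrative assumptions, not part of the official usage instructions.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NousResearch/Yarn-Solar-10b-64k")

# "long_document" is a placeholder for any long input; the 64k context window
# accepts prompts far beyond the base model's 4k limit.
long_document = "..."  # replace with your own long text
prompt = f"Summarize the following document:\n\n{long_document}\n\nSummary:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```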

## Benchmarks

Long context benchmarks:

| Model | Context Window | 4k PPL | 8k PPL | 16k PPL | 32k PPL | 64k PPL |
|---|---|---|---|---|---|---|
| Mistral-7B-v0.1 | 8k | 3.09 | 2.96 | - | - | - |
| Yarn-Mistral-7b-64k | 64k | 3.18 | 3.04 | 2.65 | 2.44 | 2.20 |
| Yarn-Mistral-7b-128k | 128k | 3.21 | 3.08 | 2.68 | 2.47 | 2.24 |
| SOLAR-10.7B-v1.0 | 4k | 3.07 | - | - | - | - |
| Yarn-Solar-10b-32k | 32k | 3.09 | 2.95 | 2.57 | 2.31 | - |
| Yarn-Solar-10b-64k | 64k | 3.13 | 2.99 | 2.61 | 2.34 | 2.15 |
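
For context, the sketch below shows one generic way to measure perplexity at a fixed context length with `transformers`. It is an illustration under assumptions (the `text` input and simple truncation are placeholders), not the evaluation script behind the numbers above, which follow the YaRN paper's setup.

```python
import torch

def perplexity_at_context(model, tokenizer, text, context_len=65536):
    # Truncate to the desired context length and score the tokens causally.
    ids = tokenizer(text, return_tensors="pt").input_ids[:, :context_len].to(model.device)
    with torch.no_grad():
        # Passing labels=input_ids makes transformers return the mean LM loss.
        loss = model(ids, labels=ids).loss
    # Perplexity is the exponential of the mean negative log-likelihood.
    return torch.exp(loss).item()
```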

Short context benchmarks showing that quality degradation is minimal:

| Model | Context Window | ARC-c | Hellaswag | MMLU | Truthful QA |
|---|---|---|---|---|---|
| Mistral-7B-v0.1 | 8k | 59.98 | 83.31 | 64.16 | 42.15 |
| Yarn-Mistral-7b-64k | 64k | 59.38 | 81.21 | 61.32 | 42.50 |
| Yarn-Mistral-7b-128k | 128k | 58.87 | 80.58 | 60.64 | 42.46 |
| SOLAR-10.7B-v1.0 | 4k | 61.95 | 84.60 | 65.48 | 45.04 |
| Yarn-Solar-10b-32k | 32k | 59.64 | 83.65 | 64.36 | 44.82 |
| Yarn-Solar-10b-64k | 64k | 59.21 | 83.08 | 63.57 | 45.70 |

## Collaborators

The authors would like to thank LAION AI for their support of compute for this model. It was trained on the JUWELS supercomputer.
