Model Card: Nous-Yarn-Mistral-7b-128k

Preprint (arXiv)
GitHub yarn

Model Description

Nous-Yarn-Mistral-7b-128k is a state-of-the-art language model for long context, further pretrained on long context data for 1500 steps using the YaRN extension method. It is an extension of Mistral-7B-v0.1 and supports a 128k token context window.

To use the model, pass trust_remote_code=True when loading it, for example:

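import torch
from transformers import AutoModelForCausalLM

# use_flash_attention_2=True requires the flash-attn package to be installed
model = AutoModelForCausalLM.from_pretrained("NousResearch/Yarn-Mistral-7b-128k",
  use_flash_attention_2=True,
  torch_dtype=torch.bfloat16,
  device_map="auto",
  trust_remote_code=True)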
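Once the model is loaded, generation follows the usual transformers pattern. The following is a minimal sketch rather than an official recipe: the `generate` helper, the `MODEL_ID` constant, and the default `max_new_tokens=64` are illustrative names and values that do not appear in this card, and the loading arguments simply mirror the call above.

```python
MODEL_ID = "NousResearch/Yarn-Mistral-7b-128k"


def generate(prompt, max_new_tokens=64):
    """Load the model and return a completion for `prompt` (hypothetical helper)."""
    # Heavy imports live inside the function so this file can be imported
    # on a machine that does not have torch or transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        use_flash_attention_2=True,   # same arguments as the loading call above
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )

    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so that only the newly generated text is returned.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0, prompt_length:], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("The YaRN method extends the context window by"))
```

In practice you would load the model once and reuse it across prompts; the helper reloads it on every call only to keep the sketch self-contained.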

In addition, you will need the latest development version of transformers until version 4.35 is released:

pip install git+https://github.com/huggingface/transformers
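To check whether an environment still needs the source install, one option is to compare the installed transformers version against the 4.35 threshold mentioned above. This is a rough sketch: `release_tuple`, `needs_source_install`, and `installed_transformers_version` are hypothetical helper names, and the parser assumes a version string that starts with numeric `major.minor` components.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Return (major, minor) from a version string such as '4.35.0.dev0'."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def needs_source_install(version_string, minimum=(4, 35)):
    """True if the given version predates the minimum (4.35 by assumption)."""
    return release_tuple(version_string) < minimum


def installed_transformers_version():
    """Return the installed transformers version, or None if it is missing."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_transformers_version()
    if current is None or needs_source_install(current):
        print("Install transformers from source (see the pip command above).")
    else:
        print(f"transformers {current} should be recent enough.")
```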

Benchmarks

Long context benchmarks:

| Model | Context Window | 8k PPL | 16k PPL | 32k PPL | 64k PPL | 128k PPL |
|---|---|---|---|---|---|---|
| Mistral-7B-v0.1 | 8k | 2.96 | - | - | - | - |
| Yarn-Mistral-7b-64k | 64k | 3.04 | 2.65 | 2.44 | 2.20 | - |
| Yarn-Mistral-7b-128k | 128k | 3.08 | 2.68 | 2.47 | 2.24 | 2.19 |

Short context benchmarks showing that quality degradation is minimal:

| Model | Context Window | ARC-c | HellaSwag | MMLU | TruthfulQA |
|---|---|---|---|---|---|
| Mistral-7B-v0.1 | 8k | 59.98 | 83.31 | 64.16 | 42.15 |
| Yarn-Mistral-7b-64k | 64k | 59.38 | 81.21 | 61.32 | 42.50 |
| Yarn-Mistral-7b-128k | 128k | 58.87 | 80.58 | 60.64 | 42.46 |
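To put a number on the degradation, the short-context scores can be compared against the Mistral-7B-v0.1 baseline. The sketch below only re-uses the figures from the table above; the `SCORES` dictionary and the `deltas` helper are illustrative names, not part of any released tooling.

```python
BASELINE = "Mistral-7B-v0.1"

# Scores copied from the short-context benchmark table.
SCORES = {
    "Mistral-7B-v0.1": {"ARC-c": 59.98, "HellaSwag": 83.31, "MMLU": 64.16, "TruthfulQA": 42.15},
    "Yarn-Mistral-7b-64k": {"ARC-c": 59.38, "HellaSwag": 81.21, "MMLU": 61.32, "TruthfulQA": 42.50},
    "Yarn-Mistral-7b-128k": {"ARC-c": 58.87, "HellaSwag": 80.58, "MMLU": 60.64, "TruthfulQA": 42.46},
}


def deltas(model):
    """Per-benchmark score change of `model` relative to the baseline, in points."""
    return {
        benchmark: round(score - SCORES[BASELINE][benchmark], 2)
        for benchmark, score in SCORES[model].items()
    }


if __name__ == "__main__":
    for name in SCORES:
        if name != BASELINE:
            print(name, deltas(name))
```

By this measure the largest drop is on MMLU, about 3.5 points for the 128k model, while TruthfulQA is slightly higher than the baseline for both extended models.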

Collaborators

The authors would like to thank LAION AI for providing the compute for this model, which was trained on the JUWELS supercomputer.
