File size: 3,615 Bytes
1003c5f fd4edb5 e276a9a 1003c5f fd4edb5 1003c5f fd4edb5 1003c5f fd4edb5 1003c5f fd4edb5 1003c5f fd4edb5 1003c5f fd4edb5 1003c5f fd4edb5 1003c5f fd4edb5 1003c5f fd4edb5 1003c5f fd4edb5 1003c5f fd4edb5 1c34cdb fd4edb5 1003c5f fd4edb5 1003c5f |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 |
---
license: llama3
datasets:
- princeton-nlp/prolong-data-64K
- princeton-nlp/prolong-data-512K
- HuggingFaceH4/ultrachat_200k
base_model:
- princeton-nlp/Llama-3-8B-ProLong-512k-Base
library_name: transformers
---
# princeton_nlp/Llama-3-8B-ProLong-512k-Instruct
[[Paper](https://arxiv.org/pdf/2410.02660)] [[HF Collection](https://huggingface.co/collections/princeton-nlp/prolong-66c72d55d2051a86ac7bd7e4)] [[Code](https://github.com/princeton-nlp/ProLong)]
**ProLong** (<u>Pr</u>incet<u>o</u>n <u>long</u>-context language models) is a family of long-context models that are continued trained and supervised fine-tuned from Llama-3-8B, with a maximum context window of 512K tokens. Our [main ProLong model](https://huggingface.co/princeton-nlp/Llama-3-8B-ProLong-512k-Instruct) is one of the best-performing long-context models at the 10B scale (evaluated by [HELMET](https://github.com/princeton-nlp/helmet)).
To train this strong long-context model, we conduct thorough ablations on the long-context pre-training data, SFT data, and numerous other design choices. We demonstrate our findings in our paper, [How to Train Long-Context Language Models (Effectively)](https://arxiv.org/pdf/2410.02660).
Authors: [Tianyu Gao](https://gaotianyu.xyz/about)\*, [Alexander Wettig](https://www.cs.princeton.edu/~awettig/)\*, [Howard Yen](https://howard-yen.github.io/), [Danqi Chen](https://www.cs.princeton.edu/~danqic/) (* equal contribution)
Contact: `{tianyug, awettig}@princeton.edu`
## The ProLong Models
- [princeton_nlp/Llama-3-8B-ProLong-64k-Base](https://huggingface.co/princeton-nlp/Llama-3-8B-ProLong-64k-Base)
- [princeton_nlp/Llama-3-8B-ProLong-64k-Instruct](https://huggingface.co/princeton-nlp/Llama-3-8B-ProLong-64k-Instruct)
- [princeton_nlp/Llama-3-8B-ProLong-512k-Base](https://huggingface.co/princeton-nlp/Llama-3-8B-ProLong-512k-Base)
- ⭐ [princeton_nlp/Llama-3-8B-ProLong-512k-Instruct](https://huggingface.co/princeton-nlp/Llama-3-8B-ProLong-512k-Instruct) ← you are here!
## Model card
Here are some quick facts about our main ProLong model: [princeton-nlp/Llama-3-8B-ProLong-512k-Instruct](https://huggingface.co/princeton-nlp/Llama-3-8B-ProLong-512k-Instruct).
* Base model: [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
* Long-context continued training: 20B tokens on 64K training data ([princeton-nlp/prolong-data-64K](https://huggingface.co/datasets/princeton-nlp/prolong-data-64K)), and 20B tokens on 512K training data ([princeton-nlp/prolong-data-512K](https://huggingface.co/datasets/princeton-nlp/prolong-data-512K))
* Supervised fine-tuning (SFT): [UltraChat](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)
* Maximum context window: 512K tokens
<p align="center" style="margin-bottom: 0;">
<img width="80%" alt="image" src="https://github.com/user-attachments/assets/c31c9671-49fe-4776-91d2-de70ffd9f9a1">
</p>
<p align="center" style="margin-top: 0; padding-top: 0;">
<em>ProLong performance on <a href="https://github.com/princeton-nlp/helmet">HELMET</a> averaged over 32K, 64K, and 128K lengths. All models are instruct models.</em>
</p>
<p align="center">
<img width="80%" alt="image" src="https://github.com/user-attachments/assets/a36a7d0f-4480-4a29-80f3-208477707fb7">
</p>
<p align="center" style="margin-top: 0;">
<em>ProLong training recipe.</em>
</p>
## Citation
```bibtex
@article{gao2024prolong,
title={Enabling Large Language Models to Generate Text with Citations},
author={Gao, Tianyu and Wettig, Alexander and Yen, Howard and Chen, Danqi},
year={2024},
}
``` |