
License: apache-2.0

Paper: Adapting Language Models to Compress Contexts

Code: https://github.com/princeton-nlp/AutoCompressors

Models:

- princeton-nlp/FullAttention-Llama-2-7b-6k (this model)
- princeton-nlp/AutoCompressor-Llama-2-7b-6k

FullAttention-Llama-2-7b-6k is a model fine-tuned from meta-llama/Llama-2-7b-hf and used as a full-attention baseline in Adapting Language Models to Compress Contexts. It is fine-tuned on 15B tokens from the RedPajama dataset, on sequences of 6,144 tokens and with a RoPE θ value of 80,000.
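
The extended context window is reflected in the model configuration. As a quick sanity check you can inspect the RoPE base and maximum position settings after downloading; this is a minimal sketch, and the field names rope_theta and max_position_embeddings assume a recent transformers LlamaConfig:

from transformers import AutoConfig

# Load only the configuration (no weights) to check the context-extension settings.
config = AutoConfig.from_pretrained("princeton-nlp/FullAttention-Llama-2-7b-6k")

# Field names assume a recent transformers LlamaConfig; older releases may not expose rope_theta.
print(getattr(config, "rope_theta", None))              # expected: 80000, per the description above
print(getattr(config, "max_position_embeddings", None))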

To get started, load this model as a LlamaForCausalLM model, or download the AutoCompressors repository and load the model as follows:

from auto_compressor_llama import LlamaAutoCompressorModel

model = LlamaAutoCompressorModel.from_pretrained("princeton-nlp/FullAttention-Llama-2-7b-6k")
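
Since this baseline uses full, uncompressed attention, it can also be used directly through the standard transformers classes. The following is a minimal generation sketch (not the paper's code); it assumes the checkpoint ships a tokenizer, otherwise the meta-llama/Llama-2-7b-hf tokenizer can be substituted:

import torch
from transformers import AutoTokenizer, LlamaForCausalLM

# Assumption: the checkpoint includes a tokenizer; if not, load it from meta-llama/Llama-2-7b-hf.
tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/FullAttention-Llama-2-7b-6k")
model = LlamaForCausalLM.from_pretrained(
    "princeton-nlp/FullAttention-Llama-2-7b-6k",
    torch_dtype=torch.bfloat16,
).eval()

prompt = "Long-context language models can"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))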

Evaluation

We record the perplexity achieved by our Llama-2-7B models on segments of 2048 tokens, conditioned on different amounts of context. FullAttention-Llama-2-7b-6k uses full uncompressed contexts whereas AutoCompressor-Llama-2-7b-6k compresses segments of 2048 tokens into 50 summary vectors.

| Context Tokens               | 0    | 512  | 2048 | 4096 | 6144 |
|------------------------------|------|------|------|------|------|
| Pre-trained Llama-2-7b       | 5.52 | 5.15 | 4.98 | -    | -    |
| FullAttention-Llama-2-7b-6k  | 5.40 | 5.06 | 4.88 | 4.80 | 4.76 |
| AutoCompressor-Llama-2-7b-6k | 5.40 | 5.16 | 5.11 | 5.08 | 5.07 |
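
The setup behind these numbers can be reproduced in spirit with a short script: tokenize a long held-out document, condition on a prefix of 0 to 6,144 context tokens, and score only the following 2,048-token segment. The sketch below is not the paper's evaluation code; the choice of evaluation document and the label-masking convention (-100 on context positions) are assumptions for illustration.

import torch
from transformers import AutoTokenizer, LlamaForCausalLM

model_name = "princeton-nlp/FullAttention-Llama-2-7b-6k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = LlamaForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).eval()

def segment_perplexity(token_ids, context_len, segment_len=2048):
    """Perplexity of a segment_len-token segment, conditioned on context_len preceding tokens."""
    ids = token_ids[: context_len + segment_len].unsqueeze(0)
    labels = ids.clone()
    labels[:, :context_len] = -100  # context tokens are conditioned on but not scored
    with torch.no_grad():
        loss = model(input_ids=ids, labels=labels).loss  # mean NLL over the scored segment
    return torch.exp(loss).item()

# `long_text` stands in for a held-out document (e.g. from the RedPajama validation split).
long_text = "..."
token_ids = tokenizer(long_text, return_tensors="pt").input_ids[0]
for context_len in (0, 512, 2048, 4096, 6144):
    print(context_len, segment_perplexity(token_ids, context_len))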

BibTeX

@misc{chevalier2023adapting,
      title={Adapting Language Models to Compress Contexts}, 
      author={Alexis Chevalier and Alexander Wettig and Anirudh Ajith and Danqi Chen},
      year={2023},
      eprint={2305.14788},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}