CLEX: Continuous Length Extrapolation for Large Language Models

This repo stores the checkpoint of CLEX-7B-16K

Features and Highlights of CLEX

CLEX_diagram

  • Simple and Clear: MINIMAL code and architecture changes. Only one up-and-down projection layer introduced, NO recurrent memory caching or sparse attention required.
  • Train Short, Test Long: NO performance drop on the sequences 4x~8x longer than the training ones (see here).
  • Continuous Length Extrapolation: Explicitly modeling the continuous dynamics of context window size during length extrapolation.

More details about long-text modeling with our CLEX can be found at the git repo.

Model Zoo

Model Name Model Type Starting Point Train Data Train Length MAX Test Length
CLEX-7B-4K base LLaMA-2-7B Redpajama-Book 4K 16K
CLEX-7B-Chat-4K chat CLEX-7B-4K UltraChat 4K 16K
CLEX-7B-16K (this checkpoint) base LLaMA-2-7B Redpajama-Book 16K 64K
CLEX-7B-Chat-16K chat CLEX-7B-16K UltraChat 16K 64K

How to Use

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("DAMO-NLP-SG/CLEX-7B-16K", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("DAMO-NLP-SG/CLEX-7B-16K", torch_dtype=torch.bfloat16, trust_remote_code=True)
inputs = tokenizer("What is CLEX?", return_tensors="pt")
sample = model.generate(**inputs, max_length=128)
print(tokenizer.decode(sample[0]))

Citation

If you find our project useful, hope you can star our repo and cite our paper as follows:

@article{damonlpsg2023clex,
  author = {Chen, Guanzheng and Li, Xin and Meng, Zaiqiao and Liang, Shangsong and Bing, Lidong},
  title = {CLEX: Continuous Length Extrapolation for Large Language Models},
  year = 2023,
  journal = {arXiv preprint arXiv:2310.16450},
  url = {https://arxiv.org/abs/2310.16450}
}

Downloads last month
26
Safetensors
Model size
6.74B params
Tensor type
BF16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.