ProLong-512k-8B-CLIPPER

ProLong-512k-8B-CLIPPER is a fine-tuned version of princeton-nlp/Llama-3-8B-ProLong-512k-Instruct, trained with supervised fine-tuning on the chtmp223/CLIPPER dataset. Please see our paper for more details on the method.

📒 Model Details

Model Description

Model Sources

Paper: https://arxiv.org/abs/2502.14854

💻 Training Details

Training Data

chtmp223/CLIPPER
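
The snippet below is a minimal sketch for inspecting the training data with the 🤗 Datasets library; the split name is an assumption, not something stated on this card.

```python
# Minimal sketch for loading and inspecting the CLIPPER training data.
# The "train" split name is an assumption.
from datasets import load_dataset

ds = load_dataset("chtmp223/CLIPPER", split="train")
print(ds)        # column names and number of rows
print(ds[0])     # one raw example
```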

Training Procedure

| Configuration | Value |
|---|---|
| Hardware (training and inference) | 8xA100s |
| Tracking | wandb |
| Batch size | 16 |
| gradient_checkpointing | True |
| learning_rate | 1.0e-6 |
| lr_scheduler_type | cosine |
| max_length | 131072 |
| num_train_epochs | 1 |
| optim | adamw_torch |
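
For reference, the sketch below shows roughly how these hyperparameters would map onto Hugging Face `TrainingArguments`. It is not the actual ProLong training script, and the per-device batch size / gradient-accumulation split across the 8 GPUs, as well as the bf16 setting, are assumptions.

```python
# Illustrative mapping of the table above onto transformers.TrainingArguments;
# not the actual ProLong training code. max_length (131072) is handled by the
# data pipeline / collator in the ProLong codebase, not by TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="prolong-512k-8b-clipper-sft",
    per_device_train_batch_size=1,    # assumption: 1 per GPU x 8 GPUs x 2 accumulation steps = 16
    gradient_accumulation_steps=2,    # assumption (global batch size 16)
    gradient_checkpointing=True,
    learning_rate=1.0e-6,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    optim="adamw_torch",
    report_to="wandb",
    bf16=True,                        # assumption: typical for A100 training
)
```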

Software

Training code is adapted from https://github.com/princeton-nlp/ProLong.

🤗 Inference

Inference is performed with vLLM on a single A100 80GB GPU.
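
A minimal vLLM sketch along these lines is shown below; the prompt, sampling settings, and `max_model_len` cap are illustrative choices, not values from this card.

```python
# Minimal vLLM inference sketch; sampling settings and max_model_len
# are illustrative, not values from the original setup.
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "chtmp223/ProLong-512k-8B-CLIPPER"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Cap the context so the KV cache fits comfortably on one A100-80GB;
# the model itself supports contexts up to 512K tokens.
llm = LLM(model=model_id, max_model_len=131072, tensor_parallel_size=1)

messages = [{"role": "user", "content": "Summarize the plot of the book below.\n\n<book text here>"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=512))
print(outputs[0].outputs[0].text)
```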

📜 Citation

@misc{pham2025clippercompressionenableslongcontext,
      title={CLIPPER: Compression enables long-context synthetic data generation}, 
      author={Chau Minh Pham and Yapei Chang and Mohit Iyyer},
      year={2025},
      eprint={2502.14854},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14854}, 
}