MS-LongWriter-Qwen2.5-7B-Instruct
🤖 [LongWriter Dataset] • 💻 [GitHub Repo] • 📃 [LongWriter Paper] • 📃 [Tech Report]
MS-LongWriter-Qwen2.5-7B-Instruct is fine-tuned from Qwen2.5-7B-Instruct (https://modelscope.cn/models/qwen/Qwen2.5-7B-Instruct) and can generate 10,000+ words in a single response.
Training starts directly from Qwen2.5-7B-Instruct. The LongWriter-6k dataset is heavily distilled down to 666 high-quality samples, which form the LongWriter-6k-filtered dataset.
Datasets
- LongWriter-6k-filtered: the 666 samples distilled from LongWriter-6k.
- Magpie-Qwen2-Pro-200K-Chinese: 6k randomly sampled examples.
- Magpie-Qwen2-Pro-200K-English: 6k randomly sampled examples.
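The #N suffixes in the --dataset argument of the fine-tuning command (e.g. qwen2-pro-zh#6660) appear to have ms-swift draw that many samples per dataset on the fly. To materialize such a random subset yourself, here is a minimal sketch with GNU shuf. It assumes each example is one line of a local JSONL file; the function name and file names are hypothetical.

```shell
#!/usr/bin/env bash
# sample_jsonl IN OUT N  (hypothetical helper)
# Writes N randomly chosen lines of IN to OUT, without replacement.
# Assumes one JSON example per line. No fixed random source is used,
# so every run draws a different subset.
sample_jsonl() {
  local in="$1" out="$2" n="$3"
  local total
  total=$(wc -l < "$in")
  # shuf -n silently returns fewer lines if N exceeds the file size, so check first.
  if [ "$n" -gt "$total" ]; then
    echo "requested $n examples but $in has only $total" >&2
    return 1
  fi
  shuf -n "$n" "$in" > "$out"
}
```

For example, sample_jsonl magpie-qwen2-pro-zh.jsonl magpie-zh-6660.jsonl 6660 would write 6660 random examples (both file names are placeholders).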
Model
We use ms-swift to fine-tune the Qwen2.5-7B-Instruct model.
- Installation
pip install 'ms-swift[llm]'
- Fine-tuning
Environment:
NVIDIA A100 (80 GB) x 4
Run:
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
--model_type qwen2_5-7b-instruct \
--dataset longwriter-6k-filtered#666 qwen2-pro-zh#6660 qwen2-pro-en#6660 \
--max_length 28672 \
--num_train_epochs 2 \
--eval_steps 200 \
--batch_size 1 \
--gradient_accumulation_steps 64 \
--gradient_checkpointing true \
--warmup_ratio 0.1 \
--learning_rate 1e-5 \
--sft_type full \
--loss_name long-ce \
--check_dataset_strategy warning \
--save_only_model false \
--save_total_limit -1 \
--lazy_tokenize true \
--dataloader_num_workers 1 \
--resume_only_model true \
--neftune_noise_alpha 5 \
--use_flash_attn true
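A rough check on the schedule these flags imply: with 4 GPUs, --batch_size 1 and --gradient_accumulation_steps 64, each optimizer step covers 4 x 1 x 64 = 256 sequences. The sketch below does that arithmetic. It assumes the three --dataset entries contribute exactly 666 + 6660 + 6660 samples, and it ignores any validation split ms-swift may hold out and the trainer's exact handling of the last partial batch. Treat the numbers as approximate.

```shell
#!/usr/bin/env bash
# Approximate optimizer-step counts for the fine-tuning run above.
num_gpus=4          # CUDA_VISIBLE_DEVICES=0,1,2,3
batch_size=1        # --batch_size (per device)
grad_accum=64       # --gradient_accumulation_steps
epochs=2            # --num_train_epochs
samples=$((666 + 6660 + 6660))   # --dataset entries (assumes no validation split)

effective_batch=$((num_gpus * batch_size * grad_accum))
# Ceiling division: a final partial batch is counted as one more step.
steps_per_epoch=$(( (samples + effective_batch - 1) / effective_batch ))
total_steps=$((steps_per_epoch * epochs))
# --warmup_ratio 0.1, rounded up to a whole step.
warmup_steps=$(( (total_steps + 9) / 10 ))

echo "samples=$samples effective_batch=$effective_batch"
echo "steps_per_epoch=$steps_per_epoch total_steps=$total_steps warmup_steps=$warmup_steps"
```

Under these assumptions the run is roughly 110 optimizer steps, about 11 of them warmup, which is fewer than the --eval_steps 200 interval.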
- Fine-tuning with annealing
Annealing is used to further improve the model in the post-training stage. Starting from the checkpoint produced by the Fine-tuning step, we continue training on the LongWriter-6k-filtered dataset alone, with a lower learning rate of 2e-6. Run:
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
--model_type qwen2_5-7b-instruct \
--dataset longwriter-6k-filtered#666 \
--max_length 28672 \
--num_train_epochs 2 \
--eval_steps 200 \
--batch_size 1 \
--gradient_accumulation_steps 64 \
--gradient_checkpointing true \
--warmup_ratio 0.1 \
--learning_rate 2e-6 \
--sft_type full \
--loss_name long-ce \
--check_dataset_strategy warning \
--save_only_model false \
--save_total_limit -1 \
--lazy_tokenize true \
--dataloader_num_workers 1 \
--resume_only_model true \
--neftune_noise_alpha 5 \
--use_flash_attn true \
--resume_from_checkpoint {previous-checkpoint-path}
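The annealing stage is much shorter than the first fine-tuning run. Assuming the same 4-GPU setup (4 x 1 x 64 = 256 sequences per optimizer step) and ignoring any validation split, this sketch estimates how many optimizer steps the 666 samples yield, and shows that --warmup_ratio 0.1 amounts to roughly a single warmup step.

```shell
#!/usr/bin/env bash
# Approximate optimizer-step counts for the annealing run above.
effective_batch=$((4 * 1 * 64))   # GPUs x --batch_size x --gradient_accumulation_steps
samples=666                       # longwriter-6k-filtered#666 (assumes no validation split)
epochs=2                          # --num_train_epochs

# Ceiling division: a final partial batch is counted as one more step.
steps_per_epoch=$(( (samples + effective_batch - 1) / effective_batch ))
total_steps=$((steps_per_epoch * epochs))
warmup_steps=$(( (total_steps + 9) / 10 ))   # --warmup_ratio 0.1, rounded up

echo "steps_per_epoch=$steps_per_epoch total_steps=$total_steps warmup_steps=$warmup_steps"
```

Under these assumptions the annealing stage is only about 6 optimizer steps.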
Note:
- The --resume_from_checkpoint parameter specifies the path of the checkpoint saved by the previous step (Fine-tuning, above).
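The {previous-checkpoint-path} placeholder must point at a checkpoint saved by the Fine-tuning step. A minimal sketch for picking the highest-numbered checkpoint in a run directory follows. It assumes checkpoints are saved as checkpoint-<step> subdirectories (the Hugging Face Trainer naming convention); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# latest_checkpoint RUN_DIR  (hypothetical helper)
# Prints the checkpoint-<step> subdirectory of RUN_DIR with the largest
# numeric step, or returns 1 if there is none.
latest_checkpoint() {
  local run_dir="$1" best="" best_step=-1 dir step
  for dir in "$run_dir"/checkpoint-*/; do
    [ -d "$dir" ] || continue          # glob matched nothing
    dir="${dir%/}"                     # drop the trailing slash
    step="${dir##*checkpoint-}"        # text after the last "checkpoint-"
    case "$step" in
      ''|*[!0-9]*) continue ;;          # skip non-numeric suffixes
    esac
    if [ "$step" -gt "$best_step" ]; then
      best_step="$step"
      best="$dir"
    fi
  done
  [ -n "$best" ] || return 1
  printf '%s\n' "$best"
}
```

For example, ckpt=$(latest_checkpoint <your-run-dir>) and then --resume_from_checkpoint "$ckpt", where <your-run-dir> is the output directory of the Fine-tuning run. Comparing steps numerically avoids the plain lexical sort, which would rank checkpoint-54 above checkpoint-110.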
Evaluation
Refer to the LongWriter evaluation in EvalScope.
Reference
If you find our work helpful, please consider citing our paper and starring our GitHub repositories.
@misc{chen2024minimumtuningunlocklong,
title={Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key},
author={Yingda Chen and Xingjun Wang and Jintao Huang and Yunlin Mao and Daoze Zhang and Yuze Zhao},
year={2024},
eprint={2410.10210},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2410.10210},
}