What are the context length and batch size for the 13B model?

#1
by lucasjin - opened

I can only avoid OOM with batch size 1 and sequence length 512, but that context length is too short.

The model in this repository was trained on 8 * V100 32G GPUs with a context length (max_seq_length) of 1024. During training, each GPU loads 4 samples and gradients are accumulated every 4 steps, so the effective batch size is 4 * 4 * 8 = 128.
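For reference, here is a minimal sketch of how those numbers map onto a Hugging Face `TrainingArguments` setup. The batch-size values come from the description above; `output_dir` and everything else is an illustrative assumption, not the repo's actual training script:

```python
from transformers import TrainingArguments

# Illustrative sketch only, not this repo's actual training script.
# Effective batch size = per-GPU batch * grad-accumulation steps * number of GPUs
#                      = 4 * 4 * 8 = 128
args = TrainingArguments(
    output_dir="./llama2-13b-sft",     # hypothetical output path
    per_device_train_batch_size=4,     # 4 samples per V100 32G
    gradient_accumulation_steps=4,     # accumulate gradients every 4 steps
)
# max_seq_length=1024 is configured on the tokenizer / SFT trainer side, not here.
# Launched across 8 GPUs, e.g. `torchrun --nproc_per_node=8 train.py`.
```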

A V100 can really run batch size 4 with sequence length 1024?

What was the sequence length during pretraining?

On V100 32G GPUs, with LoRA rank set to 8, LoRA training can indeed reach a per-GPU batch size of 4 with sequence length 1024. This relies on DeepSpeed ZeRO-2 plus offload. Note that this project does not do any pre-training (PT); all training is SFT performed directly on the Llama2 model.

Additionally, please be aware that PEFT (Parameter-Efficient Fine-Tuning) is not compatible with the DeepSpeed ZeRO-3 strategy.
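For anyone trying to reproduce this memory budget, a minimal sketch of such a setup could look like the following. Everything here is an assumption for illustration (the `"auto"` placeholders, `target_modules`, dropout), not the actual config used in this repo:

```python
from peft import LoraConfig

# Hypothetical DeepSpeed ZeRO-2 + CPU-offload config of the kind described above;
# the values are illustrative assumptions, not this repo's actual training config.
ds_config = {
    "zero_optimization": {
        "stage": 2,                              # ZeRO stage 2 (stage 3 avoided due to the PEFT incompatibility noted above)
        "offload_optimizer": {"device": "cpu"},  # push optimizer states to CPU RAM to fit a 32G V100
    },
    "fp16": {"enabled": "auto"},                 # V100 supports fp16 but not bf16
    "train_micro_batch_size_per_gpu": "auto",    # filled from per_device_train_batch_size=4
    "gradient_accumulation_steps": "auto",       # filled from gradient_accumulation_steps=4
}

# LoRA adapter with rank 8, as stated above; target_modules and dropout are assumptions.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# The dict can be passed directly via TrainingArguments(deepspeed=ds_config),
# or dumped to a JSON file and referenced with --deepspeed on the command line.
```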

RicardoLee changed discussion status to closed
RicardoLee changed discussion status to open

Thanks very much!

RicardoLee changed discussion status to closed
