What are the correct parameters to fine-tune the 7B RWKV model?

#19 opened by fubincom

Hello, I'm trying to fine-tune the 7B RWKV model with LoRA. I trained the 1.5B model successfully following the parameters you shared in the train.py script, but for the 7B model I changed the layer count to 32 and the embedding dim to 4096 and still can't load the model correctly; it just tells me it's a bad checkpoint. I'd like to know the correct training parameters for the 7B model. Any response is appreciated. My training command is below:

  • python ${FILE_DIR}/train.py --load_model ${FILE_DIR}/rwkv_model/RWKV-4-Raven-7B-v11.pth --wandb "" --proj_dir "out"
    --data_file ${FILE_DIR}/train.npy --data_type "numpy" --vocab_size 50277
    --ctx_len 1024 --epoch_steps 100 --epoch_count 5 --epoch_begin 0 --epoch_save 10
    --micro_bsz 4 --n_layer 32 --n_embd 4096 --pre_ffn 0 --head_qk 0
    --lr_init 1e-5 --lr_final 1e-5 --warmup_steps 0 --beta1 0.9 --beta2 0.999 --adam_eps 1e-8
    --precision bf16 --strategy deepspeed_stage_2_offload --accelerator gpu --grad_cp 0 --devices 1
    --lora --lora_r 8 --lora_alpha 16 --lora_dropout 0.05
    --lora_parts=att,ffn,time,ln # configure which parts to fine-tune
  • md5 of my model: a33cd94e8c6dfadde51887c796c729b0
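
If the architecture flags ever get out of sync with the checkpoint again, the expected values can be read straight from the .pth file. Here is a minimal sketch, assuming PyTorch is installed and that the checkpoint uses the usual RWKV-4 key names (emb.weight, blocks.N.*); the path is just an example to adjust:

```python
# Sketch: inspect an RWKV-4 .pth checkpoint to confirm n_layer / n_embd / vocab_size
# before passing them to train.py (assumes standard RWKV-4 state-dict key names).
import torch

ckpt_path = "rwkv_model/RWKV-4-Raven-7B-v11.pth"  # example path, adjust to yours
state = torch.load(ckpt_path, map_location="cpu")

vocab_size, n_embd = state["emb.weight"].shape            # embedding table shape
n_layer = 1 + max(int(k.split(".")[1])                     # highest block index + 1
                  for k in state if k.startswith("blocks."))

print(f"n_layer={n_layer}, n_embd={n_embd}, vocab_size={vocab_size}")
```

For the 7B checkpoint this should report n_layer=32, n_embd=4096, vocab_size=50277, matching the flags in the command above.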
fubincom changed discussion status to closed

use --grad_cp 1 to save VRAM
what's the error?

Hello, sorry for the noise. I downloaded the model again and saved it to the cache dir, and that solved the problem. But I find that to fine-tune the 7B model with LoRA I need at least 42 GB of CPU memory just to load the model, otherwise my training process gets killed. Is that normal?
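
Since a partially downloaded or corrupted .pth is one common cause of a "bad checkpoint" error, a quick integrity check before training can save time. A minimal sketch using only the standard library (the path is just an example); it hashes the file in chunks so the whole 14+ GB checkpoint never has to sit in RAM:

```python
# Sketch: compute the MD5 of a downloaded checkpoint and compare it against a
# known-good value (e.g. the md5 posted earlier in this thread).
import hashlib

def md5sum(path, chunk_size=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

print(md5sum("rwkv_model/RWKV-4-Raven-7B-v11.pth"))  # example path, adjust to yours
```

A mismatch against the expected hash means the download should simply be repeated.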
