Trained for 62 epochs (~1 pass over all the data) of raccoon-v5.9-rlhf-full_text_document with the following arguments (a sketch of the full command follows the list):
--vocab_size 50281
--ctx_len 8192
--chatml_mask 1
--epoch_steps 50
--epoch_count 200
--epoch_begin 0
--epoch_save 2
--micro_bsz 3
--n_layer 32
--n_embd 4096
--pre_ffn 0
--head_qk 0
--lr_init 1e-4
--lr_final 3e-5
--warmup_steps 0
--beta1 0.9
--beta2 0.999
--adam_eps 1e-8
--accelerator gpu
--devices 8
--precision bf16
--strategy deepspeed_stage_2_offload
--grad_cp 1
--lora
--lora_r 128
--lora_alpha 16
--lora_dropout 0.001
--lora_parts att,ffn,time,ln
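
For reference, the arguments above assemble into an invocation along these lines. This is a minimal sketch assuming an RWKV-LM-LoRA-style train.py: the script name, the base checkpoint path, and the --load_model / --data_file / --data_type lines are assumptions and are not taken from this PR; only the dataset name comes from the description above, and every other flag is copied from the list.

```bash
# Sketch only: train.py, the base checkpoint path, and the --load_model /
# --data_file / --data_type lines are assumptions; the remaining flags are
# copied verbatim from the argument list above.
python3 train.py \
  --load_model /path/to/base_model.pth \
  --data_file raccoon-v5.9-rlhf-full_text_document \
  --data_type binidx \
  --vocab_size 50281 --ctx_len 8192 --chatml_mask 1 \
  --epoch_steps 50 --epoch_count 200 --epoch_begin 0 --epoch_save 2 \
  --micro_bsz 3 --n_layer 32 --n_embd 4096 --pre_ffn 0 --head_qk 0 \
  --lr_init 1e-4 --lr_final 3e-5 --warmup_steps 0 \
  --beta1 0.9 --beta2 0.999 --adam_eps 1e-8 \
  --accelerator gpu --devices 8 --precision bf16 \
  --strategy deepspeed_stage_2_offload --grad_cp 1 \
  --lora --lora_r 128 --lora_alpha 16 --lora_dropout 0.001 \
  --lora_parts att,ffn,time,ln
```

In the usual LoRA formulation, --lora_r 128 and --lora_alpha 16 mean the low-rank update is scaled by alpha/r = 16/128 = 0.125 before being added to the frozen base weights.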

jerobich changed pull request status to open
m8than changed pull request status to merged
