Trained for 62 epochs (~1 pass over all the data) of raccoon-v5.9-rlhf-full_text_document with the following arguments (a sketch of the full command follows the list):
--vocab_size 50281
--ctx_len 8192
--chatml_mask 1
--epoch_steps 50
--epoch_count 200
--epoch_begin 0
--epoch_save 2
--micro_bsz 3
--n_layer 32
--n_embd 4096
--pre_ffn 0
--head_qk 0
--lr_init 1e-4
--lr_final 3e-5
--warmup_steps 0
--beta1 0.9
--beta2 0.999
--adam_eps 1e-8
--accelerator gpu
--devices 8
--precision bf16
--strategy deepspeed_stage_2_offload
--grad_cp 1
--lora
--lora_r 128
--lora_alpha 16
--lora_dropout 0.001
--lora_parts att,ffn,time,ln
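
For reference, the arguments above assemble into an invocation along these lines. This is a minimal sketch assuming an RWKV-LM-LoRA-style train.py: the script name, the base checkpoint path, and the --load_model / --data_file / --data_type lines are assumptions and are not taken from this PR; only the dataset name comes from the description above, and every other flag is copied from the list.

```bash
# Sketch only: train.py, the base checkpoint path, and the --load_model /
# --data_file / --data_type lines are assumptions; the remaining flags are
# copied verbatim from the argument list above.
python3 train.py \
  --load_model /path/to/base_model.pth \
  --data_file raccoon-v5.9-rlhf-full_text_document \
  --data_type binidx \
  --vocab_size 50281 --ctx_len 8192 --chatml_mask 1 \
  --epoch_steps 50 --epoch_count 200 --epoch_begin 0 --epoch_save 2 \
  --micro_bsz 3 --n_layer 32 --n_embd 4096 --pre_ffn 0 --head_qk 0 \
  --lr_init 1e-4 --lr_final 3e-5 --warmup_steps 0 \
  --beta1 0.9 --beta2 0.999 --adam_eps 1e-8 \
  --accelerator gpu --devices 8 --precision bf16 \
  --strategy deepspeed_stage_2_offload --grad_cp 1 \
  --lora --lora_r 128 --lora_alpha 16 --lora_dropout 0.001 \
  --lora_parts att,ffn,time,ln
```

In the usual LoRA formulation, --lora_r 128 and --lora_alpha 16 mean the low-rank update is scaled by alpha/r = 16/128 = 0.125 before being added to the frozen base weights.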

jerobich changed pull request status to open
m8than changed pull request status to merged
