args = (
    wandb=False
    prompt_type='chat'
    data_path='instruct.zh.jsonl'
    model_path='llama_7b'
    micro_batch=3
    total_batch=32
    log_steps=100
    eval_steps=0
    save_steps=200
    warmup_ratio=0.01
    test_size=0
    resume_from_checkpoint=None
    ignore_data_skip=False
)
>>> trainable params: 19988480 || all params: 6758404096 || trainable%: 0.2957573965106688
***** Running training *****
  Num examples = 51,584
  Num Epochs = 3
  Instantaneous batch size per device = 3
  Total train batch size (w. parallel, distributed & accumulation) = 30
  Gradient Accumulation steps = 10
  Total optimization steps = 4,836
  Number of trainable parameters = 19,988,480
4836/4836 [41:14:59<00:00, 30.71s/it]
{'train_runtime': 148499.2028, 'train_samples_per_second': 0.977, 'train_steps_per_second': 0.033, 'train_loss': 0.7752850797671341, 'epoch': 2.81}
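
The logged batch figures follow from the arguments above: with `micro_batch=3` on a single device, dividing `total_batch=32` by the micro batch (integer division) gives 10 gradient accumulation steps, so the realized total train batch size is 3 × 10 = 30 rather than 32, and the trainable ratio is simply trainable params over all params. A minimal sketch of that arithmetic, assuming a single-device run (the variable names here are illustrative, not the training script's own code):

```python
# Sketch of how the logged values relate to the args; the actual script
# may derive gradient accumulation differently (this assumes one device).
micro_batch = 3    # per-device batch size, from args
total_batch = 32   # requested effective batch size, from args
world_size = 1     # assumed single GPU

# Integer division drops the remainder, so the realized batch is 30, not 32.
grad_accum_steps = total_batch // (micro_batch * world_size)    # 10
effective_batch = micro_batch * world_size * grad_accum_steps   # 30

# Trainable-parameter ratio as reported in the log above.
trainable, total = 19_988_480, 6_758_404_096
print(grad_accum_steps, effective_batch)
print(f"trainable%: {100 * trainable / total:.4f}")  # ~0.2958
```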