args = (
    wandb=False
    prompt_type='chat'
    data_path='instruct.zh.jsonl'
    model_path='llama_7b'
    micro_batch=3
    total_batch=32
    log_steps=100
    eval_steps=0
    save_steps=200
    warmup_ratio=0.01
    test_size=0
    resume_from_checkpoint=None
    ignore_data_skip=False
)
>>> trainable params: 19988480 || all params: 6758404096 || trainable%: 0.2957573965106688
***** Running training *****
  Num examples = 51,584
  Num Epochs = 3
  Instantaneous batch size per device = 3
  Total train batch size (w. parallel, distributed & accumulation) = 30
  Gradient Accumulation steps = 10
  Total optimization steps = 4,836
  Number of trainable parameters = 19,988,480
4836/4836 [41:14:59<00:00, 30.71s/it]
{'train_runtime': 148499.2028, 'train_samples_per_second': 0.977, 'train_steps_per_second': 0.033, 'train_loss': 0.7752850797671341, 'epoch': 2.81}
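
The logged batch figures follow from the arguments above: with `micro_batch=3` on a single device, dividing `total_batch=32` by the micro batch (integer division) gives 10 gradient accumulation steps, so the realized total train batch size is 3 × 10 = 30 rather than 32, and the trainable ratio is simply trainable params over all params. A minimal sketch of that arithmetic, assuming a single-device run (the variable names here are illustrative, not the training script's own code):

```python
# Sketch of how the logged values relate to the args; the actual script
# may derive gradient accumulation differently (this assumes one device).
micro_batch = 3    # per-device batch size, from args
total_batch = 32   # requested effective batch size, from args
world_size = 1     # assumed single GPU

# Integer division drops the remainder, so the realized batch is 30, not 32.
grad_accum_steps = total_batch // (micro_batch * world_size)    # 10
effective_batch = micro_batch * world_size * grad_accum_steps   # 30

# Trainable-parameter ratio as reported in the log above.
trainable, total = 19_988_480, 6_758_404_096
print(grad_accum_steps, effective_batch)
print(f"trainable%: {100 * trainable / total:.4f}")  # ~0.2958
```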