Fine-tuning args

#8 by lvkaokao

Hi, I have seen the training args in the files and have a question about the hyperparameters:
the training dataset is ~300k examples, num_train_epochs=4, per_device_train_batch_size=6, gradient_accumulation_steps=4, and 4 GPU cards, yet global_steps=1204. Are these hyperparameters correct?
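For reference, here is the arithmetic behind my question (a minimal sketch, assuming the standard HF Trainer step accounting, where one optimizer step consumes per-device batch × gradient accumulation × number of GPUs examples):

```python
import math

# Hyperparameters quoted above (my reading of the repo's training args)
dataset_size = 300_000              # ~300k training examples
num_train_epochs = 4
per_device_train_batch_size = 6
gradient_accumulation_steps = 4
num_gpus = 4

# One optimizer (global) step covers this many examples
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)  # 6 * 4 * 4 = 96

steps_per_epoch = math.ceil(dataset_size / effective_batch_size)  # 3125
expected_global_steps = steps_per_epoch * num_train_epochs        # 12500

print(effective_batch_size, steps_per_epoch, expected_global_steps)
# -> 96 3125 12500, far from the reported 1204 global steps
```

With these numbers I would expect ~12,500 global steps, so 1204 suggests either a smaller effective training set or a larger effective batch size than what I inferred from the files.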

I am trying to reproduce your results, but I find that the ARC and HellaSwag metrics decrease significantly during training.

Hope to get your reply~
Thanks
