Fine-tuning args

#8 by lvkaokao

Hi, I have seen the training args in the files and have a question about the hyperparameters:
the training dataset is ~300k examples, num_train_epochs=4, per_device_train_batch_size=6, gradient_accumulation_steps=4, and 4 GPU cards, yet global_steps=1204. Are these hyperparameters correct?
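For reference, here is the arithmetic behind my question (a minimal sketch, assuming the standard HF Trainer step accounting, where one optimizer step consumes per-device batch × gradient accumulation × number of GPUs examples):

```python
import math

# Hyperparameters quoted above (my reading of the repo's training args)
dataset_size = 300_000              # ~300k training examples
num_train_epochs = 4
per_device_train_batch_size = 6
gradient_accumulation_steps = 4
num_gpus = 4

# One optimizer (global) step covers this many examples
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)  # 6 * 4 * 4 = 96

steps_per_epoch = math.ceil(dataset_size / effective_batch_size)  # 3125
expected_global_steps = steps_per_epoch * num_train_epochs        # 12500

print(effective_batch_size, steps_per_epoch, expected_global_steps)
# -> 96 3125 12500, far from the reported 1204 global steps
```

With these numbers I would expect ~12,500 global steps, so 1204 suggests either a smaller effective training set or a larger effective batch size than what I inferred from the files.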

I am trying to reproduce your results, but I find that the ARC and HellaSwag metrics decrease significantly during training.

Hope to get your reply~
Thanks
