The training time mentioned in the paper and the explanations in the Git repository have a significant gap.

#77
by wangzl - opened

I summarized the training throughput according to the model cards as follows:

| Model   | Training tokens | Time    | A100s | Throughput (tokens/s per A100) |
|---------|-----------------|---------|-------|--------------------------------|
| phi-1   | 54B             | 6 days  | 8     | ~13,020                        |
| phi-1.5 | 150B            | 8 days  | 32    | ~6,781                         |
| phi-2   | 1.4T            | 14 days | 96    | ~12,056                        |

In the phi-1.5 paper, however, training on 150B tokens is reported to cost 1.5K A100 GPU-hours, which implies a throughput of roughly 27,777 tokens/s per A100.
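The arithmetic behind the figures above can be sketched as a quick back-of-the-envelope check (using the token counts, durations, and GPU counts quoted from the model cards and the paper):

```python
# Per-A100 throughput: tokens / (days * 86400 s/day * number of GPUs).
def tokens_per_sec_per_gpu(tokens: float, days: float, gpus: int) -> float:
    return tokens / (days * 86400 * gpus)

# Model-card figures
phi1  = tokens_per_sec_per_gpu(54e9,   6,  8)   # ~13,020 tokens/s per A100
phi15 = tokens_per_sec_per_gpu(150e9,  8, 32)   # ~6,781 tokens/s per A100
phi2  = tokens_per_sec_per_gpu(1.4e12, 14, 96)  # ~12,056 tokens/s per A100

# Paper figure for phi-1.5: 150B tokens in 1.5K A100-hours
paper_phi15 = 150e9 / (1500 * 3600)             # ~27,777 tokens/s per A100
```

The paper's implied throughput is roughly 4x the model-card figure for phi-1.5, which is the gap in question.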

Comparing these figures, I suspect there may be errors in the numbers reported in the paper, though there could be another explanation. I welcome any input.

I am facing this error with my fine-tuned model: "The repository for microsoft/phi-1_5 contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/microsoft/phi-1_5. Please pass the argument `trust_remote_code=True` to allow custom code to be run." Can anyone suggest a solution?
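As the error message itself suggests, phi-1_5 (and fine-tunes derived from it) ships custom modeling code, so loading it requires passing `trust_remote_code=True`. A minimal sketch, where `"my-finetuned-phi"` is a placeholder for your own checkpoint path or Hub id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "my-finetuned-phi" is a hypothetical local directory or Hub repo id.
model = AutoModelForCausalLM.from_pretrained(
    "my-finetuned-phi",
    trust_remote_code=True,  # allow the repo's custom modeling code to run
)
tokenizer = AutoTokenizer.from_pretrained(
    "my-finetuned-phi",
    trust_remote_code=True,
)
```

Note that `trust_remote_code=True` executes code from the repository on your machine, so inspect the repo contents first, as the error message recommends.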
