So when this training was done under fp16 mixed precision, the very
last step overflowed (under fp16 the largest finite number is 65504, roughly 64e3; anything larger becomes inf).
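
The fp16 limit can be checked with a minimal sketch like the one below. It uses only Python's standard `struct` module, whose `"e"` format code packs IEEE 754 half-precision values; the names `FP16_MAX` and `fits_in_fp16` are hypothetical helpers introduced for illustration and are not part of the original training code. Note that `struct` raises `OverflowError` on out-of-range values, whereas tensor libraries typically produce `inf` silently.

```python
import struct

# Largest finite fp16 value: 10 mantissa bits, maximum exponent 15.
FP16_MAX = (2 - 2**-10) * 2**15

def fits_in_fp16(x: float) -> bool:
    """Return True if x can be stored as a finite half-precision float.

    Hypothetical helper for illustration only.
    """
    try:
        struct.pack("<e", x)  # "e" is the half-precision format code
        return True
    except OverflowError:
        # struct refuses values that would round to infinity in fp16
        return False

print(FP16_MAX)               # 65504.0
print(fits_in_fp16(65504.0))  # True
print(fits_in_fp16(70000.0))  # False
```

Any activation or loss value that exceeds this limit during an fp16 forward pass would therefore show up as inf in the training logs.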