MLM Loss
#48 opened by ViacheslavBG
Good day
I've noticed that BERT's MLM loss (bert-base-uncased) is approximately 2.5 on the Wikipedia dataset it was trained on. However, the original paper reported a perplexity of ~4, i.e. a loss of about 1.38.
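For reference, a minimal sketch of how the MLM loss and the perplexity ≈ exp(loss) conversion can be computed with transformers. The two sentences are just placeholders for the evaluation text, and the masking is random, so the exact number varies from run to run:

```python
import math
import torch
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Placeholder sentences standing in for the evaluation text.
texts = [
    "The quick brown fox jumps over the lazy dog.",
    "Paris is the capital and most populous city of France.",
]

# Same 15% random masking that run_mlm.py applies by default.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
encodings = [tokenizer(t, truncation=True, max_length=128) for t in texts]
batch = collator(encodings)

with torch.no_grad():
    out = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["labels"])

# Cross-entropy over the masked positions only; noisy on this little text.
print(f"MLM loss: {out.loss.item():.3f}")
print(f"perplexity = exp(loss): {math.exp(out.loss.item()):.2f}")
```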
I continued training it using the run_mlm.py script, and the MLM loss decreased to 1.8 after 10,000 steps.
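For context, a sketch of the equivalent continued MLM training with the Trainer API instead of run_mlm.py. The Wikipedia dump name, batch size, and sequence length below are illustrative assumptions, not necessarily the settings I passed to run_mlm.py:

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Assumption: a recent Wikipedia dump as a stand-in for the original
# pretraining corpus; run_mlm.py would point at the same data via
# --dataset_name / --dataset_config_name.
raw = load_dataset("wikimedia/wikipedia", "20231101.en", split="train[:2000]")

def tokenize(batch):
    # Simplification: truncate each article instead of the
    # concatenate-and-chunk grouping that run_mlm.py performs.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

# 15% random masking, the run_mlm.py default.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-mlm-continued",
    max_steps=10_000,
    per_device_train_batch_size=16,
    logging_steps=500,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```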
Can anybody explain why this checkpoint has such a high loss?