MLM Loss

#48 opened by ViacheslavBG

Good day

I've noticed that BERT's (bert-base-uncased) MLM loss is approximately 2.5 on the Wikipedia dataset it was trained on. However, the original paper reported a perplexity of ~4, i.e. a loss of ~1.38.
I continued training it with the run_mlm.py script, and the MLM loss decreased to ~1.8 after 10,000 steps.
Can anybody explain why this checkpoint has such a high loss?
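For reference, here is a minimal sketch of the kind of measurement I mean (my own illustration, not the exact setup from the paper): mask 15% of tokens with DataCollatorForLanguageModeling and average the MLM loss over a small text sample. I use wikitext-2 below only as a small stand-in for the Wikipedia data, and the 512-token truncation and seed are my own choices; note the loss fluctuates a little between runs because the masking is random, and perplexity = exp(loss), so loss ~1.38 corresponds to perplexity ~4.

```python
import math
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
)

torch.manual_seed(0)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()
# Standard 15% masking, same as run_mlm.py's default.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

# Small stand-in corpus; swap in a Wikipedia dump for the real comparison.
texts = [
    t for t in load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"]
    if t.strip()
]

losses = []
with torch.no_grad():
    for text in texts[:200]:
        enc = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
        # The collator randomly masks tokens and builds the corresponding labels.
        batch = collator([{"input_ids": enc["input_ids"][0]}])
        out = model(input_ids=batch["input_ids"], labels=batch["labels"])
        losses.append(out.loss.item())

mean_loss = sum(losses) / len(losses)
print(f"mean MLM loss: {mean_loss:.3f}  perplexity: {math.exp(mean_loss):.2f}")
```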
