MLM Loss

#48 opened by ViacheslavBG

Good day

I've noticed that BERT's (bert-base-uncased) MLM loss is approximately 2.5 on the Wikipedia dataset it was trained on. However, the original paper reported a perplexity of ~4, i.e. a loss of ~1.38.
I continued training it with the run_mlm.py script, and the MLM loss decreased to ~1.8 after 10,000 steps.
Can anybody explain why this checkpoint has such a high loss?
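For reference, here is a minimal sketch of the kind of measurement I mean (my own illustration, not the exact setup from the paper): mask 15% of tokens with DataCollatorForLanguageModeling and average the MLM loss over a small text sample. I use wikitext-2 below only as a small stand-in for the Wikipedia data, and the 512-token truncation and seed are my own choices; note the loss fluctuates a little between runs because the masking is random, and perplexity = exp(loss), so loss ~1.38 corresponds to perplexity ~4.

```python
import math
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
)

torch.manual_seed(0)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()
# Standard 15% masking, same as run_mlm.py's default.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

# Small stand-in corpus; swap in a Wikipedia dump for the real comparison.
texts = [
    t for t in load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"]
    if t.strip()
]

losses = []
with torch.no_grad():
    for text in texts[:200]:
        enc = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
        # The collator randomly masks tokens and builds the corresponding labels.
        batch = collator([{"input_ids": enc["input_ids"][0]}])
        out = model(input_ids=batch["input_ids"], labels=batch["labels"])
        losses.append(out.loss.item())

mean_loss = sum(losses) / len(losses)
print(f"mean MLM loss: {mean_loss:.3f}  perplexity: {math.exp(mean_loss):.2f}")
```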
