Reproduce pre-training results

#2 opened by kevalmorabia97

I would like to reproduce the bert-large-uncased-whole-word-masking model provided by Hugging Face. Could you please share more details on the experimental setup?

  • Was this model trained from scratch, or is it a fine-tuned version of bert-large-uncased with whole word masking (WWM) applied?
  • How many epochs/steps, and what learning rate, batch size, number of GPUs, etc.?
  • There is this reference script, but the example command uses the WikiText dataset, whereas BERT was pre-trained on BookCorpus and English Wikipedia, so I'm not sure how to reproduce these results (a rough sketch of what I have in mind is below).
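
For concreteness, here is a minimal sketch of the kind of run I have in mind, using the datasets and transformers libraries. The dataset names, the Wikipedia snapshot, the sequence handling (plain truncation instead of packing), and every hyperparameter below are my own guesses or placeholders, not the values used to train this checkpoint:

```python
# Sketch of whole-word-masking MLM pre-training on BookCorpus + English Wikipedia.
# All hyperparameters are placeholders, not the original recipe.
from datasets import load_dataset, concatenate_datasets
from transformers import (
    BertConfig,
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForWholeWordMask,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-large-uncased")

# Load the two pre-training corpora; the Wikipedia snapshot name is an assumption.
bookcorpus = load_dataset("bookcorpus", split="train")
wikipedia = load_dataset("wikipedia", "20220301.en", split="train")
wikipedia = wikipedia.remove_columns(
    [c for c in wikipedia.column_names if c != "text"]
)
raw = concatenate_datasets([bookcorpus, wikipedia])

def tokenize(batch):
    # Simple truncation to 512 tokens; the original setup presumably packs sequences.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Whole word masking: all WordPiece tokens of the same word are masked together.
collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.15)

# Random init from the bert-large-uncased config, i.e. training from scratch.
model = BertForMaskedLM(BertConfig.from_pretrained("bert-large-uncased"))

args = TrainingArguments(
    output_dir="bert-large-wwm-repro",
    per_device_train_batch_size=8,   # placeholder; scale via accumulation / more GPUs
    gradient_accumulation_steps=32,
    learning_rate=1e-4,              # placeholder, borrowed from the BERT paper
    max_steps=1_000_000,             # placeholder, borrowed from the BERT paper
    warmup_steps=10_000,
    fp16=True,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```

If someone could confirm whether the released checkpoint was trained with this kind of setup, and with which actual values, that would fully answer my question.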

Thank you :)

Any follow-up would be greatly appreciated!
