Training Parameters

#13
by maveriq - opened

Hi again. Since your work will probably encourage quite a few people to train their own LMs from scratch (I know I am going to), could you share the training hyperparameters so that we can make a fair comparison with your model and results? Specifically, I am looking for information on:

  • Optimizer and its parameters (e.g. betas and eps in the case of Adam)
  • Learning rate scheduler and its parameters (e.g. scheduler type, warmup percentage, decay shape, etc.)
  • Batch size and learning rate
  • Number of training steps / tokens seen
  • Any training optimizations or libraries used (e.g. FairScale, DeepSpeed, etc.)
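To make the scheduler question concrete, here is a minimal Python sketch of one common shape (linear warmup over a fraction of training, then cosine decay). It only illustrates the kind of detail I mean and is not a guess at what you actually used; the function name and the `warmup_pct` and `min_lr` parameters are my own placeholders.

```python
import math

def lr_at_step(step: int, total_steps: int, peak_lr: float,
               warmup_pct: float = 0.01, min_lr: float = 0.0) -> float:
    """Hypothetical schedule: linear warmup for the first warmup_pct of
    training, then cosine decay from peak_lr down to min_lr."""
    warmup_steps = max(1, int(total_steps * warmup_pct))
    if step < warmup_steps:
        # Linear ramp from peak_lr / warmup_steps up to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine decay shape: peak_lr at progress 0, min_lr at progress 1
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Example: 1000 steps, 10% warmup, peak LR 3e-4 decaying to 3e-5
schedule = [lr_at_step(s, 1000, 3e-4, warmup_pct=0.1, min_lr=3e-5) for s in range(1001)]
```

So knowing the values behind `warmup_pct`, `peak_lr`, `min_lr` and the decay shape (cosine, linear, inverse square root, ...) would be enough to reproduce the schedule.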

Thank you.

maveriq changed discussion status to closed
