the good initial lr?

#1
by butyuhao - opened

What learning rate should I use at first to fine-tune bart-large?

Fudan NLP org

A good start is 1e-5 or 2e-5, with lr warm-up and decay. In the paper, we grid search the lr in [5e-6, 1e-5, 2e-5, 5e-5].

Sign up or log in to comment