t5-small-wikitext

t5-small trained on wikitext/wikitext-103-raw-v1 for 50k steps (around 2 hours of training), following the training procedure from the T5 paper.

  • batch_size: 32
  • max_seq_length: 128
  • optim: Adafactor
  • scheduler: inverse square root (10k warm-up steps)
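
For reference, the inverse square root schedule described in the T5 paper sets the learning rate to 1 / sqrt(max(n, k)), where n is the current step and k is the number of warm-up steps. The sketch below is a minimal illustration of that formula with k = 10k. The function name `inverse_sqrt_lr` is hypothetical, and it is an assumption that this run used the formula in exactly this form.

```python
import math


def inverse_sqrt_lr(step: int, warmup_steps: int = 10_000) -> float:
    """Inverse square root schedule: 1 / sqrt(max(step, warmup_steps)).

    The rate stays constant during warm-up, then decays with the square
    root of the step count. Hypothetical helper, not taken from this repo.
    """
    return 1.0 / math.sqrt(max(step, warmup_steps))


# Print the learning rate at a few points of a 50k-step run.
for step in (0, 10_000, 40_000, 50_000):
    print(step, inverse_sqrt_lr(step))
```

With these settings the rate is held at 0.01 for the first 10k steps and has fallen to 0.005 by step 40k.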