| Hyperparameter | Value |
|---|---|
| Steps | 150k |
| Max length | 256 |
| LR | 1e-4 |
| LR schedule | constant |
| Optimizer | AdamW |
| beta_1, beta_2 | 0.9, 0.95 |
| Final eval loss | 2.245 |
| Final eval perplexity | 9.44 |
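
The reported loss and perplexity appear to be related by the usual identity, perplexity = exp(loss). The sketch below checks this, assuming the eval loss is a mean per-token cross-entropy in nats; the card does not state this, and the function name `perplexity_from_loss` is hypothetical.

```python
import math


def perplexity_from_loss(loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity.

    Assumption: the loss uses the natural logarithm, which is the
    default for most deep learning frameworks.
    """
    return math.exp(loss_nats)


# Values taken from the table above.
final_eval_loss = 2.245
reported_perplexity = 9.44

computed = perplexity_from_loss(final_eval_loss)
# Compare at the two-decimal precision the card reports.
matches = abs(computed - reported_perplexity) < 0.005
print(f"computed perplexity: {computed:.2f}, matches reported: {matches}")
```

Under that assumption the two reported numbers agree to the precision given in the table.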

Dataset used to train bri25yu/t5like-60M