Model training schedule and parameters
#3 opened by AntonioMartini
Hello,
Really nice model! Is there any information on the training schedule used and any additional training parameters that would help replicate the results?
Thanks,
Antonio
Hi... thanks!
Here are the hyperparameters:
lr = 5e-4
lr_schedule = constant
weight_decay = 0.1
adam_beta1 = 0.9, adam_beta2 = 0.95
context_length = 512
batch_size = 80
gradient_accumulation_steps = 16
I think that's about it...
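In case it helps others trying to reproduce this: here is a minimal sketch of how those values could map onto Hugging Face `TrainingArguments`. This is just an illustration under the assumption that a `Trainer`-style setup is used; the thread doesn't say which training framework was actually employed, and `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

# Hypothetical mapping of the quoted hyperparameters onto TrainingArguments;
# the original training script may have used a different framework.
training_args = TrainingArguments(
    output_dir="out",                 # placeholder output path
    learning_rate=5e-4,               # lr = 5e-4
    lr_scheduler_type="constant",     # constant LR schedule
    weight_decay=0.1,                 # wd = 0.1
    adam_beta1=0.9,
    adam_beta2=0.95,
    per_device_train_batch_size=80,   # batch size = 80
    gradient_accumulation_steps=16,   # effective batch = 80 * 16 per device
)

# The context length of 512 would be enforced at tokenization time,
# e.g. tokenizer(..., max_length=512, truncation=True).
```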
That's very helpful, thanks!
AntonioMartini changed discussion status to closed