Please reply to this (#1, opened by aslawliet)
What hyperparameters did you use, especially the learning rate? Also, which TRC TPUs did you use for training, and how many? Was it full-parameter training or LoRA training?
You can find the learning rate in the README. FYI, it started at 5e-5 and was reduced during training down to 1e-5.
I used TPU v4-64/256 for pretraining, and of course this was full-parameter training.
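For readers who want to picture a decay from 5e-5 to 1e-5, here is a minimal sketch. Only the two endpoint values come from the reply above; the linear shape, the function name `linear_decay_lr`, and the `total_steps` value are assumptions made for illustration, and the actual schedule may differ (check the README).

```python
def linear_decay_lr(step, total_steps, peak_lr=5e-5, final_lr=1e-5):
    """Return the learning rate at `step`, decaying linearly from peak_lr to final_lr.

    The linear shape is an assumption; the thread only states the start (5e-5)
    and end (1e-5) values.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps outside the schedule stay at the endpoints.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return peak_lr + (final_lr - peak_lr) * progress


if __name__ == "__main__":
    total = 1000  # hypothetical number of training steps
    for s in (0, 250, 500, 750, 1000):
        print(s, linear_decay_lr(s, total))
```

Swapping in a cosine or stepwise decay would only change the line that computes the return value; the endpoints stay the same.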
beomi changed discussion status to closed